DFT-D3 Dispersion Benchmarks#

This page presents benchmark results for DFT-D3 dispersion corrections across different GPU hardware. Results show the scaling behavior with increasing system size for periodic systems, including both single-system and batched computations.

Warning

These results are indicative only: actual performance varies with the atomic system topology and the software and hardware configuration. We encourage users to run the benchmarks on their own systems of interest.

How to Read These Charts#

Time Scaling : Median execution time (ms) vs. system size. Lower is better. Timings exclude neighbor list construction and cover only the DFT-D3 computation.

Throughput : Atoms processed per millisecond. Higher is better. This indicates the system size at which the GPU saturates.

Memory : Peak GPU memory usage (MB) vs. system size. This is particularly useful for estimating the memory requirements of your own systems.
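As a rough illustration of how these metrics are obtained, the sketch below times a placeholder compute_dftd3() call with CUDA events, using the warmup and iteration counts listed under Benchmark Configuration. It is not the benchmark script itself, and the neighbor list is assumed to be built before the timed region.

import statistics
import torch

def benchmark_dftd3_call(compute_dftd3, n_atoms, warmup=3, iters=10):
    """compute_dftd3() is a placeholder for the backend's D3 energy/force
    call; the neighbor list is assumed to be prebuilt and excluded here."""
    for _ in range(warmup):                       # warm-up iterations
        compute_dftd3()
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()

    times_ms = []
    for _ in range(iters):                        # timed iterations
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        compute_dftd3()
        end.record()
        torch.cuda.synchronize()
        times_ms.append(start.elapsed_time(end))  # elapsed time in milliseconds

    median_time_ms = statistics.median(times_ms)
    return {
        "median_time_ms": median_time_ms,
        "throughput_atoms_per_ms": n_atoms / median_time_ms,
        "peak_memory_mb": torch.cuda.max_memory_allocated() / 1024**2,
    }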

Performance Results#

A comparison of single (non-batched) system computations between backends as the supercell size is scaled up.
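As an illustration of the setup, the snippet below builds CsCl supercells of increasing size with ASE. The lattice constant and the supercell sizes are assumed values for the sketch, not necessarily those used in the benchmark.

from ase.build import bulk

base = bulk("CsCl", "cesiumchloride", a=4.12)   # 2-atom periodic unit cell (assumed lattice constant)
for n in (4, 8, 16, 32):                        # example supercell sizes
    atoms = base.repeat((n, n, n))              # n x n x n supercell with periodic boundaries
    print(f"n={n}: {len(atoms)} atoms")         # total_atoms = 2 * n**3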

Time Scaling

DFT-D3 backend time comparison

Median execution time comparison between backends for single systems.#

Throughput

DFT-D3 backend throughput comparison

Throughput (atoms/ms) comparison between backends. Higher values indicate better performance.#

Memory Usage

DFT-D3 backend memory comparison

Peak GPU memory consumption comparison between backends. Lower is better.#

Scaling of single and batched computations with the nvalchemiops backend, showing how performance varies with batch size.
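Here, a batch means several systems evaluated in one call. The sketch below shows one common way to lay out such a batch (concatenated positions plus a per-atom system index); it illustrates the idea only and is not necessarily the exact nvalchemiops interface.

import torch

def make_batch(positions, cell, batch_size):
    """positions: (N, 3) tensor for one system; cell: (3, 3) lattice vectors."""
    pos = positions.repeat(batch_size, 1)                # (B*N, 3) concatenated coordinates
    cells = cell.unsqueeze(0).repeat(batch_size, 1, 1)   # (B, 3, 3) one cell per system
    batch_idx = torch.arange(batch_size).repeat_interleave(positions.shape[0])  # system index of each atom
    return pos, cells, batch_idx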

Time Scaling

DFT-D3 nvalchemiops time scaling

Execution time scaling for single and batched systems.#

Throughput

DFT-D3 nvalchemiops throughput

Throughput (atoms/ms) for single and batched systems.#

Memory Usage

DFT-D3 nvalchemiops memory usage

Peak GPU memory consumption for single and batched systems.#

Scaling of single and batched computations with the torch-dftd backend, showing how performance varies with batch size.
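For reference, a single-system evaluation with the torch-dftd ASE calculator looks roughly like the sketch below (argument names follow the torch-dftd documentation and may differ between versions; the lattice constant is an assumed value). Batched runs in this benchmark go through the benchmark script rather than this interface.

from ase.build import bulk
from torch_dftd.torch_dftd3_calculator import TorchDFTD3Calculator

atoms = bulk("CsCl", "cesiumchloride", a=4.12).repeat((4, 4, 4))  # 128-atom periodic supercell
atoms.calc = TorchDFTD3Calculator(damping="bj", device="cuda")    # BJ-damped D3 on the GPU
energy = atoms.get_potential_energy()                             # dispersion energy (eV)
forces = atoms.get_forces()                                       # dispersion forces (eV/Å)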

Time Scaling

DFT-D3 torch-dftd time scaling

Execution time scaling for single and batched systems.#

Throughput

DFT-D3 torch-dftd throughput

Throughput (atoms/ms) for single and batched systems.#

Memory Usage

DFT-D3 torch-dftd memory usage

Peak GPU memory consumption for single and batched systems.#

Hardware Information#

GPU: NVIDIA H100 80GB HBM3

Benchmark Configuration#

Cutoff : 21.2 Å (40 Bohr)

System Type : CsCl supercells with periodic boundaries

Neighbor List : Cell list algorithm (\(O(N)\) scaling)

Warmup Iterations : 3

Timing Iterations : 10

Precision : float32
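To gauge how many neighbor pairs a given supercell produces at this cutoff, a quick check with ASE's neighbor list is shown below. This is illustrative only: the benchmark uses its own cell-list implementation, and the lattice constant is an assumed value.

from ase.build import bulk
from ase.neighborlist import neighbor_list

atoms = bulk("CsCl", "cesiumchloride", a=4.12).repeat((8, 8, 8))  # 1024-atom periodic supercell
i, j = neighbor_list("ij", atoms, cutoff=21.2)                    # bidirectional pairs within 21.2 Å
print(f"{len(atoms)} atoms, {len(i)} neighbor pairs")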

DFT-D3 Parameters#

Damping : Becke–Johnson (BJ)

a1 : 0.4289

a2 : 4.4407

s6 : 1.0

s8 : 0.7875
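For context, these are the scaling and Becke–Johnson damping parameters of the two-body D3 energy, which (omitting the three-body term) has the standard form

\[
E_{\mathrm{disp}}^{(2)} = -\frac{1}{2} \sum_{A \neq B} \left( s_6\,\frac{C_6^{AB}}{R_{AB}^{6} + f(R_{AB}^{0})^{6}} + s_8\,\frac{C_8^{AB}}{R_{AB}^{8} + f(R_{AB}^{0})^{8}} \right),
\qquad
f(R_{AB}^{0}) = a_1 R_{AB}^{0} + a_2,
\quad
R_{AB}^{0} = \sqrt{C_8^{AB}/C_6^{AB}},
\]

so \(s_6\) and \(s_8\) scale the two dispersion orders while \(a_1\) and \(a_2\) control the short-range damping.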

Interpreting Results#

total_atoms : Total number of atoms in the supercell.

batch_size : Number of systems processed simultaneously.

supercell_size : Linear dimension \(n\) of the supercell (\(n \times n \times n\) unit cells).

total_neighbors : Total number of neighbor pairs within cutoff.

median_time_ms : Median execution time in milliseconds (lower is better).

peak_memory_mb : Peak GPU memory usage in megabytes.

Note

Timings exclude neighbor list construction and only measure the DFT-D3 energy/force calculation.
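A minimal post-processing sketch, assuming the CSV columns match the field names above (the file name below is a placeholder for whatever your benchmark run produces):

import pandas as pd

df = pd.read_csv("benchmark_results/dftd3_results.csv")  # placeholder path
df["throughput_atoms_per_ms"] = df["total_atoms"] / df["median_time_ms"]
print(df[["total_atoms", "batch_size", "median_time_ms",
          "throughput_atoms_per_ms", "peak_memory_mb"]])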

Running Your Own Benchmarks#

To generate benchmark results for your hardware:

nvalchemiops Backend (default)#

cd benchmarks/interactions/dispersion
python benchmark_dftd3.py \
    --config benchmark_config.yaml \
    --backend warp \
    --output-dir ../../../docs/benchmarks/benchmark_results

torch-dftd Backend#

cd benchmarks/interactions/dispersion
python benchmark_dftd3.py \
    --config benchmark_config.yaml \
    --backend torch_dftd \
    --output-dir ../../../docs/benchmarks/benchmark_results

Options#

--backend {warp,torch_dftd} : Select backend (default: warp).

--gpu-sku <name> : Override GPU SKU name for output files (default: auto-detect).

--config <path> : Path to YAML configuration file.

Results are saved as CSV files, and plots are generated automatically during the next documentation build.