DFT-D3 Dispersion Benchmarks#
This page presents benchmark results for DFT-D3 dispersion corrections across different GPU hardware. Results show the scaling behavior with increasing system size for periodic systems, including both single-system and batched computations.
Warning
These results are intended to be indicative only: actual performance may vary depending on the atomic system topology and the software and hardware configuration. We encourage users to benchmark on their own systems of interest.
How to Read These Charts#
Time Scaling : Median execution time (ms) vs. system size. Lower is better. Timings exclude neighbor list construction and cover only the DFT-D3 computation.
Throughput : Atoms processed per millisecond. Higher is better. The point where throughput plateaus indicates the system size at which the GPU saturates (see the short sketch after this list).
Memory : Peak GPU memory usage (MB) vs. system size. Useful for estimating the memory requirements of your own systems.
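As a minimal illustration of how these metrics relate, throughput is simply the atom count divided by the median execution time; the numbers below are hypothetical.

```python
# Hypothetical numbers, shown only to illustrate how the reported metrics relate.
total_atoms = 8_000                           # atoms in the supercell
median_time_ms = 2.5                          # median DFT-D3 execution time (ms)
throughput = total_atoms / median_time_ms     # atoms processed per millisecond
print(f"{throughput:.0f} atoms/ms")           # this curve flattens once the GPU saturates
```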
Performance Results#
Comparison of single (non-batched) system computations between the backends as the supercell size is scaled up.
Time Scaling: Median execution time comparison between backends for single systems.
Throughput: Throughput (atoms/ms) comparison between backends; higher values indicate better performance.
Memory Usage: Peak GPU memory consumption comparison between backends; lower is better, indicating lower memory requirements.
Scaling of single and batched computation with the nvalchemiops backend, showing how performance scales with different batch sizes.
Time Scaling: Execution time scaling for single and batched systems.
Throughput: Throughput (atoms/ms) for single and batched systems.
Memory Usage: Peak GPU memory consumption for single and batched systems.
Scaling of single and batched computation with the torch-dftd backend, showing how performance scales with different batch sizes.
Time Scaling: Execution time scaling for single and batched systems.
Throughput: Throughput (atoms/ms) for single and batched systems.
Memory Usage: Peak GPU memory consumption for single and batched systems.
Hardware Information#
GPU: NVIDIA H100 80GB HBM3
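The benchmark script names its output files after the GPU SKU (auto-detected by default; see the options below). A quick way to check which device and memory capacity your own results would correspond to, for example via PyTorch:

```python
# Query the local GPU to confirm which SKU your benchmark results correspond to.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)                                      # e.g. "NVIDIA H100 80GB HBM3"
    print(f"{props.total_memory / 1024**3:.0f} GB device memory")
else:
    print("No CUDA device available")
```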
Benchmark Configuration#
| Parameter | Value |
|---|---|
| Cutoff | 21.2 Å (40 Bohr) |
| System Type | CsCl supercells with periodic boundaries |
| Neighbor List | Cell list algorithm (\(O(N)\) scaling) |
| Warmup Iterations | 3 |
| Timing Iterations | 10 |
| Precision | |
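The warmup and timing iterations follow a standard GPU benchmarking pattern: a few untimed warmup calls, then timed iterations with device synchronization, reporting the median time and peak memory. A minimal sketch of that pattern is shown below; `compute_dftd3()` is a hypothetical stand-in for the actual kernel call and is not part of either backend's API.

```python
# Sketch of the warmup + timing protocol (3 warmup, 10 timed iterations).
import statistics
import time

import torch


def compute_dftd3(positions: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder for the real DFT-D3 energy/force computation.
    return positions.pow(2).sum()


positions = torch.rand(1000, 3, device="cuda")

for _ in range(3):                       # warmup iterations
    compute_dftd3(positions)
torch.cuda.synchronize()

torch.cuda.reset_peak_memory_stats()
timings_ms = []
for _ in range(10):                      # timing iterations
    start = time.perf_counter()
    compute_dftd3(positions)
    torch.cuda.synchronize()             # wait for the GPU before stopping the clock
    timings_ms.append((time.perf_counter() - start) * 1e3)

median_time_ms = statistics.median(timings_ms)
peak_memory_mb = torch.cuda.max_memory_allocated() / 1024**2
print(f"median: {median_time_ms:.3f} ms, peak memory: {peak_memory_mb:.1f} MB")
```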
DFT-D3 Parameters#
| Parameter | Value |
|---|---|
| Damping | BJ-damping |
| \(a_1\) | 0.4289 |
| \(a_2\) | 4.4407 |
| \(s_6\) | 1.0 |
| \(s_8\) | 0.7875 |
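For reference, in the Becke–Johnson (BJ) damping scheme these parameters enter the two-body dispersion energy as follows; the values listed above match the standard PBE parameterization of D3(BJ).

\[
E_{\mathrm{disp}}^{(2)} = -\frac{1}{2} \sum_{A \neq B} \left[
  s_6 \frac{C_6^{AB}}{r_{AB}^{6} + f(R_0^{AB})^{6}}
  + s_8 \frac{C_8^{AB}}{r_{AB}^{8} + f(R_0^{AB})^{8}}
\right],
\qquad
f(R_0^{AB}) = a_1 R_0^{AB} + a_2,
\]

where \(r_{AB}\) is the interatomic distance and \(R_0^{AB} = \sqrt{C_8^{AB} / C_6^{AB}}\).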
Interpreting Results#
total_atoms
: Total number of atoms in the supercell.
batch_size
: Number of systems processed simultaneously.
supercell_size
: Linear dimension \(n\) of the supercell (an \(n \times n \times n\) replication of the unit cell).
total_neighbors
: Total number of neighbor pairs within cutoff.
median_time_ms
: Median execution time in milliseconds (lower is better).
peak_memory_mb
: Peak GPU memory usage in megabytes.
Note
Timings exclude neighbor list construction and only measure the DFT-D3 energy/force calculation.
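A minimal sketch for post-processing the generated CSV results with pandas, using the column names defined above; the file path below is only a placeholder for whatever lands in your output directory.

```python
# Sketch only: the CSV path is a placeholder; column names follow the definitions above.
import pandas as pd

df = pd.read_csv("benchmark_results/dftd3_results.csv")   # placeholder path
df["throughput_atoms_per_ms"] = df["total_atoms"] / df["median_time_ms"]
print(df[["total_atoms", "batch_size", "supercell_size",
          "median_time_ms", "peak_memory_mb", "throughput_atoms_per_ms"]])
```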
Running Your Own Benchmarks#
To generate benchmark results for your hardware:
nvalchemiops Backend (default)#
cd benchmarks/interactions/dispersion
python benchmark_dftd3.py \
--config benchmark_config.yaml \
--backend warp \
--output-dir ../../../docs/benchmarks/benchmark_results
torch-dftd Backend#
cd benchmarks/interactions/dispersion
python benchmark_dftd3.py \
--config benchmark_config.yaml \
--backend torch_dftd \
--output-dir ../../../docs/benchmarks/benchmark_results
Options#
--backend {warp,torch_dftd}
: Select backend (default: warp).
--gpu-sku <name>
: Override GPU SKU name for output files (default: auto-detect).
--config <path>
: Path to YAML configuration file.
Results will be saved as CSV files, and plots will be generated automatically during the next documentation build.