DFT-D3(BJ) Dispersion Correction#

Dispersion corrections account for van der Waals interactions that standard DFT functionals underestimate. ALCHEMI Toolkit-Ops provides GPU-accelerated DFT-D3 with Becke-Johnson damping via NVIDIA Warp, supporting batched computation, periodic systems, and full torch.compile compatibility.

Tip

The current implementation computes two-body terms only (C6 and C8). Three-body Axilrod-Teller-Muto (ATM/C9) contributions are not included.

Why Dispersion Correction Matters#

Standard DFT functionals systematically underestimate long-range dispersion due to their local or semi-local nature. This leads to significant errors in:

Binding energies of weakly bound complexes
Molecular crystal structures and lattice energies
Conformational energies of flexible molecules
Adsorption energies on surfaces

DFT-D3 adds an empirical pairwise correction with environment-dependent C6 coefficients that adapt based on coordination numbers. The Becke-Johnson damping variant provides improved short-range behavior for molecular geometries.

Quick Start#

Important

DFT-D3 parameters must be explicitly provided via d3_params (a D3Parameters instance or dictionary). See Parameter Setup for details.

Unit consistency is required: Standard D3 parameters use atomic units— Bohr for lengths, Hartree for energies. Unit mismatches may cause the neighbor list calculation to run out of memory (e.g. cutoff in Bohr, but positions are in Angstroms) or yield an unconverged estimate of the dispersion corrections (i.e. cutoff in Angstrom, positions in Bohr). See Units for guidance on units for each parameter.

Neighbor Matrix (Dense)

from nvalchemiops.interactions.dispersion.dftd3 import dftd3
from nvalchemiops.neighborlist import neighbor_list

# Build neighbor matrix; 50 Bohr cutoff
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff=50.0, cell=cell, pbc=pbc
)

# Compute dispersion correction (PBE functional)
energy, forces, coord_num = dftd3(
    positions=positions,           # [num_atoms, 3] in Bohr
    numbers=numbers,               # [num_atoms] atomic numbers
    neighbor_matrix=neighbor_matrix,
    a1=0.3981, a2=4.4211, s8=0.7875,
    d3_params=d3_params,
)

Neighbor List (Sparse COO)

from nvalchemiops.interactions.dispersion.dftd3 import dftd3
from nvalchemiops.neighborlist import neighbor_list

# Build neighbor list in COO format; 50 Bohr cutoff
neighbor_list_coo, neighbor_list_ptr, unit_shifts = neighbor_list(
    positions, cutoff=50.0, cell=cell, pbc=pbc, return_neighbor_list=True
)

# Compute dispersion correction (PBE functional)
energy, forces, coord_num = dftd3(
    positions=positions,           # [num_atoms, 3] in Bohr
    numbers=numbers,               # [num_atoms] atomic numbers
    neighbor_list=neighbor_list_coo,  # [2, num_pairs]
    neighbor_ptr=neighbor_list_ptr,
    a1=0.3981, a2=4.4211, s8=0.7875,
    d3_params=d3_params,
)

Periodic Boundary Conditions#

For periodic systems, provide the cell and shift tensors:

Neighbor Matrix (Dense)

energy, forces, coord_num, virial = dftd3(
    positions=positions,
    numbers=numbers,
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=neighbor_matrix_shifts,   # [num_atoms, max_neighbors, 3]
    cell=cell,                                       # [num_systems, 3, 3]
    a1=0.3981, a2=4.4211, s8=0.7875,
    d3_params=d3_params,
    compute_virial=True                              # also compute virial
)

Neighbor List (Sparse COO)

energy, forces, coord_num = dftd3(
    positions=positions,
    numbers=numbers,
    neighbor_list=neighbor_list_coo,
    unit_shifts=unit_shifts,           # [num_pairs, 3]
    cell=cell,                         # [num_systems, 3, 3]
    a1=0.3981, a2=4.4211, s8=0.7875,
    d3_params=d3_params,
)

Use batch_idx to specify multi-system batches, mapping each atom to its system.

Data Formats#

Neighbor Representations#

ALCHEMI Toolkit-Ops supports two neighbor formats that produce identical results:

Neighbor Matrix (default) : Dense tensor of shape [num_atoms, max_neighbors]. Each row contains neighbor indices for that atom, padded with fill_value (typically num_atoms). Use with neighbor_matrix and neighbor_matrix_shifts arguments.

Neighbor List (COO) : Sparse tensor of shape [2, num_pairs] where row 0 contains source indices and row 1 contains target indices. No padding needed. Use with neighbor_list and unit_shifts arguments.

When to Use Each Format#

We refer the reader for a more in-depth discussion in the neighbor list documentation, but briefly:

Neighbor Matrix is preferred when:

Using torch.compile (fixed memory layout avoids graph breaks)
Systems have dense, uniform neighbor distributions

Neighbor List (COO) is preferred when:

Integrating with graph neural network libraries (PyG, DGL)
Systems are sparse with highly variable neighbors per atom
Memory efficiency is critical

Note

The neighbor representation should be symmetric (bidirectional): if atom j is a neighbor of atom i, then atom i should also be a neighbor of atom j.

Units#

While the DFT-D3 kernels themselves are unit-agnostic, the reference parameters themselves typically use atomic units. The table below lists quantities and their conversions from conventional units that are more commonly encountered in computational chemistry and materials science:

Quantity	Unit	Conversion from SI
Positions	Bohr	\((\text{Å} \times 1.8897259886)\)
Energy (output)	Hartree	\((\times 27.211 \rightarrow \text{eV})\)
Forces (output)	Hartree/Bohr	\((\times 51.422 \rightarrow \text{eV/Å})\)
Virial (output)	Hartree	\((\times 27.211 \rightarrow \text{eV})\)
`a2` parameter	Bohr	Same as positions
Cutoffs	Bohr	Same as positions
C6 coefficients	(Hartree \(\cdot\) Bohr\(^6\))	—

Coordination numbers are dimensionless.

Tip

Use scipy.constants for precise conversions: scipy.constants.value('atomic unit of length') for Bohr, scipy.constants.value('Hartree energy in eV') for Hartree.

Data Types#

Tensor	Dtype	Notes
Positions	`float32` or `float64`	FP64 used for distance vectors only
Cell	`float32` or `float64`	Same format as Positions
Atomic numbers	`int32`
Neighbor indices	`int32`
Energy (output)	`float32`	Always
Forces (output)	`float32`	Always
Virial (output)	`float32`	Always
Coordination numbers (output)	`float32`	Always

Parameter Setup#

DFT-D3 requires reference parameters that must be explicitly provided. Three options are available:

Option 1: D3Parameters Dataclass (Recommended)#

from nvalchemiops.interactions.dispersion.dftd3 import D3Parameters

d3_params = D3Parameters(
    rcov=covalent_radii,      # [max_Z+1] in Bohr
    r4r2=r4r2_values,         # [max_Z+1] <r^4>/<r^2> expectation values
    c6ab=c6_reference,        # [max_Z+1, max_Z+1, 5, 5] C6 grid
    cn_ref=coord_num_ref,     # [max_Z+1, max_Z+1, 5, 5] CN grid
)

energy, forces, coord_num = dftd3(
    positions, numbers, neighbor_matrix,
    a1=0.3981, a2=4.4211, s8=0.7875,
    d3_params=d3_params,
)

Option 2: Dictionary#

d3_params = {
    "rcov": covalent_radii,
    "r4r2": r4r2_values,
    "c6ab": c6_reference,
    "cn_ref": coord_num_ref,
}

Option 3: Individual Parameters#

energy, forces, coord_num = dftd3(
    positions, numbers, neighbor_matrix,
    a1=0.3981, a2=4.4211, s8=0.7875,
    covalent_radii=covalent_radii,
    r4r2=r4r2_values,
    c6_reference=c6_reference,
    coord_num_ref=coord_num_ref,
)

Obtaining Parameters#

Parameter files are available from the Grimme group website. See examples/interactions/utils.py for loading utilities.

Functional-Specific Damping Parameters#

Common BJ damping parameters (a1, a2, s8):

Functional	a1	a2 (Bohr)	s8
PBE	0.3981	4.4211	0.7875
PBE0	0.4145	4.8593	1.2177
B3LYP	0.3981	4.4211	1.9889

See the DFT-D3 parameters page for a complete list.

Performance Tuning#

Key Parameters#

s5_smoothing_on / s5_smoothing_off : Enable smooth cutoff with the S5 switching function. The function provides \(C^2\) continuity at both boundaries, preventing force discontinuities.

torch.compile Compatibility#

Both neighbor formats support torch.compile for JIT optimization:

compiled_dftd3 = torch.compile(dftd3)

energy, forces, coord_num = compiled_dftd3(
    positions=positions,
    numbers=numbers,
    neighbor_list=neighbor_list_coo,
    a1=0.3981, a2=4.4211, s8=0.7875,
    d3_params=d3_params,
    num_systems=num_systems  # will introduce CUDA graph break if not provided
)

Precision Notes#

Input positions and unit cell can be float64 for large unit cells or systems far from origin
Intermediate calculations use float32 (sufficient for dispersion accuracy)
All outputs are float32

Usage Patterns#

Single Molecule (Non-Periodic)#

import torch
from nvalchemiops.interactions.dispersion.dftd3 import dftd3
from nvalchemiops.neighborlist import neighbor_list

ANGSTROM_TO_BOHR = 1.8897259886
HARTREE_TO_EV = 27.211386245981

# Water molecule (Ångström → Bohr)
positions_ang = torch.tensor([
    [0.0000,  0.0000,  0.1173],  # O
    [0.0000,  0.7572, -0.4692],  # H
    [0.0000, -0.7572, -0.4692],  # H
], dtype=torch.float32, device="cuda")
positions = positions_ang * ANGSTROM_TO_BOHR

numbers = torch.tensor([8, 1, 1], dtype=torch.int32, device="cuda")

# Non-periodic: use large box with pbc=False
cell = torch.eye(3, dtype=torch.float32, device="cuda").unsqueeze(0) * 100.0
pbc = torch.tensor([False, False, False], device="cuda")

neighbor_matrix, num_neighbors, _ = neighbor_list(
    positions, cutoff=50.0, cell=cell, pbc=pbc
)

# d3_params loaded from file (see examples/interactions/utils.py)
energy, forces, coord_num = dftd3(
    positions=positions,
    numbers=numbers,
    neighbor_matrix=neighbor_matrix,
    a1=0.3981, a2=4.4211, s8=0.7875,
    d3_params=d3_params,
)

print(f"Dispersion energy: {energy[0].item():.6f} Ha")
print(f"                   {energy[0].item() * HARTREE_TO_EV:.6f} eV")

Batched Periodic Crystals#

import torch
from nvalchemiops.interactions.dispersion.dftd3 import dftd3
from nvalchemiops.neighborlist import neighbor_list

ANGSTROM_TO_BOHR = 1.8897259886

# Batch of 4 systems, 8 atoms each
BATCH_SIZE = 4
atoms_per_system = 8
total_atoms = BATCH_SIZE * atoms_per_system

positions_ang = torch.randn(total_atoms, 3, device="cuda")
positions = positions_ang * ANGSTROM_TO_BOHR
numbers = torch.randint(1, 20, (total_atoms,), dtype=torch.int32, device="cuda")

# Per-system cells (Å → Bohr)
cells = torch.eye(3, device="cuda").unsqueeze(0).repeat(BATCH_SIZE, 1, 1) * 10.0
cells = cells * ANGSTROM_TO_BOHR

pbc = torch.tensor([True, True, True], device="cuda").unsqueeze(0).repeat(BATCH_SIZE, 1)

batch_idx = torch.repeat_interleave(
    torch.arange(BATCH_SIZE, dtype=torch.int32, device="cuda"),
    atoms_per_system
)
batch_ptr = torch.arange(0, total_atoms + 1, atoms_per_system, dtype=torch.int32, device="cuda")

# Build neighbors
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff=20.0 * ANGSTROM_TO_BOHR,
    cell=cells, pbc=pbc, batch_idx=batch_idx, batch_ptr=batch_ptr,
    method="batch_cell_list"
)

energy, forces, coord_num, virial = dftd3(
    positions=positions,
    numbers=numbers,
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=shifts,
    cell=cells,
    batch_idx=batch_idx,
    a1=0.4145, a2=4.8593, s8=1.2177,  # PBE0
    d3_params=d3_params,
    compute_virial=True
)

print(f"Energies per system (Ha): {energy}")

Smooth Cutoff#

energy, forces, coord_num = dftd3(
    positions=positions,
    numbers=numbers,
    neighbor_matrix=neighbor_matrix,
    a1=0.3981, a2=4.4211, s8=0.7875,
    d3_params=d3_params,
    s5_smoothing_on=40.0,   # Start transition at 40 Bohr
    s5_smoothing_off=50.0,  # Complete cutoff at 50 Bohr
)

Theory Background#

This section summarizes the DFT-D3 method [1] with Becke-Johnson damping [2]. For complete derivations and benchmarks, see the original publications.

Dispersion Energy Expression#

The DFT-D3 energy with Becke-Johnson damping:

\[E_{\text{disp}} = -\frac{1}{2} \sum_{i \neq j} S_w(r_{ij}) \left[ \frac{s_6 C_6^{ij}}{r_{ij}^6 + f_{\text{damp},6}^6} + \frac{s_8 C_8^{ij}}{r_{ij}^8 + f_{\text{damp},8}^8} \right]\]

where:

\(r_{ij}\): interatomic distance
\(C_6^{ij}, C_8^{ij}\): dispersion coefficients
\(s_6, s_8\): functional-dependent scaling factors
\(f_{\text{damp}}\): Becke-Johnson damping function
\(S_w(r)\): optional smooth switching function

Coordination-Dependent C6#

A key innovation in DFT-D3 [1] is environment-dependent C6 coefficients. The C6 coefficient adapts to the local chemical environment via Gaussian interpolation over reference coordination numbers:

\[C_6(\text{CN}_i, \text{CN}_j) = \frac{\sum_{pq} C_6^{\text{ref}}[p,q] \cdot L_{pq}}{\sum_{pq} L_{pq}}\]

Coordination numbers use a geometric counting function:

\[\text{CN}_i = \sum_{j \neq i} \frac{1}{1 + \exp\left[k_1 \left(\frac{r_{\text{cov}}}{r_{ij}} - 1\right)\right]}\]

Becke-Johnson Damping#

The BJ damping function [2] prevents unphysical short-range behavior while providing improved accuracy for equilibrium geometries:

\[f_{\text{damp}} = a_1 \sqrt{C_8^{ij}/C_6^{ij}} + a_2\]

Force Calculation#

Forces include both direct and coordination-dependent contributions via chain rule:

\[\mathbf{F}_i = -\sum_j \left[\left.\frac{\partial E_{ij}}{\partial \mathbf{r}_i} \right|_{\text{CN}} + \left(\sum_k \frac{\partial E_{ik}}{\partial \text{CN}_i}\right) \frac{\partial \text{CN}_i}{\partial \mathbf{r}_i}\right]\]

This decomposition into direct and chain-rule terms is central to the multi-pass kernel architecture described in Implementation Details.

Implementation Details#

Why Multi-Pass Kernels?#

The force calculation requires computing the chain-rule contribution:

\[\mathbf{F}_{\text{chain},i} = -\left(\sum_k \frac{\partial E_{ik}}{\partial \text{CN}_i}\right) \frac{\partial \text{CN}_i}{\partial \mathbf{r}_i}\]

The term \(\sum_k \partial E_{ik}/\partial \text{CN}_i\) must be accumulated over all pairs involving atom \(i\) before computing the CN-dependent forces. This creates a data dependency that prevents computing direct and chain-rule forces in a single pass. The multi-pass architecture separates these computations:

Pass 0 (PBC only) : Convert integer unit cell shifts to Cartesian coordinates using the lattice vectors. This is performed once and reused in subsequent passes.

Pass 1 : Compute coordination numbers \(\text{CN}_i\) for all atoms using the geometric counting function. These are needed for C6 interpolation in Pass 2.

Pass 2 : For each atom pair: interpolate C6 coefficients, compute damped dispersion energy, compute direct forces \(\mathbf{F}_{\text{direct}}\), and accumulate \(\partial E/\partial \text{CN}_i\) into a per-atom buffer.

Pass 3 : Using the accumulated \(\partial E/\partial \text{CN}\) values from Pass 2, compute the chain-rule force contribution and add it to the direct forces.

Neighbor Format Dispatch#

The dftd3() function dispatches to different kernel implementations based on the neighbor representation format:

Neighbor matrix (neighbor_matrix argument): Dispatches to _dftd3_nm_op, which launches kernels that iterate over a dense [num_atoms, max_neighbors] array.
Neighbor list (neighbor_list + neighbor_ptr arguments): Dispatches to _dftd3_nl_op, which launches kernels that use CSR (Compressed Sparse Row) format for memory-efficient sparse traversal.

Both paths execute the same four-pass algorithm and produce identical results. The choice affects memory layout and access patterns but not numerical output.

Precision Handling#

The kernels support both float32 and float64 input positions through Warp’s overload mechanism. At module load time, kernel overloads are registered for each precision:

float32 positions use wp.vec3f vectors and wp.mat33f matrices
float64 positions use wp.vec3d vectors and wp.mat33d matrices

Distance vectors are computed at the input precision, but all dispersion calculations (C6 interpolation, damping, energy/force accumulation) use float32 for efficiency. Outputs are always float32.

Numerical Stability#

Log-sum-exp trick: The Gaussian-weighted C6 interpolation computes \(\sum_k w_k C_6^{(k)} / \sum_k w_k\) where \(w_k = \exp(\text{arg}_k)\). To prevent overflow, all exponentials are computed relative to the maximum argument: \(\exp(\text{arg}_k - \text{arg}_{\max})\).
Threshold skipping: Contributions with \(\exp(\text{arg}) < 10^{-12}\) (relative to max) are skipped to avoid unnecessary computation.