DFT-D3(BJ) Dispersion Correction#
Dispersion corrections account for van der Waals interactions that standard DFT
functionals underestimate. ALCHEMI Toolkit-Ops provides GPU-accelerated DFT-D3
with Becke-Johnson damping via NVIDIA Warp,
supporting batched computation, periodic systems, and full torch.compile
compatibility.
Tip
The current implementation computes two-body terms only (C6 and C8). Three-body Axilrod-Teller-Muto (ATM/C9) contributions are not included.
Why Dispersion Correction Matters#
Standard DFT functionals systematically underestimate long-range dispersion due to their local or semi-local nature. This leads to significant errors in:
Binding energies of weakly bound complexes
Molecular crystal structures and lattice energies
Conformational energies of flexible molecules
Adsorption energies on surfaces
DFT-D3 adds an empirical pairwise correction with environment-dependent C6 coefficients that adapt based on coordination numbers. The Becke-Johnson damping variant provides improved short-range behavior for molecular geometries.
Quick Start#
Important
DFT-D3 parameters must be explicitly provided via d3_params (a
D3Parameters instance or
dictionary). See Parameter Setup for details.
Unit consistency is required: Standard D3 parameters use atomic units— Bohr for lengths, Hartree for energies. Unit mismatches may cause the neighbor list calculation to run out of memory (e.g. cutoff in Bohr, but positions are in Angstroms) or yield an unconverged estimate of the dispersion corrections (i.e. cutoff in Angstrom, positions in Bohr). See Units for guidance on units for each parameter.
from nvalchemiops.interactions.dispersion.dftd3 import dftd3
from nvalchemiops.neighborlist import neighbor_list
# Build neighbor matrix; 50 Bohr cutoff
neighbor_matrix, num_neighbors, shifts = neighbor_list(
positions, cutoff=50.0, cell=cell, pbc=pbc
)
# Compute dispersion correction (PBE functional)
energy, forces, coord_num = dftd3(
positions=positions, # [num_atoms, 3] in Bohr
numbers=numbers, # [num_atoms] atomic numbers
neighbor_matrix=neighbor_matrix,
a1=0.3981, a2=4.4211, s8=0.7875,
d3_params=d3_params,
)
from nvalchemiops.interactions.dispersion.dftd3 import dftd3
from nvalchemiops.neighborlist import neighbor_list
# Build neighbor list in COO format; 50 Bohr cutoff
neighbor_list_coo, neighbor_list_ptr, unit_shifts = neighbor_list(
positions, cutoff=50.0, cell=cell, pbc=pbc, return_neighbor_list=True
)
# Compute dispersion correction (PBE functional)
energy, forces, coord_num = dftd3(
positions=positions, # [num_atoms, 3] in Bohr
numbers=numbers, # [num_atoms] atomic numbers
neighbor_list=neighbor_list_coo, # [2, num_pairs]
neighbor_ptr=neighbor_list_ptr,
a1=0.3981, a2=4.4211, s8=0.7875,
d3_params=d3_params,
)
Periodic Boundary Conditions#
For periodic systems, provide the cell and shift tensors:
energy, forces, coord_num, virial = dftd3(
positions=positions,
numbers=numbers,
neighbor_matrix=neighbor_matrix,
neighbor_matrix_shifts=neighbor_matrix_shifts, # [num_atoms, max_neighbors, 3]
cell=cell, # [num_systems, 3, 3]
a1=0.3981, a2=4.4211, s8=0.7875,
d3_params=d3_params,
compute_virial=True # also compute virial
)
energy, forces, coord_num = dftd3(
positions=positions,
numbers=numbers,
neighbor_list=neighbor_list_coo,
unit_shifts=unit_shifts, # [num_pairs, 3]
cell=cell, # [num_systems, 3, 3]
a1=0.3981, a2=4.4211, s8=0.7875,
d3_params=d3_params,
)
Use batch_idx to specify multi-system batches, mapping each atom to its system.
Data Formats#
Neighbor Representations#
ALCHEMI Toolkit-Ops supports two neighbor formats that produce identical results:
Neighbor Matrix (default)
: Dense tensor of shape [num_atoms, max_neighbors]. Each row contains neighbor
indices for that atom, padded with fill_value (typically num_atoms).
Use with neighbor_matrix and neighbor_matrix_shifts arguments.
Neighbor List (COO)
: Sparse tensor of shape [2, num_pairs] where row 0 contains source indices
and row 1 contains target indices. No padding needed.
Use with neighbor_list and unit_shifts arguments.
When to Use Each Format#
We refer the reader for a more in-depth discussion in the neighbor list documentation, but briefly:
Neighbor Matrix is preferred when:
Using
torch.compile(fixed memory layout avoids graph breaks)Systems have dense, uniform neighbor distributions
Neighbor List (COO) is preferred when:
Integrating with graph neural network libraries (PyG, DGL)
Systems are sparse with highly variable neighbors per atom
Memory efficiency is critical
Note
The neighbor representation should be symmetric (bidirectional): if atom j
is a neighbor of atom i, then atom i should also be a neighbor of atom j.
Units#
While the DFT-D3 kernels themselves are unit-agnostic, the reference parameters themselves typically use atomic units. The table below lists quantities and their conversions from conventional units that are more commonly encountered in computational chemistry and materials science:
Quantity |
Unit |
Conversion from SI |
|---|---|---|
Positions |
Bohr |
\((\text{Å} \times 1.8897259886)\) |
Energy (output) |
Hartree |
\((\times 27.211 \rightarrow \text{eV})\) |
Forces (output) |
Hartree/Bohr |
\((\times 51.422 \rightarrow \text{eV/Å})\) |
Virial (output) |
Hartree |
\((\times 27.211 \rightarrow \text{eV})\) |
|
Bohr |
Same as positions |
Cutoffs |
Bohr |
Same as positions |
C6 coefficients |
(Hartree \(\cdot\) Bohr\(^6\)) |
— |
Coordination numbers are dimensionless.
Tip
Use scipy.constants for precise conversions:
scipy.constants.value('atomic unit of length') for Bohr,
scipy.constants.value('Hartree energy in eV') for Hartree.
Data Types#
Tensor |
Dtype |
Notes |
|---|---|---|
Positions |
|
FP64 used for distance vectors only |
Cell |
|
Same format as Positions |
Atomic numbers |
|
|
Neighbor indices |
|
|
Energy (output) |
|
Always |
Forces (output) |
|
Always |
Virial (output) |
|
Always |
Coordination numbers (output) |
|
Always |
Parameter Setup#
DFT-D3 requires reference parameters that must be explicitly provided. Three options are available:
Option 1: D3Parameters Dataclass (Recommended)#
from nvalchemiops.interactions.dispersion.dftd3 import D3Parameters
d3_params = D3Parameters(
rcov=covalent_radii, # [max_Z+1] in Bohr
r4r2=r4r2_values, # [max_Z+1] <r^4>/<r^2> expectation values
c6ab=c6_reference, # [max_Z+1, max_Z+1, 5, 5] C6 grid
cn_ref=coord_num_ref, # [max_Z+1, max_Z+1, 5, 5] CN grid
)
energy, forces, coord_num = dftd3(
positions, numbers, neighbor_matrix,
a1=0.3981, a2=4.4211, s8=0.7875,
d3_params=d3_params,
)
Option 2: Dictionary#
d3_params = {
"rcov": covalent_radii,
"r4r2": r4r2_values,
"c6ab": c6_reference,
"cn_ref": coord_num_ref,
}
Option 3: Individual Parameters#
energy, forces, coord_num = dftd3(
positions, numbers, neighbor_matrix,
a1=0.3981, a2=4.4211, s8=0.7875,
covalent_radii=covalent_radii,
r4r2=r4r2_values,
c6_reference=c6_reference,
coord_num_ref=coord_num_ref,
)
Obtaining Parameters#
Parameter files are available from the
Grimme group website.
See examples/interactions/utils.py for loading utilities.
Functional-Specific Damping Parameters#
Common BJ damping parameters (a1, a2, s8):
Functional |
a1 |
a2 (Bohr) |
s8 |
|---|---|---|---|
PBE |
0.3981 |
4.4211 |
0.7875 |
PBE0 |
0.4145 |
4.8593 |
1.2177 |
B3LYP |
0.3981 |
4.4211 |
1.9889 |
See the DFT-D3 parameters page for a complete list.
Performance Tuning#
Key Parameters#
s5_smoothing_on / s5_smoothing_off
: Enable smooth cutoff with the S5 switching function. The function provides
\(C^2\) continuity at both boundaries, preventing force discontinuities.
torch.compile Compatibility#
Both neighbor formats support torch.compile for JIT optimization:
compiled_dftd3 = torch.compile(dftd3)
energy, forces, coord_num = compiled_dftd3(
positions=positions,
numbers=numbers,
neighbor_list=neighbor_list_coo,
a1=0.3981, a2=4.4211, s8=0.7875,
d3_params=d3_params,
num_systems=num_systems # will introduce CUDA graph break if not provided
)
Precision Notes#
Input positions and unit cell can be
float64for large unit cells or systems far from originIntermediate calculations use
float32(sufficient for dispersion accuracy)All outputs are
float32
Usage Patterns#
Single Molecule (Non-Periodic)#
import torch
from nvalchemiops.interactions.dispersion.dftd3 import dftd3
from nvalchemiops.neighborlist import neighbor_list
ANGSTROM_TO_BOHR = 1.8897259886
HARTREE_TO_EV = 27.211386245981
# Water molecule (Ångström → Bohr)
positions_ang = torch.tensor([
[0.0000, 0.0000, 0.1173], # O
[0.0000, 0.7572, -0.4692], # H
[0.0000, -0.7572, -0.4692], # H
], dtype=torch.float32, device="cuda")
positions = positions_ang * ANGSTROM_TO_BOHR
numbers = torch.tensor([8, 1, 1], dtype=torch.int32, device="cuda")
# Non-periodic: use large box with pbc=False
cell = torch.eye(3, dtype=torch.float32, device="cuda").unsqueeze(0) * 100.0
pbc = torch.tensor([False, False, False], device="cuda")
neighbor_matrix, num_neighbors, _ = neighbor_list(
positions, cutoff=50.0, cell=cell, pbc=pbc
)
# d3_params loaded from file (see examples/interactions/utils.py)
energy, forces, coord_num = dftd3(
positions=positions,
numbers=numbers,
neighbor_matrix=neighbor_matrix,
a1=0.3981, a2=4.4211, s8=0.7875,
d3_params=d3_params,
)
print(f"Dispersion energy: {energy[0].item():.6f} Ha")
print(f" {energy[0].item() * HARTREE_TO_EV:.6f} eV")
Batched Periodic Crystals#
import torch
from nvalchemiops.interactions.dispersion.dftd3 import dftd3
from nvalchemiops.neighborlist import neighbor_list
ANGSTROM_TO_BOHR = 1.8897259886
# Batch of 4 systems, 8 atoms each
BATCH_SIZE = 4
atoms_per_system = 8
total_atoms = BATCH_SIZE * atoms_per_system
positions_ang = torch.randn(total_atoms, 3, device="cuda")
positions = positions_ang * ANGSTROM_TO_BOHR
numbers = torch.randint(1, 20, (total_atoms,), dtype=torch.int32, device="cuda")
# Per-system cells (Å → Bohr)
cells = torch.eye(3, device="cuda").unsqueeze(0).repeat(BATCH_SIZE, 1, 1) * 10.0
cells = cells * ANGSTROM_TO_BOHR
pbc = torch.tensor([True, True, True], device="cuda").unsqueeze(0).repeat(BATCH_SIZE, 1)
batch_idx = torch.repeat_interleave(
torch.arange(BATCH_SIZE, dtype=torch.int32, device="cuda"),
atoms_per_system
)
batch_ptr = torch.arange(0, total_atoms + 1, atoms_per_system, dtype=torch.int32, device="cuda")
# Build neighbors
neighbor_matrix, num_neighbors, shifts = neighbor_list(
positions, cutoff=20.0 * ANGSTROM_TO_BOHR,
cell=cells, pbc=pbc, batch_idx=batch_idx, batch_ptr=batch_ptr,
method="batch_cell_list"
)
energy, forces, coord_num, virial = dftd3(
positions=positions,
numbers=numbers,
neighbor_matrix=neighbor_matrix,
neighbor_matrix_shifts=shifts,
cell=cells,
batch_idx=batch_idx,
a1=0.4145, a2=4.8593, s8=1.2177, # PBE0
d3_params=d3_params,
compute_virial=True
)
print(f"Energies per system (Ha): {energy}")
Smooth Cutoff#
energy, forces, coord_num = dftd3(
positions=positions,
numbers=numbers,
neighbor_matrix=neighbor_matrix,
a1=0.3981, a2=4.4211, s8=0.7875,
d3_params=d3_params,
s5_smoothing_on=40.0, # Start transition at 40 Bohr
s5_smoothing_off=50.0, # Complete cutoff at 50 Bohr
)
Theory Background#
This section summarizes the DFT-D3 method [1] with Becke-Johnson damping [2]. For complete derivations and benchmarks, see the original publications.
Dispersion Energy Expression#
The DFT-D3 energy with Becke-Johnson damping:
where:
\(r_{ij}\): interatomic distance
\(C_6^{ij}, C_8^{ij}\): dispersion coefficients
\(s_6, s_8\): functional-dependent scaling factors
\(f_{\text{damp}}\): Becke-Johnson damping function
\(S_w(r)\): optional smooth switching function
Coordination-Dependent C6#
A key innovation in DFT-D3 [1] is environment-dependent C6 coefficients. The C6 coefficient adapts to the local chemical environment via Gaussian interpolation over reference coordination numbers:
Coordination numbers use a geometric counting function:
Becke-Johnson Damping#
The BJ damping function [2] prevents unphysical short-range behavior while providing improved accuracy for equilibrium geometries:
Force Calculation#
Forces include both direct and coordination-dependent contributions via chain rule:
This decomposition into direct and chain-rule terms is central to the multi-pass kernel architecture described in Implementation Details.
Implementation Details#
Why Multi-Pass Kernels?#
The force calculation requires computing the chain-rule contribution:
The term \(\sum_k \partial E_{ik}/\partial \text{CN}_i\) must be accumulated over all pairs involving atom \(i\) before computing the CN-dependent forces. This creates a data dependency that prevents computing direct and chain-rule forces in a single pass. The multi-pass architecture separates these computations:
Pass 0 (PBC only) : Convert integer unit cell shifts to Cartesian coordinates using the lattice vectors. This is performed once and reused in subsequent passes.
Pass 1 : Compute coordination numbers \(\text{CN}_i\) for all atoms using the geometric counting function. These are needed for C6 interpolation in Pass 2.
Pass 2 : For each atom pair: interpolate C6 coefficients, compute damped dispersion energy, compute direct forces \(\mathbf{F}_{\text{direct}}\), and accumulate \(\partial E/\partial \text{CN}_i\) into a per-atom buffer.
Pass 3 : Using the accumulated \(\partial E/\partial \text{CN}\) values from Pass 2, compute the chain-rule force contribution and add it to the direct forces.
Neighbor Format Dispatch#
The dftd3()
function dispatches to different kernel implementations based on the neighbor
representation format:
Neighbor matrix (
neighbor_matrixargument): Dispatches to_dftd3_nm_op, which launches kernels that iterate over a dense[num_atoms, max_neighbors]array.Neighbor list (
neighbor_list+neighbor_ptrarguments): Dispatches to_dftd3_nl_op, which launches kernels that use CSR (Compressed Sparse Row) format for memory-efficient sparse traversal.
Both paths execute the same four-pass algorithm and produce identical results. The choice affects memory layout and access patterns but not numerical output.
Precision Handling#
The kernels support both float32 and float64 input positions through Warp’s
overload mechanism. At module load time, kernel overloads are registered for
each precision:
float32positions usewp.vec3fvectors andwp.mat33fmatricesfloat64positions usewp.vec3dvectors andwp.mat33dmatrices
Distance vectors are computed at the input precision, but all dispersion
calculations (C6 interpolation, damping, energy/force accumulation) use
float32 for efficiency. Outputs are always float32.
Numerical Stability#
Log-sum-exp trick: The Gaussian-weighted C6 interpolation computes \(\sum_k w_k C_6^{(k)} / \sum_k w_k\) where \(w_k = \exp(\text{arg}_k)\). To prevent overflow, all exponentials are computed relative to the maximum argument: \(\exp(\text{arg}_k - \text{arg}_{\max})\).
Threshold skipping: Contributions with \(\exp(\text{arg}) < 10^{-12}\) (relative to max) are skipped to avoid unnecessary computation.