Dynamics Benchmarks#

This page presents benchmark results for molecular dynamics (MD) integrators and geometry optimization methods using the nvalchemiops GPU-accelerated implementations. Results show scaling behavior for both single-system and batched simulations across different system sizes using Lennard-Jones argon systems.

Warning

These results are indicative only. Actual performance depends on the atomic system topology and on the software and hardware configuration, so we encourage users to benchmark their own systems of interest.

How to Read These Charts#

Time Scaling : Average time per MD/optimization step (ms) vs. system size. Lower is better. For batched runs, this is the time to process all systems in the batch.

Throughput : Atom-steps processed per second. Higher is better. For batched systems, this represents the total number of atoms across all systems in the batch multiplied by the number of steps per second.

Ensemble : MD ensemble type - NVE (constant energy), NVT (constant temperature), NPT (constant pressure-temperature), or NPH (constant pressure-enthalpy).

Batch Size : Number of independent systems processed simultaneously. Batch size of 1 represents single-system mode.
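The two chart metrics follow directly from the definitions above. The sketch below shows the arithmetic; the function names and the timing numbers are hypothetical and are not taken from the benchmark scripts.

```python
def step_time_ms(elapsed_s: float, n_steps: int) -> float:
    """Average wall-clock time per step in milliseconds."""
    return 1000.0 * elapsed_s / n_steps


def throughput_atom_steps_per_s(atoms_per_system: int, batch_size: int,
                                n_steps: int, elapsed_s: float) -> float:
    """Atom-steps per second, summed over every system in the batch."""
    total_atoms = atoms_per_system * batch_size
    return total_atoms * n_steps / elapsed_s


# Hypothetical timing: 32 systems of 1024 atoms, 10,000 steps in 20 s.
print(step_time_ms(20.0, 10_000))                            # 2.0
print(throughput_atom_steps_per_s(1024, 32, 10_000, 20.0))   # 16384000.0
```

With a batch size of 1 the throughput reduces to atoms per system times steps per second.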

Molecular Dynamics (MD)#

GPU-accelerated MD integrators built on NVIDIA Warp kernels with optimized neighbor lists. Supported ensembles are microcanonical (NVE), canonical (NVT), isobaric-isothermal (NPT), and isoenthalpic-isobaric (NPH).

Single-System MD#

Performance of single-system MD runs, showing how step time and throughput scale with system size.

Time Scaling#

MD single-system time scaling

Average step time vs. system size for single-system MD integrators.#

Throughput#

MD single-system throughput

Throughput (atom-steps/s) for single-system MD integrators.#

Batched MD#

Performance for batched MD simulations showing how throughput scales with both system size and batch size. Batching enables efficient parameter sweeps and ensemble simulations.

Time Scaling#

MD batched scaling

Average step time for batched MD simulations showing batch size scaling.#

Throughput#

MD batched throughput

Total throughput (atom-steps/s) for batched MD systems.#

Available Integrators#

Velocity Verlet (NVE) : Symplectic, time-reversible integrator. At a stable timestep it keeps the total energy error bounded over long runs rather than conserving the energy exactly. Standard choice for the microcanonical ensemble.

Langevin (NVT) : Stochastic dynamics using the BAOAB splitting scheme for accurate temperature control. Maintains canonical ensemble through friction and random forces.

Nose-Hoover Chain (NVT) : Deterministic thermostat using extended system variables. Provides rigorous canonical sampling without stochastic forces.

NPT Integrator : Isobaric-isothermal ensemble that lets the cell fluctuate to maintain constant pressure and temperature. Uses Nose-Hoover chains for temperature control and a barostat for pressure control.

NPH Integrator : Isobaric-enthalpic ensemble with constant pressure. Similar to NPT but without temperature control.
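To make the two simplest schemes concrete, here is a minimal one-dimensional sketch of a velocity Verlet step and a BAOAB Langevin step in plain Python. It is illustrative only and is not the nvalchemiops kernel code; the function names and the assumption of consistent units (friction `gamma` in inverse time, `kT` in energy) are ours.

```python
import math
import random


def velocity_verlet_step(x, v, force, mass, dt):
    """One velocity Verlet (NVE) step: half kick, drift, half kick."""
    v_half = v + 0.5 * dt * force(x) / mass
    x_new = x + dt * v_half
    v_new = v_half + 0.5 * dt * force(x_new) / mass
    return x_new, v_new


def baoab_step(x, v, force, mass, dt, gamma, kT, rng=random):
    """One BAOAB Langevin (NVT) step: kick, drift, thermostat, drift, kick."""
    v = v + 0.5 * dt * force(x) / mass            # B: half kick
    x = x + 0.5 * dt * v                          # A: half drift
    c1 = math.exp(-gamma * dt)                    # O: exact Ornstein-Uhlenbeck update
    c2 = math.sqrt((1.0 - c1 * c1) * kT / mass)
    v = c1 * v + c2 * rng.gauss(0.0, 1.0)
    x = x + 0.5 * dt * v                          # A: half drift
    v = v + 0.5 * dt * force(x) / mass            # B: half kick
    return x, v


def spring_force(x):
    """Harmonic oscillator with unit spring constant."""
    return -x


x, v = 1.0, 0.0
for _ in range(1000):
    x, v = velocity_verlet_step(x, v, spring_force, mass=1.0, dt=0.01)
energy = 0.5 * v * v + 0.5 * x * x
print(f"energy after 1000 steps: {energy:.6f} (started at 0.500000)")
```

With `gamma = 0` the thermostat step leaves the velocity unchanged, so BAOAB reduces to velocity Verlet. This is a convenient sanity check when validating an implementation.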

Geometry Optimization#

GPU-accelerated FIRE and FIRE2 (Fast Inertial Relaxation Engine) optimizers for efficient energy minimization. Both adapt the timestep and the velocity-force mixing for robust convergence on diverse energy landscapes. FIRE2 (Guénolé et al., 2020) introduces a deferred half-step and modified velocity mixing for improved convergence behavior.

Single-System Optimization#

Performance of single-system geometry optimization, showing per-step cost and throughput as a function of system size.

Time Scaling#

Optimization single-system scaling

Average step time vs. system size for the FIRE optimizer.#

Throughput#

Optimization single-system throughput

Throughput (atom-steps/s) during geometry optimization.#

Batched Optimization#

Performance for batched optimization showing how multiple structures can be relaxed simultaneously for efficient saddle point searches, transition state finding, or structural screening.

Time Scaling#

Optimization batched scaling

Average step time for batched FIRE optimization.#

Throughput#

Optimization batched throughput

Total throughput (atom-steps/s) for batched optimization.#

FIRE Algorithm Features#

Adaptive Timestep:

  • Increases timestep when optimization is progressing smoothly (power P = F · v > 0)

  • Decreases timestep and resets velocities when moving uphill (P < 0)

  • Parameters: dt_max (10.0 fs), f_inc (1.1), f_dec (0.5)

Velocity Mixing:

  • Mixes velocity with force direction: v → (1-α)v + α|v|F̂

  • Decreases mixing parameter α over time for faster convergence

  • Parameter: f_alpha (0.99)

Maximum Displacement:

  • Limits atomic displacement per step to prevent instability: maxstep (0.2 Å)

Convergence:

  • Checks maximum force component: max(|F|) < fmax (default 0.01 eV/Å)
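The features above combine into a short loop. The following is a minimal sketch of the classic FIRE scheme on a flat coordinate list with unit masses. It is not the nvalchemiops implementation. `alpha_start` and `n_min` are the usual values from the FIRE literature and are assumptions here, and `dt_start` and `dt_max` are scaled for the dimensionless toy problem rather than taken from the femtosecond values listed above. The step cap is applied to the whole displacement vector, which is a simplification of a per-atom limit.

```python
import math


def fire_minimize(x, force_fn, dt_start=0.1, dt_max=0.3, f_inc=1.1, f_dec=0.5,
                  alpha_start=0.1, f_alpha=0.99, n_min=5, maxstep=0.2,
                  fmax=0.01, max_steps=1000):
    """Minimize with FIRE; returns (coordinates, number of steps taken)."""
    x = list(x)
    v = [0.0] * len(x)
    dt, alpha, n_pos = dt_start, alpha_start, 0
    for step in range(max_steps):
        f = force_fn(x)
        if max(abs(fi) for fi in f) < fmax:       # convergence: max(|F|) < fmax
            return x, step
        power = sum(fi * vi for fi, vi in zip(f, v))
        if power > 0.0:
            # Downhill: steer v toward the force direction, then speed up.
            f_norm = math.sqrt(sum(fi * fi for fi in f))
            v_norm = math.sqrt(sum(vi * vi for vi in v))
            v = [(1.0 - alpha) * vi + alpha * v_norm * fi / f_norm
                 for fi, vi in zip(f, v)]
            n_pos += 1
            if n_pos > n_min:
                dt = min(dt * f_inc, dt_max)
                alpha *= f_alpha
        elif step > 0:
            # Uphill: stop, shrink the timestep, and restart the mixing.
            v = [0.0] * len(x)
            dt *= f_dec
            alpha, n_pos = alpha_start, 0
        v = [vi + dt * fi for fi, vi in zip(f, v)]           # unit mass
        dx = [dt * vi for vi in v]
        dx_norm = math.sqrt(sum(d * d for d in dx))
        if dx_norm > maxstep:                                # cap the step length
            dx = [d * maxstep / dx_norm for d in dx]
        x = [xi + d for xi, d in zip(x, dx)]
    return x, max_steps


def quadratic_force(x):
    """Force for E = 0.5 * (x0**2 + 10 * x1**2)."""
    return [-x[0], -10.0 * x[1]]


x_min, n_steps = fire_minimize([1.0, 1.0], quadratic_force)
print(n_steps, x_min)
```

The toy energy surface has one soft and one stiff direction, which is enough to exercise both branches: the timestep grows while the power stays positive and is halved after each overshoot.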

Hardware Information#

GPU: NVIDIA H100 80GB HBM3

Benchmark Configuration#

System Setup#

| Parameter | Value |
| --- | --- |
| System Type | FCC argon lattice with periodic boundaries |
| Lattice Constant | 5.26 Å (argon) |
| Temperature | 300 K |
| Potential | Lennard-Jones (ε = 0.0104 eV, σ = 3.40 Å) |
| Cutoff Distance | 8.5 Å |
| Neighbor List | Cell list algorithm with skin distance 1.0 Å |
| Rebuild Interval | Every 10 steps (or displacement-based) |
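For reference, the pair potential and the neighbor-list criterion implied by these settings can be sketched as follows. The plain truncation at the cutoff (no energy shift) and the half-skin rebuild rule are assumptions, since the page does not say how the benchmark handles either. The function names are illustrative.

```python
import math

EPSILON = 0.0104  # eV
SIGMA = 3.40      # Å
CUTOFF = 8.5      # Å
SKIN = 1.0        # Å


def lj_energy(r, epsilon=EPSILON, sigma=SIGMA, cutoff=CUTOFF):
    """Pair energy in eV; zero beyond the cutoff (plain truncation assumed)."""
    if r >= cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=EPSILON, sigma=SIGMA, cutoff=CUTOFF):
    """Radial force -dU/dr in eV/Å; positive values are repulsive."""
    if r >= cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def needs_rebuild(max_displacement, skin=SKIN):
    """Common displacement rule: rebuild once any atom has moved more than skin / 2."""
    return max_displacement > 0.5 * skin


r_min = 2.0 ** (1.0 / 6.0) * SIGMA   # location of the potential minimum
print(f"r_min = {r_min:.3f} Å, U(r_min) = {lj_energy(r_min):.4f} eV")
```

The 8.5 Å cutoff corresponds to 2.5σ, a conventional choice for Lennard-Jones systems.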

MD Parameters#

| Parameter | Value |
| --- | --- |
| Timestep | 1.0 fs (0.001 time units) |
| Total Steps | 10,000 |
| Warmup Steps | 100 (excluded from timing) |
| Langevin Friction | 0.01 fs⁻¹ |
| NPT Pressure | 1.0 bar |
| NPT Barostat Mass | 75.0 (time units²) |
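Excluding warmup steps from the timing can be sketched as below. This is a generic pattern and not the code of the benchmark scripts. `step_fn` and `sync_fn` are hypothetical callables, and whether the 10,000 total steps include the warmup is not stated on this page, so the sketch runs the warmup in addition to the timed steps. GPU kernels launch asynchronously, so a device synchronization call (for example `wp.synchronize` in Warp) should be passed as `sync_fn` for meaningful numbers.

```python
import time


def time_steps(step_fn, n_steps, warmup_steps=100, sync_fn=None):
    """Return the average time per step in ms, excluding warmup steps."""
    for _ in range(warmup_steps):
        step_fn()                      # untimed: compilation, caches, allocations
    if sync_fn is not None:
        sync_fn()                      # drain queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    if sync_fn is not None:
        sync_fn()                      # wait for the last timed step to finish
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / n_steps


# Stand-in workload so the sketch runs without a GPU.
avg_ms = time_steps(lambda: sum(range(1000)), n_steps=200, warmup_steps=20)
print(f"{avg_ms:.4f} ms/step")
```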

Optimization Parameters#

| Parameter | Value |
| --- | --- |
| Max Steps | 1,000 |
| Force Tolerance | 0.01 eV/Å |
| Initial Perturbation | Gaussian (σ = 0.15 Å for batched, 0.1 Å for single) |
| dt_start | 1.0 fs |
| dt_max | 10.0 fs |
| maxstep | 0.2 Å |
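The initial perturbation displaces each atom from its lattice site so that the optimizer has something to relax. Below is a minimal sketch, assuming independent Gaussian noise on every Cartesian coordinate. The function name and the seeding scheme are illustrative, and the benchmark scripts may draw the noise differently.

```python
import random


def perturb(positions, sigma, seed=0):
    """Add independent Gaussian noise (std sigma, in Å) to every coordinate."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0.0, sigma) for c in atom] for atom in positions]


lattice_sites = [[0.0, 0.0, 0.0], [2.63, 2.63, 0.0]]
print(perturb(lattice_sites, sigma=0.1))    # single-system setting
print(perturb(lattice_sites, sigma=0.15))   # batched setting
```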

System Sizes#

Single-System Benchmarks:

  • MD: 256, 512, 1024, 2048, 4096 atoms

  • Optimization: 256, 512, 1024, 2048 atoms

Batched Benchmarks:

  • System sizes: 256, 512, 1024 atoms per system

  • Batch sizes: 1, 2, 4, 8, 16, 32 systems
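An FCC conventional cell holds 4 atoms, so a supercell with nx × ny × nz repeats contains 4·nx·ny·nz atoms. The sketch below builds such a supercell with the 5.26 Å lattice constant. The repeat counts are an assumption, since this page does not state how the benchmark scripts construct each size. For example, 4 × 4 × 4 gives 256 atoms, while 512 atoms needs a non-cubic repeat such as 8 × 4 × 4.

```python
import itertools
import math

LATTICE_CONSTANT = 5.26  # Å, argon
FCC_BASIS = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]


def fcc_supercell(nx, ny, nz, a=LATTICE_CONSTANT):
    """Return (positions, cell_lengths) in Å for an orthorhombic FCC supercell."""
    positions = [((i + bx) * a, (j + by) * a, (k + bz) * a)
                 for i, j, k in itertools.product(range(nx), range(ny), range(nz))
                 for bx, by, bz in FCC_BASIS]
    return positions, (nx * a, ny * a, nz * a)


positions, cell = fcc_supercell(4, 4, 4)
nearest = min(math.dist(positions[0], p) for p in positions[1:])
print(len(positions), cell, round(nearest, 3))
```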

Running Your Own Benchmarks#

To reproduce these benchmarks or test on your own hardware:

Single-System MD#

cd benchmarks/dynamics
python benchmark_md_single.py --config benchmark_config.yaml

Batched MD#

python benchmark_md_batch.py --config benchmark_config.yaml

Single-System Optimization#

python benchmark_opt_single.py --config benchmark_config.yaml

Batched Optimization#

python benchmark_opt_batch.py --config benchmark_config.yaml

FIRE1 vs FIRE2 Comparison#

Full optimization runs comparing FIRE1 and FIRE2 convergence and wall-clock time on fixed-cell and variable-cell LJ systems:

python benchmark_fire_compare.py --config benchmark_config.yaml --output-dir ./benchmark_results

FIRE2 Kernel Performance#

Raw per-step GPU kernel timing using CUDA events, sweeping total atoms and batch sizes across float32 and float64:

python benchmark_fire2.py --config benchmark_config.yaml --output-dir ./benchmark_results

Configuration File#

Edit benchmark_config.yaml to customize benchmarks:

# MD single-system
md_single:
  enabled: true
  system_sizes: [256, 512, 1024, 2048, 4096]
  integrators:
    velocity_verlet:
      steps: 10000
      dt: 0.001  # time units (equals 1.0 fs)
      warmup_steps: 100
    langevin:
      steps: 10000
      dt: 0.001
      temperature: 300.0  # K
      friction: 0.01  # 1/fs

# MD batched
md_batch:
  enabled: true
  system_sizes: [256, 512, 1024]
  batch_sizes: [1, 2, 4, 8, 16, 32]
  integrators:
    velocity_verlet:
      steps: 10000
      dt: 0.001
      warmup_steps: 100

# Optimization single-system
opt_single:
  enabled: true
  system_sizes: [256, 512, 1024, 2048]
  optimizers:
    fire:
      max_steps: 1000
      force_tolerance: 0.01  # eV/Å

# Optimization batched
opt_batch:
  enabled: true
  system_sizes: [256, 512]
  batch_sizes: [1, 2, 4, 8, 16]
  optimizers:
    fire:
      max_steps: 1000
      force_tolerance: 0.01

# Potential parameters
potential:
  epsilon: 0.0104  # eV
  sigma: 3.40  # Å
  cutoff: 8.5  # Å
  skin: 1.0  # Å
  neighbor_rebuild_interval: 10
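A batched section of this file defines a grid of runs, one per combination of system size and batch size. The sketch below enumerates that grid from an already parsed configuration. The dictionary mirrors the md_batch section above and is assumed to come from a YAML loader such as PyYAML's `yaml.safe_load`. The `sweep` helper is hypothetical and is not part of the benchmark scripts.

```python
import itertools

# Assumed to be the parsed contents of benchmark_config.yaml;
# only the fields of md_batch needed for the sweep are reproduced here.
config = {
    "md_batch": {
        "enabled": True,
        "system_sizes": [256, 512, 1024],
        "batch_sizes": [1, 2, 4, 8, 16, 32],
    }
}


def sweep(section):
    """Yield (system_size, batch_size, total_atoms) for an enabled section."""
    if not section.get("enabled", False):
        return
    batch_sizes = section.get("batch_sizes", [1])   # single-system sections have none
    for n_atoms, batch in itertools.product(section["system_sizes"], batch_sizes):
        yield n_atoms, batch, n_atoms * batch


cases = list(sweep(config["md_batch"]))
print(len(cases), max(total for _, _, total in cases))   # 18 32768
```

The largest case in this grid holds 32,768 atoms on the device at once, which is useful to know when estimating memory before widening the sweep.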

Output#

Results are saved as CSV files in docs/benchmarks/benchmark_results/:

  • dynamics_md_single_nvalchemiops_<gpu_sku>.csv

  • dynamics_md_batch_nvalchemiops_<gpu_sku>.csv

  • dynamics_opt_single_nvalchemiops_<gpu_sku>.csv

  • dynamics_opt_batch_nvalchemiops_<gpu_sku>.csv

  • fire_compare_<gpu_sku>.csv

  • fire2_kernel_benchmark_<gpu_sku>.csv

Generate plots with:

cd docs/benchmarks
python generate_plots.py
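Before plotting, it can help to confirm which result files exist and what they contain. The sketch below lists every CSV in a results directory with its column names and row count. It makes no assumption about the column names the benchmark scripts write, and `summarize_results` is a hypothetical helper that is not part of the repository.

```python
import csv
from pathlib import Path


def summarize_results(results_dir):
    """Map each CSV file name to (column names, number of data rows)."""
    root = Path(results_dir)
    if not root.is_dir():
        return {}
    summary = {}
    for path in sorted(root.glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])          # first line holds the column names
            n_rows = sum(1 for _ in reader)    # remaining lines are data rows
        summary[path.name] = (header, n_rows)
    return summary


for name, (header, n_rows) in summarize_results("benchmark_results").items():
    print(f"{name}: {n_rows} rows, columns = {header}")
```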