# Dynamics Benchmarks This page presents benchmark results for molecular dynamics (MD) integrators and geometry optimization methods using the nvalchemiops GPU-accelerated implementations. Results show scaling behavior for both single-system and batched simulations across different system sizes using Lennard-Jones argon systems. ```{warning} These results are intended to be indicative _only_: your actual performance may vary depending on the atomic system topology, software and hardware configuration and we encourage users to benchmark on their own systems of interest. ``` ## How to Read These Charts Time Scaling : Average time per MD/optimization step (ms) vs. system size. Lower is better. For batched runs, this is the time to process all systems in the batch. Throughput : Atom-steps processed per second. Higher is better. For batched systems, this represents the total number of atoms across all systems in the batch multiplied by the number of steps per second. Ensemble : MD ensemble type - NVE (constant energy), NVT (constant temperature), NPT (constant pressure-temperature), or NPH (constant pressure-enthalpy). Batch Size : Number of independent systems processed simultaneously. Batch size of 1 represents single-system mode. ## Molecular Dynamics (MD) GPU-accelerated MD integrators using NVIDIA Warp kernels with optimized neighbor lists. Supports various ensembles including microcanonical (NVE), canonical (NVT), and isobaric-isothermal (NPT). ### Single-System MD Performance for single molecular dynamics systems showing how throughput scales with system size. #### Time Scaling ```{figure} _static/dynamics_md_single_nvalchemiops_scaling_h100.png :width: 90% :align: center :alt: MD single-system time scaling Average step time vs. system size for single-system MD integrators. ``` #### Throughput ```{figure} _static/dynamics_md_single_nvalchemiops_throughput_h100.png :width: 90% :align: center :alt: MD single-system throughput Throughput (atom-steps/s) for single-system MD integrators. ``` ### Batched MD Performance for batched MD simulations showing how throughput scales with both system size and batch size. Batching enables efficient parameter sweeps and ensemble simulations. #### Time Scaling ```{figure} _static/dynamics_md_batch_nvalchemiops_scaling_h100.png :width: 90% :align: center :alt: MD batched scaling Average step time for batched MD simulations showing batch size scaling. ``` #### Throughput ```{figure} _static/dynamics_md_batch_nvalchemiops_throughput_h100.png :width: 90% :align: center :alt: MD batched throughput Total throughput (atom-steps/s) for batched MD systems. ``` ### Available Integrators Velocity Verlet (NVE) : Symplectic integrator that conserves total energy. Excellent stability for constant energy simulations. Standard choice for microcanonical ensemble. Langevin (NVT) : Stochastic dynamics using the BAOAB splitting scheme for accurate temperature control. Maintains canonical ensemble through friction and random forces. Nose-Hoover Chain (NVT) : Deterministic thermostat using extended system variables. Provides rigorous canonical sampling without stochastic forces. NPT Integrator : Isobaric-isothermal ensemble allowing cell fluctuations to maintain constant pressure and temperature. Uses Nose-Hoover chains for temperature control and barostat for pressure control. NPH Integrator : Isobaric-enthalpic ensemble with constant pressure. Similar to NPT but without temperature control. ## Geometry Optimization GPU-accelerated FIRE and FIRE2 (Fast Inertial Relaxation Engine) optimizers for efficient energy minimization. Both adapt timestep and velocity-force mixing for robust convergence on diverse energy landscapes. FIRE2 (Guénolé et al., 2020) introduces a deferred half-step and modified velocity mixing for improved convergence behavior. ### Single-System Optimization Performance for single-system geometry optimization showing convergence speed and computational efficiency. #### Time Scaling ```{figure} _static/dynamics_opt_single_nvalchemiops_scaling_h100.png :width: 90% :align: center :alt: Optimization single-system scaling Average step time vs. system size for FIRE optimizer. ``` #### Throughput ```{figure} _static/dynamics_opt_single_nvalchemiops_throughput_h100.png :width: 90% :align: center :alt: Optimization single-system throughput Throughput (atom-steps/s) during geometry optimization. ``` ### Batched Optimization Performance for batched optimization showing how multiple structures can be relaxed simultaneously for efficient saddle point searches, transition state finding, or structural screening. #### Time Scaling ```{figure} _static/dynamics_opt_batch_nvalchemiops_scaling_h100.png :width: 90% :align: center :alt: Optimization batched scaling Average step time for batched FIRE optimization. ``` #### Throughput ```{figure} _static/dynamics_opt_batch_nvalchemiops_throughput_h100.png :width: 90% :align: center :alt: Optimization batched throughput Total throughput (atom-steps/s) for batched optimization. ``` ### FIRE Algorithm Features **Adaptive Timestep:** - Increases timestep when optimization is progressing smoothly (power P = F · v > 0) - Decreases timestep and resets velocities when moving uphill (P < 0) - Parameters: `dt_max` (10.0 fs), `f_inc` (1.1), `f_dec` (0.5) **Velocity Mixing:** - Mixes velocity with force direction: v → (1-α)v + α|v|F̂ - Decreases mixing parameter α over time for faster convergence - Parameter: `f_alpha` (0.99) **Maximum Displacement:** - Limits atomic displacement per step to prevent instability: `maxstep` (0.2 Å) **Convergence:** - Checks maximum force component: max(|F|) < fmax (default 0.01 eV/Å) ## Hardware Information **GPU**: NVIDIA H100 80GB HBM3 ## Benchmark Configuration ### System Setup | Parameter | Value | | --------- | ----- | | System Type | FCC argon lattice with periodic boundaries | | Lattice Constant | 5.26 Å (argon) | | Temperature | 300 K | | Potential | Lennard-Jones (ε = 0.0104 eV, σ = 3.40 Å) | | Cutoff Distance | 8.5 Å | | Neighbor List | Cell list algorithm with skin distance 1.0 Å | | Rebuild Interval | Every 10 steps (or displacement-based) | ### MD Parameters | Parameter | Value | | --------- | ----- | | Timestep | 1.0 fs (0.001 time units) | | Total Steps | 10,000 | | Warmup Steps | 100 (excluded from timing) | | Langevin Friction | 0.01 fs⁻¹ | | NPT Pressure | 1.0 bar | | NPT Barostat Mass | 75.0 (time units²) | ### Optimization Parameters | Parameter | Value | | --------- | ----- | | Max Steps | 1,000 | | Force Tolerance | 0.01 eV/Å | | Initial Perturbation | Gaussian (σ = 0.15 Å for batched, 0.1 Å for single) | | dt_start | 1.0 fs | | dt_max | 10.0 fs | | maxstep | 0.2 Å | ### System Sizes **Single-System Benchmarks:** - MD: 256, 512, 1024, 2048, 4096 atoms - Optimization: 256, 512, 1024, 2048 atoms **Batched Benchmarks:** - System sizes: 256, 512, 1024 atoms per system - Batch sizes: 1, 2, 4, 8, 16, 32 systems ## Running Your Own Benchmarks To reproduce these benchmarks or test on your own hardware: ### Single-System MD ```bash cd benchmarks/dynamics python benchmark_md_single.py --config benchmark_config.yaml ``` ### Batched MD ```bash python benchmark_md_batch.py --config benchmark_config.yaml ``` ### Single-System Optimization ```bash python benchmark_opt_single.py --config benchmark_config.yaml ``` ### Batched Optimization ```bash python benchmark_opt_batch.py --config benchmark_config.yaml ``` ### FIRE1 vs FIRE2 Comparison Full optimization runs comparing FIRE1 and FIRE2 convergence and wall-clock time on fixed-cell and variable-cell LJ systems: ```bash python benchmark_fire_compare.py --config benchmark_config.yaml --output-dir ./benchmark_results ``` ### FIRE2 Kernel Performance Raw per-step GPU kernel timing using CUDA events, sweeping total atoms and batch sizes across float32 and float64: ```bash python benchmark_fire2.py --config benchmark_config.yaml --output-dir ./benchmark_results ``` ### Configuration File Edit `benchmark_config.yaml` to customize benchmarks: ```yaml # MD single-system md_single: enabled: true system_sizes: [256, 512, 1024, 2048, 4096] integrators: velocity_verlet: steps: 10000 dt: 0.001 # fs warmup_steps: 100 langevin: steps: 10000 dt: 0.001 temperature: 300.0 # K friction: 0.01 # 1/fs # MD batched md_batch: enabled: true system_sizes: [256, 512, 1024] batch_sizes: [1, 2, 4, 8, 16, 32] integrators: velocity_verlet: steps: 10000 dt: 0.001 warmup_steps: 100 # Optimization single-system opt_single: enabled: true system_sizes: [256, 512, 1024, 2048] optimizers: fire: max_steps: 1000 force_tolerance: 0.01 # eV/Å # Optimization batched opt_batch: enabled: true system_sizes: [256, 512] batch_sizes: [1, 2, 4, 8, 16] optimizers: fire: max_steps: 1000 force_tolerance: 0.01 # Potential parameters potential: epsilon: 0.0104 # eV sigma: 3.40 # Å cutoff: 8.5 # Å skin: 1.0 # Å neighbor_rebuild_interval: 10 ``` ### Output Results are saved as CSV files in `docs/benchmarks/benchmark_results/`: - `dynamics_md_single_nvalchemiops_.csv` - `dynamics_md_batch_nvalchemiops_.csv` - `dynamics_opt_single_nvalchemiops_.csv` - `dynamics_opt_batch_nvalchemiops_.csv` - `fire_compare_.csv` - `fire2_kernel_benchmark_.csv` Generate plots with: ```bash cd docs/benchmarks python generate_plots.py ```