# Neighbor List Benchmarks This page presents benchmark results for various neighbor list algorithms across different GPU hardware. Results are automatically generated from CSV files in the `benchmark_results/` directory. ```{warning} These results are intended to be indicative _only_: your actual performance may vary depending on the atomic system topology, software and hardware configuration and we encourage users to benchmark on their own systems of interest. ``` ## How to Read These Charts Time Scaling : Median execution time (ms) vs. system size. Lower is better. Cell list algorithms show $O(N)$ scaling while naive algorithms show $O(N^2)$. Throughput : Atoms processed per millisecond. Higher is better. This metric helps compare efficiency across different system sizes. Memory : Peak GPU memory usage (MB) vs. system size. Useful for estimating memory requirements for your target system. ## Performance Results Select a method to view detailed benchmark data and scaling plots: ::::{tab-set} :::{tab-item} Naive Brute-force $O(N^2)$ algorithm. Best for very small systems where the overhead of cell list construction exceeds the computational savings. ### Time Scaling ```{figure} _static/neighborlist_scaling_naive_h100-80gb-hbm3.png :width: 80% :align: center :alt: Naive algorithm time scaling Median execution time vs. system size. The $O(N^2)$ scaling becomes apparent for larger systems. ``` ### Throughput ```{figure} _static/neighborlist_throughput_naive_h100-80gb-hbm3.png :width: 80% :align: center :alt: Naive algorithm throughput Throughput (atoms/ms) vs. system size. Throughput decreases as system size grows due to $O(N^2)$ scaling. ``` ### Memory Usage ```{figure} _static/neighborlist_memory_naive_h100-80gb-hbm3.png :width: 80% :align: center :alt: Naive algorithm memory usage Peak GPU memory consumption vs. system size. ``` ::: :::{tab-item} Cell List Spatial hashing $O(N)$ algorithm. Recommended for medium to large systems where computational efficiency is critical. ### Time Scaling ```{figure} _static/neighborlist_scaling_cell-list_h100-80gb-hbm3.png :width: 80% :align: center :alt: Cell list algorithm time scaling Median execution time vs. system size. Shows near-linear $O(N)$ scaling for large systems. ``` ### Throughput ```{figure} _static/neighborlist_throughput_cell-list_h100-80gb-hbm3.png :width: 80% :align: center :alt: Cell list algorithm throughput Throughput (atoms/ms) vs. system size. Maintains high throughput even for very large systems. ``` ### Memory Usage ```{figure} _static/neighborlist_memory_cell-list_h100-80gb-hbm3.png :width: 80% :align: center :alt: Cell list algorithm memory usage Peak GPU memory consumption vs. system size. ``` ::: :::{tab-item} Batch Naive Batched brute-force algorithm for processing multiple small systems simultaneously. Useful for ML workflows with many small molecules. ### Time Scaling ```{figure} _static/neighborlist_scaling_batch-naive_h100-80gb-hbm3.png :width: 80% :align: center :alt: Batch naive algorithm time scaling Median execution time vs. total atoms across all batched systems. ``` ### Throughput ```{figure} _static/neighborlist_throughput_batch-naive_h100-80gb-hbm3.png :width: 80% :align: center :alt: Batch naive algorithm throughput Throughput (atoms/ms) for batched processing. Different lines show different batch sizes. ``` ### Memory Usage ```{figure} _static/neighborlist_memory_batch-naive_h100-80gb-hbm3.png :width: 80% :align: center :alt: Batch naive algorithm memory usage Peak GPU memory consumption for batched systems. ``` ::: :::{tab-item} Batch Cell List Batched spatial hashing algorithm for processing multiple systems simultaneously with O(N) scaling per system. ### Time Scaling ```{figure} _static/neighborlist_scaling_batch-cell-list_h100-80gb-hbm3.png :width: 80% :align: center :alt: Batch cell list algorithm time scaling Median execution time vs. total atoms across all batched systems. ``` ### Throughput ```{figure} _static/neighborlist_throughput_batch-cell-list_h100-80gb-hbm3.png :width: 80% :align: center :alt: Batch cell list algorithm throughput Throughput (atoms/ms) for batched processing. Different lines show different batch sizes. ``` ### Memory Usage ```{figure} _static/neighborlist_memory_batch-cell-list_h100-80gb-hbm3.png :width: 80% :align: center :alt: Batch cell list algorithm memory usage Peak GPU memory consumption for batched systems. ``` ::: :::: ## Hardware Information **GPU**: NVIDIA H100 80GB HBM3 ## Benchmark Configuration | Parameter | Value | |-----------|-------| | Cutoff | 5.0 Å | | System Type | FCC crystal lattice | | Warmup Iterations | 3 | | Timing Iterations | 10 | | Dtype | `float32` | ## Interpreting Results `method` : Algorithm name. `total_atoms` : Total number of atoms in the system. `atoms_per_system` : Atoms per system (relevant for batch methods). `total_neighbors` : Total number of neighbor pairs found. `batch_size` : Number of systems processed simultaneously (1 for non-batch methods). `median_time_ms` : Median execution time in milliseconds (lower is better). `peak_memory_mb` : Peak GPU memory usage in megabytes. ## Running Your Own Benchmarks To generate benchmark results for your hardware: ```bash cd benchmarks/neighborlist python benchmark_neighborlist.py \ --config benchmark_config.yaml \ --output-dir ../../docs/benchmarks/benchmark_results ``` Results will be saved as CSV files and plots will be automatically generated during the next documentation build.