Neighbor List Benchmarks#

This page presents benchmark results for various neighbor list algorithms across different GPU hardware. Results are automatically generated from CSV files in the benchmark_results/ directory.

Warning

These results are intended to be indicative only: your actual performance may vary depending on the atomic system topology, software and hardware configuration and we encourage users to benchmark on their own systems of interest.

How to Read These Charts#

Time Scaling : Median execution time (ms) vs. system size. Lower is better. Cell list algorithms show \(O(N)\) scaling while naive algorithms show \(O(N^2)\).

Throughput : Atoms processed per millisecond. Higher is better. This metric helps compare efficiency across different system sizes.

Memory : Peak GPU memory usage (MB) vs. system size. Useful for estimating memory requirements for your target system.

Performance Results#

Select a method to view detailed benchmark data and scaling plots:

Brute-force \(O(N^2)\) algorithm. Best for very small systems where the overhead of cell list construction exceeds the computational savings.

Time Scaling

Naive algorithm time scaling

Median execution time vs. system size. The \(O(N^2)\) scaling becomes apparent for larger systems.#

Throughput

Naive algorithm throughput

Throughput (atoms/ms) vs. system size. Throughput decreases as system size grows due to \(O(N^2)\) scaling.#

Memory Usage

Naive algorithm memory usage

Peak GPU memory consumption vs. system size.#

Spatial hashing \(O(N)\) algorithm. Recommended for medium to large systems where computational efficiency is critical.

Time Scaling

Cell list algorithm time scaling

Median execution time vs. system size. Shows near-linear \(O(N)\) scaling for large systems.#

Throughput

Cell list algorithm throughput

Throughput (atoms/ms) vs. system size. Maintains high throughput even for very large systems.#

Memory Usage

Cell list algorithm memory usage

Peak GPU memory consumption vs. system size.#

Batched brute-force algorithm for processing multiple small systems simultaneously. Useful for ML workflows with many small molecules.

Time Scaling

Batch naive algorithm time scaling

Median execution time vs. total atoms across all batched systems.#

Throughput

Batch naive algorithm throughput

Throughput (atoms/ms) for batched processing. Different lines show different batch sizes.#

Memory Usage

Batch naive algorithm memory usage

Peak GPU memory consumption for batched systems.#

Batched spatial hashing algorithm for processing multiple systems simultaneously with O(N) scaling per system.

Time Scaling

Batch cell list algorithm time scaling

Median execution time vs. total atoms across all batched systems.#

Throughput

Batch cell list algorithm throughput

Throughput (atoms/ms) for batched processing. Different lines show different batch sizes.#

Memory Usage

Batch cell list algorithm memory usage

Peak GPU memory consumption for batched systems.#

Hardware Information#

GPU: NVIDIA H100 80GB HBM3

Benchmark Configuration#

Parameter

Value

Cutoff

5.0 Å

System Type

FCC crystal lattice

Warmup Iterations

3

Timing Iterations

10

Dtype

float32

Interpreting Results#

method : Algorithm name.

total_atoms : Total number of atoms in the system.

atoms_per_system : Atoms per system (relevant for batch methods).

total_neighbors : Total number of neighbor pairs found.

batch_size : Number of systems processed simultaneously (1 for non-batch methods).

median_time_ms : Median execution time in milliseconds (lower is better).

peak_memory_mb : Peak GPU memory usage in megabytes.

Running Your Own Benchmarks#

To generate benchmark results for your hardware:

cd benchmarks/neighborlist
python benchmark_neighborlist.py \
    --config benchmark_config.yaml \
    --output-dir ../../docs/benchmarks/benchmark_results

Results will be saved as CSV files and plots will be automatically generated during the next documentation build.