# Benchmarks

Performance benchmarks for ALCHEMI Toolkit-Ops kernels. The results shown
here are currently static, cached snapshots. We intend to move gradually to
CI-generated results that cover different NVIDIA architectures, benchmark
systems, and so on.

## Available Benchmarks

```{toctree}
:maxdepth: 1

neighborlist
electrostatics
dftd3
```

## About These Benchmarks

Benchmarks are intended to be indicative of `nvalchemiops` performance under
a specific set of conditions; actual performance may differ depending
on a number of factors, including but not limited to structure/system
topology, GPU architecture, and driver and firmware versions.

## Benchmark Methodology

All benchmarks follow these principles:

- **Tensor allocation excluded**: Only _relevant_ kernel execution time
is measured. Tensor allocation is excluded, as are neighbor list
construction and preprocessing when they are not part of the benchmark.
- **Warm-up runs**: Multiple warm-up iterations are run first so that kernel
compilation overhead is removed and noise from cache effects is minimized.
- **Statistical sampling**: Multiple timing runs are performed, and the
median time, maximum memory utilization, and throughput are aggregated for
reporting.
- **Error handling**: Out-of-memory (OOM) results are included in the
reported data rather than omitted.
- **Consistent inputs**: The same cutoff, lattice type, and parameters are
used across runs.
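
The principles above can be illustrated with a minimal timing harness. This is
a sketch only, not the harness used to produce the published results: the
`benchmark` function, its parameters, and the returned dictionary keys are
hypothetical names chosen for illustration. It assumes inputs are allocated
before `fn` is called (so allocation is not timed), that `sync` is whatever
call blocks until asynchronous GPU work has finished, and that
`oom_exceptions` lists the out-of-memory exception types of the framework in
use (the built-in `MemoryError` stands in here). Memory utilization and
throughput are not covered.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 3,
    runs: int = 10,
    sync: Optional[Callable[[], None]] = None,
    oom_exceptions: tuple = (MemoryError,),
) -> dict:
    """Time `fn` and report the median wall-clock time in milliseconds.

    Hypothetical helper for illustration; not part of `nvalchemiops`.
    """
    sync = sync or (lambda: None)
    try:
        # Warm-up: trigger any lazy kernel compilation and populate caches.
        for _ in range(warmup):
            fn()
        sync()

        times_ms = []
        for _ in range(runs):
            sync()  # make sure earlier work has finished before starting
            start = time.perf_counter()
            fn()
            sync()  # wait for asynchronous work before stopping the clock
            times_ms.append((time.perf_counter() - start) * 1e3)
    except oom_exceptions:
        # Record the failure instead of silently dropping the data point.
        return {"status": "oom", "median_ms": None, "runs": 0}

    return {
        "status": "ok",
        "median_ms": statistics.median(times_ms),
        "runs": len(times_ms),
    }
```

The median is used rather than the mean so that a few slow outlier runs do
not skew the reported time. For a GPU kernel, `sync` would typically be the
framework's device synchronization call, since kernel launches usually return
before the work completes.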