# Benchmarking
PhysicsNeMo Curator uses a three-tier benchmarking strategy:
| Tool | Purpose | Scope |
|---|---|---|
| pytest-benchmark | Fast per-PR regression checks in CI | Current commit only |
| ASV (airspeed velocity) | Long-term historical performance tracking | Across git history |
| Criterion | Rust micro-benchmarks | Rust core library |
## Quick Start
```bash
# Install dev dependencies (includes asv + pytest-benchmark)
make install

# Build the native extension
make develop

# Run pytest-benchmark (fast, current code)
make bench

# Run ASV on the current commit
make asv-run

# Preview the ASV dashboard
make asv-preview
```
## pytest-benchmark (CI Benchmarks)
pytest-benchmark runs inside the normal test suite and is designed for fast
per-PR checks. Benchmark tests live in `test/` and use the
`@pytest.mark.benchmark` marker.

```python
import pytest


@pytest.mark.benchmark
def test_pipeline_throughput(benchmark):
    """Benchmark pipeline item processing."""
    from curator.core.base import Pipeline

    # ... setup ...
    benchmark(pipeline.__getitem__, 0)
```
Run benchmarks:

```bash
# Run only benchmarks
uv run pytest test/ --benchmark-only

# Skip benchmarks during normal test runs
uv run pytest test/ --benchmark-skip

# Compare against saved baseline
uv run pytest test/ --benchmark-only --benchmark-compare
```

Results are stored as JSON in `.benchmarks/`.
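Those JSON files can also be inspected programmatically. A minimal sketch, assuming the standard pytest-benchmark result schema (a top-level `"benchmarks"` list whose entries carry `"name"` and `"stats"` keys); the function name is ours, not part of the project:

```python
import json
from pathlib import Path


def summarize_benchmarks(path):
    """Map each benchmark name to its mean runtime in seconds.

    Assumes the standard pytest-benchmark JSON layout: a top-level
    "benchmarks" list whose entries have "name" and "stats" keys.
    """
    data = json.loads(Path(path).read_text())
    return {b["name"]: b["stats"]["mean"] for b in data["benchmarks"]}
```

This can be handy for a quick look at outliers without the full `--benchmark-compare` machinery.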
## ASV (Historical Benchmarks)
Airspeed Velocity tracks performance across the project’s git history. It checks out each commit, builds the package in an isolated environment, runs benchmarks, and produces an interactive web dashboard.
### Benchmark Files

ASV benchmarks live in the `benchmarks/` directory at the project root:

```text
benchmarks/
├── __init__.py
├── _helpers.py        # Shared benchmark utilities
├── asv_build.py       # ASV build configuration
├── bench_atm.py       # Atomic data benchmarks
├── bench_backends.py  # Execution backend benchmarks
├── bench_da.py        # DataArray benchmarks
└── bench_mesh.py      # Mesh pipeline benchmarks
```
### Writing ASV Benchmarks

Benchmarks are plain Python classes/functions with magic name prefixes:

| Prefix | Measures |
|---|---|
| `time_` | Wall-clock execution time |
| `mem_` | Memory footprint of returned object |
| `peakmem_` | Peak resident memory |
| `track_` | Arbitrary numeric value |
| `timeraw_` | Execution time in a fresh subprocess |
```python
class TimePipelineIteration:
    """Benchmark per-item pipeline throughput."""

    params = [10, 100, 1000]
    param_names = ["num_items"]

    def setup(self, num_items):
        """Called before each benchmark (excluded from timing)."""
        self.pipeline = build_pipeline(num_items)

    def time_iterate_all(self, num_items):
        """Time iterating through every item."""
        for i in range(len(self.pipeline)):
            self.pipeline[i]
```
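The other prefixes follow the same shape. As an illustration (not taken from the project's benchmark suite), a `track_` benchmark simply returns the number ASV should record; the stub list below stands in for a real pipeline object:

```python
class TrackPipelineSize:
    """Record an arbitrary numeric metric via the track_ prefix."""

    def setup(self):
        # Stand-in for a real pipeline; any sized object works here.
        self.pipeline = list(range(100))

    def track_num_items(self):
        # ASV stores the returned value verbatim on the dashboard.
        return len(self.pipeline)
```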
### Running ASV

```bash
# Benchmark the current commit
make asv-run

# Dry-run (quick smoke test, no results saved)
make asv-quick

# Benchmark a range of commits
uv run asv run v0.1.0..HEAD

# Compare two revisions
make asv-compare REF1=main REF2=HEAD

# Find the commit that introduced a regression
uv run asv find v0.1.0..HEAD TimePipelineIteration.time_iterate_all

# Show results for a commit
uv run asv show HEAD
```
### Live Dashboard
The ASV benchmark dashboard is published automatically to GitHub Pages by the nightly CI workflow. View it at:
The dashboard updates each night with the latest results. You can also trigger
a run manually from the Actions → Benchmark tab using `workflow_dispatch`.
To preview locally after running benchmarks:

```bash
make asv-publish   # Build static HTML from .asv/results
make asv-preview   # Serve at http://localhost:8080
```
### Configuration

ASV is configured in `asv.conf.json` at the project root. Key settings:

- `build_command`: uses `maturin develop --release` to build the Rust extension in each isolated benchmark environment
- `benchmark_dir`: points to `benchmarks/`
- `environment_type`: `virtualenv` (ASV manages its own venvs)

All ASV artifacts (envs, results, HTML) are stored under `.asv/` and are gitignored.
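For orientation, a sketch of what such a configuration might look like. Only the three settings described above are taken from this project; the project name and remaining keys are illustrative placeholders following common ASV conventions, not the actual file contents:

```json
{
    "version": 1,
    "project": "physicsnemo-curator",
    "branches": ["main"],
    "benchmark_dir": "benchmarks",
    "environment_type": "virtualenv",
    "env_dir": ".asv/env",
    "results_dir": ".asv/results",
    "html_dir": ".asv/html",
    "build_command": ["maturin develop --release"]
}
```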
## Criterion (Rust Benchmarks)

Rust micro-benchmarks use Criterion.rs and live in `src/rust/benches/`:

```bash
# Run Rust benchmarks
cargo bench --manifest-path src/rust/Cargo.toml
```

Criterion produces HTML reports in `src/rust/target/criterion/`.
## Make Targets Reference

| Target | Description |
|---|---|
| `make bench` | pytest-benchmark + Criterion (fast, current code) |
| `make asv-run` | ASV benchmark on HEAD (saves results) |
| `make asv-quick` | ASV dry-run (no results saved) |
| `make asv-publish` | Build ASV HTML dashboard |
| `make asv-preview` | Serve ASV dashboard locally |
| `make asv-compare` | Compare two git revisions |