Neighbor Lists#

Neighbor lists identify atom pairs within a cutoff distance, the foundation for evaluating interatomic interactions such as machine-learned interatomic potentials and dispersion corrections. ALCHEMI Toolkit-Ops provides GPU-accelerated neighbor list algorithms via NVIDIA Warp with full torch.compile support.

Tip

Start with the unified neighbor_list() function. It automatically selects the best algorithm for your system size and handles both single and batched inputs.

Why Neighbor Lists Matter for Performance#

Neighbor list construction can dominate runtime in atomistic foundation models:

  • Naive algorithms scale as \(O(N^2)\): checking all atom pairs becomes prohibitive beyond roughly 5000 atoms, though the exact crossover depends on structure and hardware

  • Repeated construction: Training loops and MD simulations rebuild neighbor lists frequently—every step or every few steps

  • Memory bandwidth: Large neighbor matrices can bottleneck GPU throughput

ALCHEMI Toolkit-Ops addresses these challenges with \(O(N)\) cell list algorithms, efficient batch processing for heterogeneous datasets, and memory layouts optimized for GPU access patterns. See performance considerations for guidance.

Quick Start#

The neighbor_list() function provides a unified interface that automatically dispatches to the optimal algorithm based on system size and whether batch indices are provided.

Single system with >5000 atoms

from nvalchemiops.neighborlist import neighbor_list

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, method="cell_list"
)

Dispatches to cell_list() — an \(O(N)\) algorithm using spatial decomposition.

Single system with <5000 atoms

from nvalchemiops.neighborlist import neighbor_list

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, method="naive"
)

Dispatches to naive_neighbor_list() — an \(O(N^2)\) algorithm with lower overhead.

Multiple systems with >5000 atoms each

from nvalchemiops.neighborlist import neighbor_list

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cells, pbc=pbc,
    batch_idx=batch_idx, method="batch_cell_list"
)

Dispatches to batch_cell_list() — an \(O(N)\) algorithm for heterogeneous batches.

Multiple systems with <5000 atoms each

from nvalchemiops.neighborlist import neighbor_list

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cells, pbc=pbc,
    batch_idx=batch_idx, method="batch_naive"
)

Dispatches to batch_naive_neighbor_list() — an \(O(N^2)\) algorithm for batched small systems.

Note

When method is not specified, neighbor_list() selects automatically based on system size (cell list methods above 5000 atoms) and on whether batch_idx is provided. The crossover point depends on system density and cutoff radius, so benchmark your workload to find the optimal threshold; a minimal timing sketch follows this note.
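
To locate the crossover for your own workload, a simple timing comparison is enough. The sketch below is illustrative only; it assumes positions, cell, pbc, and cutoff are defined on a CUDA device as in the examples in this guide.

import time
import torch
from nvalchemiops.neighborlist import neighbor_list

def time_method(method, iters=20):
    # Warm-up call so one-time setup is excluded from the measurement
    neighbor_list(positions, cutoff, cell=cell, pbc=pbc, method=method)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        neighbor_list(positions, cutoff, cell=cell, pbc=pbc, method=method)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"naive:     {time_method('naive') * 1e3:.2f} ms")
print(f"cell_list: {time_method('cell_list') * 1e3:.2f} ms")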

Data Formats#

ALCHEMI Toolkit-Ops supports two output formats for neighbor data:

Neighbor Matrix (default) : Fixed-size tensor of shape (num_atoms, max_neighbors) where each row contains the neighbor indices for that atom, padded with a fill value. Returns (neighbor_matrix, num_neighbors, neighbor_matrix_shifts).

Neighbor List (COO format) : Sparse tensor of shape (2, num_pairs) containing [source_atoms, target_atoms]. Returns (neighbor_list, neighbor_ptr, neighbor_list_shifts), where neighbor_ptr is a CSR-style pointer array. The source atoms (first row) are guaranteed to be sorted.

When to Use Each Format#

Neighbor Matrix is preferred when:

  • Using torch.compile (fixed memory layout avoids graph breaks)

  • Systems have dense, uniform neighbor distributions

  • Cache-friendly access patterns are important

Neighbor List (COO) is preferred when:

  • Integrating with graph neural network libraries such as PyG or DGL (see the sketch at the end of this section)

  • Systems are sparse with highly variable neighbors per atom

  • Memory efficiency is critical

Switching Formats#

# Get COO format directly
neighbor_list_coo, neighbor_ptr, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, return_neighbor_list=True
)

# Or convert from matrix format
from nvalchemiops.neighborlist.neighbor_utils import get_neighbor_list_from_neighbor_matrix

neighbor_list_coo, neighbor_ptr, shifts_coo = get_neighbor_list_from_neighbor_matrix(
    neighbor_matrix, num_neighbors, neighbor_matrix_shifts, fill_value=num_atoms
)
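
Once the COO tensors are available, the CSR-style pointer delimits each atom's entries. The slice below is a minimal sketch assuming the convention described under Data Formats, i.e. that the pairs whose source is atom i occupy positions neighbor_ptr[i] through neighbor_ptr[i+1] in the sorted COO list, and that shifts are stored per pair.

# Neighbors of a single atom via the CSR-style pointer (sketch only)
i = 0
start, end = int(neighbor_ptr[i]), int(neighbor_ptr[i + 1])
neighbors_of_atom_i = neighbor_list_coo[1, start:end]
shifts_of_atom_i = shifts[start:end]  # assumes one shift vector per pair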

Warning

Setting return_neighbor_list=True incurs a conversion overhead. If you need both formats, compute the matrix format first and convert as needed.
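
For the graph neural network use case mentioned above, the COO output maps naturally onto an edge index. The snippet below is a sketch assuming PyTorch Geometric is installed and that its (2, num_edges), int64 edge_index convention lines up with the [source_atoms, target_atoms] layout described earlier.

import torch
from torch_geometric.data import Data

# neighbor_list_coo from the conversion above; positions as in the earlier examples
edge_index = neighbor_list_coo.long()  # PyG expects int64 indices
graph = Data(pos=positions, edge_index=edge_index)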

Method Dispatch#

When method=None, neighbor_list() selects an algorithm using the following logic (a minimal sketch of this selection follows the list):

  1. If cutoff2 is provided, a dual cutoff method is selected

  2. Otherwise, if total_atoms >= 5000, "cell_list" is selected

  3. Otherwise, "naive" (\(O(N^2)\) scaling) is selected

  4. If batch_idx or batch_ptr is provided, "batch_" is prepended to the selected method
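
The following is an illustrative sketch of that selection logic, not the library's internal implementation; the hypothetical helper select_method() only mirrors the rules listed above.

def select_method(total_atoms, cutoff2=None, batch_idx=None, batch_ptr=None):
    """Hypothetical helper mirroring the dispatch rules above (sketch only)."""
    if cutoff2 is not None:
        method = "naive_dual_cutoff"
    elif total_atoms >= 5000:
        method = "cell_list"
    else:
        method = "naive"
    if batch_idx is not None or batch_ptr is not None:
        method = "batch_" + method
    return method

print(select_method(total_atoms=12000))                     # -> "cell_list"
print(select_method(total_atoms=300, batch_idx=[0, 0, 1]))  # -> "batch_naive"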

Available Methods#

| Method | Algorithm | Use Case |
|---|---|---|
| "naive" | \(O(N^2)\) pairwise | Small single systems (<5000 atoms) |
| "cell_list" | \(O(N)\) spatial decomposition | Large single systems |
| "batch_naive" | \(O(N^2)\) per system | Batched small systems |
| "batch_cell_list" | \(O(N)\) per system | Batched large systems |
| "naive_dual_cutoff" | \(O(N^2)\) with two cutoffs | Multi-range potentials |
| "batch_naive_dual_cutoff" | Batched dual cutoff | Batched multi-range |

Override automatic selection by passing the method parameter:

# Force cell_list on a small system for testing
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, method="cell_list"
)

Performance Tuning#

Key Parameters#

max_neighbors : Maximum neighbors per atom; determines the width of neighbor_matrix. Auto-estimated if not provided. Pass an accurate value explicitly to neighbor_list() to reduce memory requirements and improve kernel performance (see the example after this parameter list); otherwise estimate_max_neighbors() provides a deliberately conservative estimate based on atomic density.

atomic_density : Atomic density in atoms per unit volume, used by estimate_max_neighbors(). Default is 0.5. Increase for dense systems to avoid truncated neighbor lists.

safety_factor : Multiplier applied to the neighbor estimate. Default is 5.0. Provides headroom for density fluctuations.

max_nbins : Maximum number of spatial cells for cell list decomposition. Default is 1000. Limits memory usage for very large simulation boxes.
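
As an example of using max_neighbors, if profiling shows that no atom in your systems ever exceeds, say, 64 neighbors, you can pass that bound directly. This is a sketch assuming neighbor_list() accepts max_neighbors as a keyword as described above, with positions, cell, and pbc defined as in the earlier examples.

from nvalchemiops.neighborlist import neighbor_list

# A known tight bound keeps the neighbor matrix narrow and the kernels fast
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, max_neighbors=64
)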

Estimation Utilities#

The estimate_max_neighbors() function estimates the maximum number of neighbors \(n\) any atom could have, based on the volume of a cutoff sphere of radius \(r\), the atomic density \(\rho\), and a safety factor \(S\):

\[ n = 2^{\lceil \log_2(S \times \rho \times \frac{4}{3} \pi r^3) \rceil} \]

The result is rounded up to the next power of 2 for memory alignment.
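
For example, with \(r = 5.0\), \(\rho = 0.3\), and \(S = 1.0\) (the values used in the snippet below), \(S \times \rho \times \frac{4}{3}\pi r^3 \approx 157\), which rounds up to \(n = 2^8 = 256\).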

from nvalchemiops.neighborlist import estimate_max_neighbors, estimate_cell_list_sizes

max_neighbors = estimate_max_neighbors(
    cutoff,
    atomic_density=0.3,
    safety_factor=1.0
)

max_total_cells, neighbor_search_radius = estimate_cell_list_sizes(
    cell, pbc, cutoff, max_nbins=1000
)

Setting atomic_density: This should reflect the expected atomic density of your system in atoms per unit volume (using the same length units as cutoff). If set too low, the neighbor matrix may be too narrow and neighbors will be silently truncated. If set too high, memory is wasted on unused columns.

Setting safety_factor: This multiplier provides headroom for local density fluctuations (e.g., atoms clustering in one region). A value of 5.0 is conservative for most systems, but 1.0 is typically sufficient for “reasonable” structures/systems (e.g. in standard public datasets). Reduce it for memory-constrained scenarios where you are confident in uniform density; increase it for systems with significant density variation.

Tip

Check the convergence of the neighbor list computation by comparing the per-atom neighbor counts (num_neighbors) against the estimated maximum. For best performance the two should be close: if the actual counts are much lower than the estimate, the neighbor matrix is mostly padding and wastes memory; if they exceed the estimate, neighborhoods are truncated and there is no guarantee that the nearest neighbors are included. A minimal check is sketched below.
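
The following sketch assumes neighbor_matrix and num_neighbors come from one of the calls shown in this guide.

import torch

# Minimal sanity check: neighbor_matrix has max_neighbors columns,
# num_neighbors holds the per-atom neighbor counts.
max_neighbors = neighbor_matrix.shape[1]
actual_max = int(num_neighbors.max())

if actual_max >= max_neighbors:
    print("Warning: neighbor matrix may be full; neighbors could be truncated")
elif actual_max < max_neighbors // 4:
    print(f"Only {actual_max}/{max_neighbors} columns used; consider a tighter estimate")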

Pre-allocation for Repeated Calculations#

Pre-allocating output tensors avoids repeated memory allocation overhead when computing neighbor lists in a loop (e.g., during MD simulation or training). This also enables torch.compile compatibility by ensuring fixed tensor shapes.

import torch
from nvalchemiops.neighborlist import neighbor_list, estimate_max_neighbors

num_atoms = positions.shape[0]
max_neighbors = estimate_max_neighbors(cutoff, atomic_density=1.0)

# Pre-allocate tensors
neighbor_matrix = torch.full(
    (num_atoms, max_neighbors), num_atoms, dtype=torch.int32, device="cuda"
)
neighbor_matrix_shifts = torch.zeros(
    (num_atoms, max_neighbors, 3), dtype=torch.int32, device="cuda"
)
num_neighbors = torch.zeros(num_atoms, dtype=torch.int32, device="cuda")

# Pass pre-allocated tensors
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc,
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=neighbor_matrix_shifts,
    num_neighbors=num_neighbors,
    fill_value=num_atoms
)

For cell list methods, you can also pre-allocate the spatial data structures:

from nvalchemiops.neighborlist import (
    estimate_cell_list_sizes, allocate_cell_list, neighbor_list
)

max_total_cells, neighbor_search_radius = estimate_cell_list_sizes(cell, pbc, cutoff)

(
    cells_per_dimension, neighbor_search_radius,
    atom_periodic_shifts, atom_to_cell_mapping,
    atoms_per_cell_count, cell_atom_start_indices, cell_atom_list
) = allocate_cell_list(num_atoms, max_total_cells, neighbor_search_radius, device)

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc,
    cells_per_dimension=cells_per_dimension,
    neighbor_search_radius=neighbor_search_radius,
    atom_periodic_shifts=atom_periodic_shifts,
    atom_to_cell_mapping=atom_to_cell_mapping,
    atoms_per_cell_count=atoms_per_cell_count,
    cell_atom_start_indices=cell_atom_start_indices,
    cell_atom_list=cell_atom_list
)

Warning

If max_neighbors is too small, neighbors beyond that limit are silently dropped. Monitor num_neighbors.max() against your max_neighbors setting to detect truncation.

Usage Patterns#

Basic Single System#

import torch
from nvalchemiops.neighborlist import neighbor_list

# Create atomic system
positions = torch.rand(1000, 3, device="cuda") * 10.0
cell = torch.eye(3, device="cuda").unsqueeze(0) * 10.0
pbc = torch.tensor([True, True, True], device="cuda")
cutoff = 5.0

# Compute neighbors (automatic method selection)
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc
)

print(f"Average neighbors: {num_neighbors.float().mean():.1f}")

Batch Processing#

import torch
from nvalchemiops.neighborlist import neighbor_list

# Three systems of different sizes
positions = torch.cat([
    torch.rand(100, 3, device="cuda"),   # System 0
    torch.rand(150, 3, device="cuda"),   # System 1
    torch.rand(80, 3, device="cuda"),    # System 2
])

batch_idx = torch.cat([
    torch.zeros(100, dtype=torch.int32, device="cuda"),
    torch.ones(150, dtype=torch.int32, device="cuda"),
    torch.full((80,), 2, dtype=torch.int32, device="cuda"),
])

cells = torch.stack([
    torch.eye(3, device="cuda") * 10.0,
    torch.eye(3, device="cuda") * 12.0,
    torch.eye(3, device="cuda") * 8.0,
])

pbc = torch.tensor([
    [True, True, True],
    [True, True, False],
    [False, False, False],
], device="cuda")

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff=5.0, cell=cells, pbc=pbc, batch_idx=batch_idx
)

Half-Fill Mode#

Store only half of neighbor pairs to avoid double-counting in symmetric calculations:

# Full: stores both (i,j) and (j,i)
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, half_fill=False
)

# Half: stores only (i,j) where i < j (or with non-zero periodic shift)
neighbor_matrix_half, num_neighbors_half, shifts_half = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, half_fill=True
)

# half_fill=True produces ~50% of the pairs

Build/Query Separation for MD Workflows#

For molecular dynamics, separate building and querying allows caching the spatial data structure:

from nvalchemiops.neighborlist import (
    build_cell_list, query_cell_list,
    allocate_cell_list, estimate_cell_list_sizes, estimate_max_neighbors
)

# Setup (once)
max_total_cells, neighbor_search_radius = estimate_cell_list_sizes(cell, pbc, cutoff)
cell_list_cache = allocate_cell_list(num_atoms, max_total_cells, neighbor_search_radius, device)

max_neighbors = estimate_max_neighbors(cutoff)
neighbor_matrix = torch.full((num_atoms, max_neighbors), -1, dtype=torch.int32, device=device)
neighbor_shifts = torch.zeros((num_atoms, max_neighbors, 3), dtype=torch.int32, device=device)
num_neighbors = torch.zeros(num_atoms, dtype=torch.int32, device=device)

# MD loop
for step in range(num_steps):
    # Build cell list (expensive, done when atoms change cells)
    build_cell_list(positions, cutoff, cell, pbc, *cell_list_cache)

    # Query neighbors (cheaper)
    neighbor_matrix.fill_(-1)
    neighbor_shifts.zero_()
    num_neighbors.zero_()
    query_cell_list(
        positions, cutoff, cell, pbc, *cell_list_cache,
        neighbor_matrix, neighbor_shifts, num_neighbors
    )

    forces = compute_forces(positions, neighbor_matrix, num_neighbors, ...)
    positions = integrate(positions, forces, dt)

Rebuild Detection with Skin Distance#

Avoid rebuilding neighbor lists every step by using a skin distance:

from nvalchemiops.neighborlist import (
    build_cell_list, query_cell_list, cell_list_needs_rebuild,
    allocate_cell_list, estimate_cell_list_sizes
)

cutoff = 5.0
skin_distance = 1.0
effective_cutoff = cutoff + skin_distance

# Build with effective cutoff (includes skin)
max_total_cells, neighbor_search_radius = estimate_cell_list_sizes(
    cell, pbc, effective_cutoff
)
cell_list_cache = allocate_cell_list(num_atoms, max_total_cells, neighbor_search_radius, device)

(
    cells_per_dimension, neighbor_search_radius,
    atom_periodic_shifts, atom_to_cell_mapping,
    atoms_per_cell_count, cell_atom_start_indices, cell_atom_list
) = cell_list_cache

build_cell_list(positions, effective_cutoff, cell, pbc, *cell_list_cache)

for step in range(num_steps):
    positions = integrate(positions, forces, dt)

    # Check if any atom moved to a different cell
    needs_rebuild = cell_list_needs_rebuild(
        positions, atom_to_cell_mapping, cells_per_dimension, cell, pbc
    )

    if needs_rebuild.item():
        build_cell_list(positions, effective_cutoff, cell, pbc, *cell_list_cache)

    # Query with actual cutoff (not effective)
    query_cell_list(positions, cutoff, cell, pbc, *cell_list_cache, ...)

Dual Cutoff#

Compute two neighbor lists with different cutoffs simultaneously:

from nvalchemiops.neighborlist import neighbor_list

cutoff1, cutoff2 = 3.0, 6.0

(
    neighbor_matrix1, num_neighbors1, shifts1,
    neighbor_matrix2, num_neighbors2, shifts2
) = neighbor_list(
    positions, cutoff1, cutoff2=cutoff2, cell=cell, pbc=pbc
)

# neighbor_matrix1: neighbors within cutoff1
# neighbor_matrix2: neighbors within cutoff2 (superset of cutoff1)

This concludes the high-level documentation for neighbor lists. You should now be able to integrate nvalchemiops neighbor list routines into your workflow; consult the API reference for further details.