<!-- markdownlint-disable MD013 -->

<!-- markdownlint-disable MD013 -->

(neighborlist_userguide)=

# Neighbor Lists

Neighbor lists identify atom pairs within a cutoff distance---the foundation for
all forms of interatomic interactions including but not limited to: machine-learned
interatomic potentials, dispersion corrections, and so on. ALCHEMI Toolkit-Ops provides
GPU-accelerated neighbor list algorithms via [NVIDIA Warp](https://nvidia.github.io/warp/)
with full `torch.compile` support.

```{tip}
Start with the unified {func}`~nvalchemiops.neighborlist.neighbor_list` function.
It automatically selects the best algorithm for your system size and handles
both single and batched inputs.
```

## Why Neighbor Lists Matter for Performance

Neighbor list construction can dominate runtime in atomistic foundation models:

- **Naive algorithms scale as \(O(N^2)\)**: Checking all atom pairs becomes
  prohibitive for systems with a large number of atoms (approx. 5000 atoms,
  but depends on structure and hardware)
- **Repeated construction**: Training loops and MD simulations rebuild neighbor
  lists frequently---every step or every few steps
- **Memory bandwidth**: Large neighbor matrices can bottleneck GPU throughput

ALCHEMI Toolkit-Ops addresses these challenges with O(N) cell list algorithms,
efficient batch processing for heterogeneous datasets, and memory layouts
optimized for GPU access patterns. See [performance considerations](nl_performance)
for guidance.

## Quick Start

The {func}`~nvalchemiops.neighborlist.neighbor_list` function provides a unified
interface that automatically dispatches to the optimal algorithm based on system
size and whether batch indices are provided.

::::{tab-set}

:::{tab-item} Single + Large
:sync: single-large

Single system with >5000 atoms

```python
from nvalchemiops.neighborlist import neighbor_list

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, method="cell_list"
)
```

Dispatches to {func}`~nvalchemiops.neighborlist.cell_list` --- \(O(N)\) algorithm
using spatial decomposition.
:::

:::{tab-item} Single + Small
:sync: single-small

Single system with <5000 atoms

```python
from nvalchemiops.neighborlist import neighbor_list

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, method="naive"
)
```

Dispatches to {func}`~nvalchemiops.neighborlist.naive_neighbor_list` --- \(O(N^2)\)
algorithm with lower overhead.
:::

:::{tab-item} Batch + Large
:sync: batch-large

Multiple systems with >5000 atoms each

```python
from nvalchemiops.neighborlist import neighbor_list

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cells, pbc=pbc,
    batch_idx=batch_idx, method="batch_cell_list"
)
```

Dispatches to {func}`~nvalchemiops.neighborlist.batch_cell_list` --- \(O(N)\)
algorithm for heterogeneous batches.
:::

:::{tab-item} Batch + Small
:sync: batch-small

Multiple systems with <5000 atoms each

```python
from nvalchemiops.neighborlist import neighbor_list

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cells, pbc=pbc,
    batch_idx=batch_idx, method="batch_naive"
)
```

Dispatches to {func}`~nvalchemiops.neighborlist.batch_naive_neighbor_list` ---
\(O(N^2)\) algorithm for batched small systems.
:::

::::

```{note}
When `method` is not specified, `neighbor_list` automatically selects based on
system size (greater than 5000 atoms) and whether `batch_idx` is provided.
The crossover point depends on system density and cutoff radius---benchmark
your workload to find the optimal threshold.
```

## Data Formats

ALCHEMI Toolkit-Ops supports two output formats for neighbor data:

Neighbor Matrix (default)
: Fixed-size tensor of shape `(num_atoms, max_neighbors)` where each row
  contains the neighbor indices for that atom, padded with a fill value.
  Returns `(neighbor_matrix, num_neighbors, neighbor_matrix_shifts)`.

Neighbor List (COO format)
: Sparse tensor of shape `(2, num_pairs)` containing `[source_atoms, target_atoms]`.
  Returns `(neighbor_list, neighbor_ptr, neighbor_list_shifts)` where
  `neighbor_ptr` is a CSR-style pointer array. The first set of atoms (nominally
  `source_atoms`) is guaranteed to be sorted.

### When to Use Each Format

**Neighbor Matrix** is preferred when:

- Using `torch.compile` (fixed memory layout avoids graph breaks)
- Systems have dense, uniform neighbor distributions
- Cache-friendly access patterns are important

**Neighbor List (COO)** is preferred when:

- Integrating with graph neural network libraries (PyG, DGL)
- Systems are sparse with highly variable neighbors per atom
- Memory efficiency is critical

### Switching Formats

```python
# Get COO format directly
neighbor_list_coo, neighbor_ptr, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, return_neighbor_list=True
)

# Or convert from matrix format
from nvalchemiops.neighborlist.neighbor_utils import get_neighbor_list_from_neighbor_matrix

neighbor_list_coo, neighbor_ptr, shifts_coo = get_neighbor_list_from_neighbor_matrix(
    neighbor_matrix, num_neighbors, neighbor_matrix_shifts, fill_value=num_atoms
)
```

```{warning}
Setting `return_neighbor_list=True` incurs a conversion overhead. If you need
both formats, compute the matrix format first and convert as needed.
```

## Method Dispatch

When `method=None`, {func}`~nvalchemiops.neighborlist.neighbor_list` selects
an algorithm using the following logic:

1. If `cutoff2` is provided, then dual cutoff method
2. If `total_atoms >= 5000`, then `"cell_list"`
3. Otherwise, `"naive"` ($N^2$ scaling algorithm)
4. If `batch_idx` or `batch_ptr` is provided, then prepend `"batch_"` to the method

### Available Methods

| Method | Algorithm | Use Case |
|--------|-----------|----------|
| `"naive"` | \(O(N^2)\) pairwise | Small single systems (<5000 atoms) |
| `"cell_list"` | \(O(N)\) spatial decomposition | Large single systems |
| `"batch_naive"` | \(O(N^2)\) per system | Batched small systems |
| `"batch_cell_list"` | \(O(N)\) per system | Batched large systems |
| `"naive_dual_cutoff"` | \(O(N^2)\) with two cutoffs | Multi-range potentials |
| `"batch_naive_dual_cutoff"` | Batched dual cutoff | Batched multi-range |

Override automatic selection by passing the `method` parameter:

```python
# Force cell_list on a small system for testing
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, method="cell_list"
)
```

## Performance Tuning

(nl_performance)=

### Key Parameters

`max_neighbors`
: Maximum neighbors per atom; determines the width of `neighbor_matrix`.
  Auto-estimated if not provided. Pass this value explicitly to `neighbor_list`
  calls if you have an accurate value to reduce memory requirements as well
  as improve kernel performance. The `estimate_max_neighbors()` method will
  otherwise provide a **very** conservative estimate based on atomic
  density.

`atomic_density`
: Atomic density in atoms per unit volume, used by `estimate_max_neighbors()`.
  Default is 0.5. Increase for dense systems to avoid truncated neighbor lists.

`safety_factor`
: Multiplier applied to the neighbor estimate. Default is 5.0. Provides
  headroom for density fluctuations.

`max_nbins`
: Maximum number of spatial cells for cell list decomposition. Default is 1000.
  Limits memory usage for very large simulation boxes.

### Estimation Utilities

The {func}`~nvalchemiops.neighborlist.estimate_max_neighbors` function estimates
the maximum number of neighbors $n$ any atom could have based on the cutoff sphere
volume ($r$) and atomic density $\rho$, with an additional safety factor ($S$):

$$
n = 2^{\lceil \log_2(S \times \rho \times \frac{4}{3} \pi r^3) \rceil}
$$

The result is rounded up to the next power of 2 for memory alignment.

```python
from nvalchemiops.neighborlist import estimate_max_neighbors, estimate_cell_list_sizes

max_neighbors = estimate_max_neighbors(
    cutoff,
    atomic_density=0.3,
    safety_factor=1.0
)

max_total_cells, neighbor_search_radius = estimate_cell_list_sizes(
    cell, pbc, cutoff, max_nbins=1000
)
```

**Setting `atomic_density`**: This should reflect the expected atomic density of
your system in atoms per unit volume (using the same length units as `cutoff`).
If set too low, the neighbor matrix may be too narrow and neighbors will be
silently truncated. If set too high, memory is wasted on unused columns.

**Setting `safety_factor`**: This multiplier provides headroom for local density
fluctuations (e.g., atoms clustering in one region). A value of 5.0 is
conservative for most systems, but 1.0 is typically sufficient for "reasonable"
structures/systems (e.g. in standard public datasets). Reduce it for
memory-constrained scenarios where you are confident in uniform density;
increase it for systems with significant density variation.

```{tip}
Users should check the "convergence" of the neighbor list computation by checking
the respective tensor containing the number of neighbors per atom, against
the maximum estimated number of neighbors. For optimal performance these
two factors should be close: if the actual number of neighbors per atom is
low relative to the estimated number, the allocated neighbor matrix will
be very sparse and memory inefficient (i.e. most elements will be padding).
If the actual number exceeds the estimate, neighborhoods will be truncated
and there is no guarantee that the nearest neighbors are included.
```

### Pre-allocation for Repeated Calculations

Pre-allocating output tensors avoids repeated memory allocation overhead when
computing neighbor lists in a loop (e.g., during MD simulation or training).
This also enables `torch.compile` compatibility by ensuring fixed tensor shapes.

```python
import torch
from nvalchemiops.neighborlist import neighbor_list, estimate_max_neighbors

num_atoms = positions.shape[0]
max_neighbors = estimate_max_neighbors(cutoff, atomic_density=1.0)

# Pre-allocate tensors
neighbor_matrix = torch.full(
    (num_atoms, max_neighbors), num_atoms, dtype=torch.int32, device="cuda"
)
neighbor_matrix_shifts = torch.zeros(
    (num_atoms, max_neighbors, 3), dtype=torch.int32, device="cuda"
)
num_neighbors = torch.zeros(num_atoms, dtype=torch.int32, device="cuda")

# Pass pre-allocated tensors
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc,
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=neighbor_matrix_shifts,
    num_neighbors=num_neighbors,
    fill_value=num_atoms
)
```

For cell list methods, you can also pre-allocate the spatial data structures:

```python
from nvalchemiops.neighborlist import (
    estimate_cell_list_sizes, allocate_cell_list, neighbor_list
)

max_total_cells, neighbor_search_radius = estimate_cell_list_sizes(cell, pbc, cutoff)

(
    cells_per_dimension, neighbor_search_radius,
    atom_periodic_shifts, atom_to_cell_mapping,
    atoms_per_cell_count, cell_atom_start_indices, cell_atom_list
) = allocate_cell_list(num_atoms, max_total_cells, neighbor_search_radius, device)

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc,
    cells_per_dimension=cells_per_dimension,
    neighbor_search_radius=neighbor_search_radius,
    atom_periodic_shifts=atom_periodic_shifts,
    atom_to_cell_mapping=atom_to_cell_mapping,
    atoms_per_cell_count=atoms_per_cell_count,
    cell_atom_start_indices=cell_atom_start_indices,
    cell_atom_list=cell_atom_list
)
```

```{warning}
If `max_neighbors` is too small, neighbors beyond that limit are silently
dropped. Monitor `num_neighbors.max()` against your `max_neighbors` setting
to detect truncation.
```

## Usage Patterns

### Basic Single System

```python
import torch
from nvalchemiops.neighborlist import neighbor_list

# Create atomic system
positions = torch.rand(1000, 3, device="cuda") * 10.0
cell = torch.eye(3, device="cuda").unsqueeze(0) * 10.0
pbc = torch.tensor([True, True, True], device="cuda")
cutoff = 5.0

# Compute neighbors (automatic method selection)
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc
)

print(f"Average neighbors: {num_neighbors.float().mean():.1f}")
```

### Batch Processing

```python
import torch
from nvalchemiops.neighborlist import neighbor_list

# Three systems of different sizes
positions = torch.cat([
    torch.rand(100, 3, device="cuda"),   # System 0
    torch.rand(150, 3, device="cuda"),   # System 1
    torch.rand(80, 3, device="cuda"),    # System 2
])

batch_idx = torch.cat([
    torch.zeros(100, dtype=torch.int32, device="cuda"),
    torch.ones(150, dtype=torch.int32, device="cuda"),
    torch.full((80,), 2, dtype=torch.int32, device="cuda"),
])

cells = torch.stack([
    torch.eye(3, device="cuda") * 10.0,
    torch.eye(3, device="cuda") * 12.0,
    torch.eye(3, device="cuda") * 8.0,
])

pbc = torch.tensor([
    [True, True, True],
    [True, True, False],
    [False, False, False],
], device="cuda")

neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff=5.0, cell=cells, pbc=pbc, batch_idx=batch_idx
)
```

### Half-Fill Mode

Store only half of neighbor pairs to avoid double-counting in symmetric
calculations:

```python
# Full: stores both (i,j) and (j,i)
neighbor_matrix, num_neighbors, shifts = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, half_fill=False
)

# Half: stores only (i,j) where i < j (or with non-zero periodic shift)
neighbor_matrix_half, num_neighbors_half, shifts_half = neighbor_list(
    positions, cutoff, cell=cell, pbc=pbc, half_fill=True
)

# half_fill=True produces ~50% of the pairs
```

### Build/Query Separation for MD Workflows

For molecular dynamics, separate building and querying allows caching the
spatial data structure:

```python
from nvalchemiops.neighborlist import (
    build_cell_list, query_cell_list,
    allocate_cell_list, estimate_cell_list_sizes, estimate_max_neighbors
)

# Setup (once)
max_total_cells, neighbor_search_radius = estimate_cell_list_sizes(cell, pbc, cutoff)
cell_list_cache = allocate_cell_list(num_atoms, max_total_cells, neighbor_search_radius, device)

max_neighbors = estimate_max_neighbors(cutoff)
neighbor_matrix = torch.full((num_atoms, max_neighbors), -1, dtype=torch.int32, device=device)
neighbor_shifts = torch.zeros((num_atoms, max_neighbors, 3), dtype=torch.int32, device=device)
num_neighbors = torch.zeros(num_atoms, dtype=torch.int32, device=device)

# MD loop
for step in range(num_steps):
    # Build cell list (expensive, done when atoms change cells)
    build_cell_list(positions, cutoff, cell, pbc, *cell_list_cache)

    # Query neighbors (cheaper)
    neighbor_matrix.fill_(-1)
    neighbor_shifts.zero_()
    num_neighbors.zero_()
    query_cell_list(
        positions, cutoff, cell, pbc, *cell_list_cache,
        neighbor_matrix, neighbor_shifts, num_neighbors
    )

    forces = compute_forces(positions, neighbor_matrix, num_neighbors, ...)
    positions = integrate(positions, forces, dt)
```

### Rebuild Detection with Skin Distance

Avoid rebuilding neighbor lists every step by using a skin distance:

```python
from nvalchemiops.neighborlist import (
    build_cell_list, query_cell_list, cell_list_needs_rebuild,
    allocate_cell_list, estimate_cell_list_sizes
)

cutoff = 5.0
skin_distance = 1.0
effective_cutoff = cutoff + skin_distance

# Build with effective cutoff (includes skin)
max_total_cells, neighbor_search_radius = estimate_cell_list_sizes(
    cell, pbc, effective_cutoff
)
cell_list_cache = allocate_cell_list(num_atoms, max_total_cells, neighbor_search_radius, device)

(
    cells_per_dimension, neighbor_search_radius,
    atom_periodic_shifts, atom_to_cell_mapping,
    atoms_per_cell_count, cell_atom_start_indices, cell_atom_list
) = cell_list_cache

build_cell_list(positions, effective_cutoff, cell, pbc, *cell_list_cache)

for step in range(num_steps):
    positions = integrate(positions, forces, dt)

    # Check if any atom moved to a different cell
    needs_rebuild = cell_list_needs_rebuild(
        positions, atom_to_cell_mapping, cells_per_dimension, cell, pbc
    )

    if needs_rebuild.item():
        build_cell_list(positions, effective_cutoff, cell, pbc, *cell_list_cache)

    # Query with actual cutoff (not effective)
    query_cell_list(positions, cutoff, cell, pbc, *cell_list_cache, ...)
```

### Dual Cutoff

Compute two neighbor lists with different cutoffs simultaneously:

```python
from nvalchemiops.neighborlist import neighbor_list

cutoff1, cutoff2 = 3.0, 6.0

(
    neighbor_matrix1, num_neighbors1, shifts1,
    neighbor_matrix2, num_neighbors2, shifts2
) = neighbor_list(
    positions, cutoff1, cutoff2=cutoff2, cell=cell, pbc=pbc
)

# neighbor_matrix1: neighbors within cutoff1
# neighbor_matrix2: neighbors within cutoff2 (superset of cutoff1)
```

---

This concludes the high-level documentation for neighbor lists: you should now
be able to integrate `nvalchemiops` routines for your neighbor list requirements,
and consult the [API reference](../../modules/neighborlist) for further details.