Data Sinks#

During a dynamics simulation you often need to record results: full trajectory frames for post-processing, relaxed structures at convergence, or scalar time series for monitoring. Data sinks are the pluggable storage backends that snapshot hooks write into.

What is a sink?#

A sink is an object that accepts Batch snapshots and stores them. The dynamics module ships three implementations, each targeting a different performance and persistence trade-off:

| Sink | Backing store | Typical use |
| --- | --- | --- |
| GPUBuffer | GPU device memory | Maximum throughput; short trajectories or inter-stage data passing |
| HostMemory | Host RAM | Intermediate staging; moderate-length trajectories |
| ZarrData | Zarr store on disk | Persistent storage; long trajectories, post-processing, checkpointing |
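The shape shared by all three sinks can be sketched in pure Python. This is an illustrative stand-in, not the toolkit's actual protocol: the `ListSink` class, its `append` method name, and the dict-based snapshot are all assumptions for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ListSink:
    """Minimal illustrative sink: stores each snapshot in a Python list.

    The real sinks share this shape -- accept a snapshot, store it
    somewhere (device memory, host RAM, or disk).
    """

    frames: list = field(default_factory=list)

    def append(self, batch: Any) -> None:
        # A real sink might copy device tensors or serialize to disk here.
        self.frames.append(batch)


sink = ListSink()
for step in range(3):
    sink.append({"step": step, "positions": [0.0, 0.1 * step]})

print(len(sink.frames))  # 3
```

The three shipped sinks differ only in where `append` puts the data, which is what drives the trade-offs in the table above.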

How sinks integrate with hooks#

Sinks do not run on their own — they are consumed by snapshot hooks. When a SnapshotHook or ConvergedSnapshotHook fires, it writes the current batch state into its associated sink. The hook controls when data is captured; the sink controls where it goes.

```python
from nvalchemi.dynamics.hooks import SnapshotHook, ConvergedSnapshotHook
from nvalchemi.dynamics.sinks import ZarrData, GPUBuffer

# Record the full trajectory to disk every 50 steps
trajectory_hook = SnapshotHook(
    sink=ZarrData("/path/to/trajectory.zarr"),
    interval=50,
)

# Capture only converged structures in a GPU buffer
converged_hook = ConvergedSnapshotHook(
    sink=GPUBuffer(capacity=256),
)
```

GPUBuffer#

GPUBuffer stores snapshots in GPU device memory. This avoids device-to-host transfers entirely, making it the fastest option when downstream consumers (e.g. the next stage in a fused pipeline) also live on the GPU.

Because GPU memory is limited, GPUBuffer is best suited for short-lived data: a few hundred frames, or converged structures that will be consumed and discarded before the buffer fills.
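The capacity constraint can be illustrated with a pure-Python stand-in. Note the assumptions: this `RingBufferSink` class is hypothetical, and its drop-oldest overflow policy is one plausible choice, not necessarily what GPUBuffer does when it fills.

```python
from collections import deque


class RingBufferSink:
    """Illustrative fixed-capacity buffer (pure-Python stand-in for GPU memory).

    Once `capacity` frames are stored, the oldest frame is dropped --
    a common policy for bounded buffers, assumed here for illustration.
    """

    def __init__(self, capacity: int):
        self.frames = deque(maxlen=capacity)

    def append(self, batch):
        self.frames.append(batch)


buf = RingBufferSink(capacity=4)
for step in range(10):
    buf.append(step)

print(list(buf.frames))  # [6, 7, 8, 9]
```

The point of the sketch: with a bounded store, frames beyond the capacity are not retained, so size the buffer to the number of frames you actually intend to consume.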

HostMemory#

HostMemory copies snapshots to host RAM. This is a middle ground: host RAM is far more plentiful than GPU memory, and the data stays in-process, so there is no disk I/O overhead. Use it when trajectories are too large for GPU memory but you want to avoid disk writes during the simulation loop, deferring serialization until after the run completes.
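The deferred-serialization pattern described above can be sketched as follows. Everything here is a hypothetical stand-in: the `DeferredDiskSink` class, its `flush` method, and JSON as the on-disk format are illustrative choices, not the toolkit's API.

```python
import json
import os
import tempfile


class DeferredDiskSink:
    """Accumulate snapshots in host RAM; serialize to disk only on flush().

    Illustrates the HostMemory trade-off: no disk I/O inside the hot
    simulation loop, one serialization pass after the run.
    """

    def __init__(self, path: str):
        self.path = path
        self.frames = []

    def append(self, batch):
        self.frames.append(batch)  # cheap in-RAM append

    def flush(self):
        # One bulk write after the run, outside the hot loop.
        with open(self.path, "w") as f:
            json.dump(self.frames, f)


path = os.path.join(tempfile.mkdtemp(), "traj.json")
sink = DeferredDiskSink(path)
for step in range(5):
    sink.append({"step": step})
sink.flush()

print(os.path.exists(path))  # True
```

Deferring the write keeps per-step cost low at the price of holding the whole trajectory in RAM until the end of the run.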

ZarrData#

ZarrData writes snapshots to a Zarr store on disk. Zarr’s chunked, compressed format handles large trajectories efficiently and integrates with the toolkit’s data loading pipeline — the same AtomicDataZarrReader used for training data can read trajectory stores.

```python
from nvalchemi.dynamics.sinks import ZarrData

sink = ZarrData("/path/to/trajectory.zarr")
```

ZarrData is the recommended choice for production workflows where results need to survive the process, be shared across machines, or feed back into training.

Putting it together#

A typical dynamics setup combines multiple hooks and sinks to capture different aspects of the simulation:

```python
from nvalchemi.dynamics import FIRE
from nvalchemi.dynamics.hooks import (
    ConvergenceHook,
    ConvergedSnapshotHook,
    LoggingHook,
    SnapshotHook,
)
from nvalchemi.dynamics.sinks import GPUBuffer, ZarrData

with FIRE(
    model=model,
    dt=0.1,
    n_steps=500,
    hooks=[
        # Stop when converged
        ConvergenceHook(fmax=0.05),
        # Log scalars every 10 steps
        LoggingHook(interval=10),
        # Full trajectory to disk every 50 steps
        SnapshotHook(sink=ZarrData("/tmp/traj.zarr"), interval=50),
        # Converged frames to GPU for downstream consumption
        ConvergedSnapshotHook(sink=GPUBuffer(capacity=256)),
    ],
) as opt:
    relaxed = opt.run(batch)
```

See also#

  • Hooks: The Hooks guide covers the hook protocol and how to write custom hooks.

  • Data loading: The Data Loading Pipeline guide shows how to read Zarr stores back for training or analysis.

  • API: nvalchemi.dynamics for the full sinks API reference.