(dynamics_sinks_guide)=

# Data Sinks

During a dynamics simulation you often need to record results: full trajectory
frames for post-processing, relaxed structures at convergence, or scalar time
series for monitoring. **Data sinks** are the pluggable storage backends that
snapshot hooks write into.

## What is a sink?

A sink is an object that accepts {py:class}`~nvalchemi.data.Batch` snapshots and
stores them. The dynamics module ships three implementations, each targeting a
different performance and persistence trade-off:

| Sink | Backing store | Typical use |
|------|---------------|-------------|
| {py:class}`~nvalchemi.dynamics.sinks.GPUBuffer` | GPU device memory | Maximum throughput; short trajectories or inter-stage data passing |
| {py:class}`~nvalchemi.dynamics.sinks.HostMemory` | Host RAM | Intermediate staging; moderate-length trajectories |
| {py:class}`~nvalchemi.dynamics.sinks.ZarrData` | Zarr store on disk | Persistent storage; long trajectories, post-processing, checkpointing |

## How sinks integrate with hooks

Sinks do not run on their own --- they are consumed by snapshot hooks. When a
{py:class}`~nvalchemi.dynamics.hooks.SnapshotHook` or
{py:class}`~nvalchemi.dynamics.hooks.ConvergedSnapshotHook` fires, it writes the
current batch state into its associated sink. The hook controls *when* data is
captured; the sink controls *where* it goes.

```python
from nvalchemi.dynamics.hooks import SnapshotHook, ConvergedSnapshotHook
from nvalchemi.dynamics.sinks import ZarrData, GPUBuffer

# Record the full trajectory to disk every 50 steps
trajectory_hook = SnapshotHook(
    sink=ZarrData("/path/to/trajectory.zarr"),
    interval=50,
)

# Capture only converged structures in a GPU buffer
converged_hook = ConvergedSnapshotHook(
    sink=GPUBuffer(capacity=256),
)
```

## GPUBuffer

{py:class}`~nvalchemi.dynamics.sinks.GPUBuffer` stores snapshots in GPU device
memory. This avoids device-to-host transfers entirely, making it the fastest
option when downstream consumers (e.g.
the next stage in a fused pipeline) also live on the GPU.

Because GPU memory is limited, `GPUBuffer` is best suited for short-lived data:
a few hundred frames, or converged structures that will be consumed and
discarded before the buffer fills.

## HostMemory

{py:class}`~nvalchemi.dynamics.sinks.HostMemory` moves snapshots to host RAM.
This is a middle ground: cheaper than GPU memory but still in-process, so there
is no disk I/O overhead.

Use it when trajectories are too large for GPU memory but you want to avoid disk
writes during the simulation loop, deferring serialization to after the run
completes.

## ZarrData

{py:class}`~nvalchemi.dynamics.sinks.ZarrData` writes snapshots to a Zarr store
on disk. Zarr's chunked, compressed format handles large trajectories
efficiently and integrates with the toolkit's data loading pipeline --- the same
{py:class}`~nvalchemi.data.datapipes.backends.zarr.AtomicDataZarrReader` used
for training data can read trajectory stores.

```python
from nvalchemi.dynamics.sinks import ZarrData

sink = ZarrData("/path/to/trajectory.zarr")
```

ZarrData is the recommended choice for production workflows where results need
to survive the process, be shared across machines, or feed back into training.
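
When choosing among the three sinks, a rough estimate of the stored trajectory
size can help. The sketch below is plain Python and not part of the `nvalchemi`
API: `estimate_trajectory_bytes` is a hypothetical helper, and it assumes each
frame stores three float32 3-vectors per atom (for example positions,
velocities, and forces). A real {py:class}`~nvalchemi.data.Batch` may carry
more or fewer fields, so treat the result as an order-of-magnitude guide only.

```python
def estimate_trajectory_bytes(
    n_frames: int,
    n_atoms: int,
    vectors_per_atom: int = 3,   # assumption: positions, velocities, forces
    bytes_per_value: int = 4,    # assumption: float32 storage
) -> int:
    """Rough uncompressed size of a stored trajectory, in bytes."""
    values_per_frame = n_atoms * vectors_per_atom * 3  # 3 components per vector
    return n_frames * values_per_frame * bytes_per_value


# Example: a 500-step run snapshotted every 50 steps (assuming one frame per
# interval), with a batch of 256 systems of 100 atoms each.
n_frames = 500 // 50
size = estimate_trajectory_bytes(n_frames, n_atoms=256 * 100)
print(f"{size / 2**20:.1f} MiB")  # prints "8.8 MiB"
```

An estimate in the megabyte range fits comfortably in a GPU buffer; as the
figure approaches the available device memory or host RAM, the next sink down
the table becomes the safer choice.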
## Putting it together

A typical dynamics setup combines multiple hooks and sinks to capture different
aspects of the simulation:

```python
from nvalchemi.dynamics import FIRE
from nvalchemi.dynamics.hooks import (
    ConvergenceHook,
    ConvergedSnapshotHook,
    LoggingHook,
    SnapshotHook,
)
from nvalchemi.dynamics.sinks import GPUBuffer, ZarrData

with FIRE(
    model=model,
    dt=0.1,
    n_steps=500,
    hooks=[
        # Stop when converged
        ConvergenceHook(fmax=0.05),
        # Log scalars every 10 steps
        LoggingHook(interval=10),
        # Full trajectory to disk every 50 steps
        SnapshotHook(sink=ZarrData("/tmp/traj.zarr"), interval=50),
        # Converged frames to GPU for downstream consumption
        ConvergedSnapshotHook(sink=GPUBuffer(capacity=256)),
    ],
) as opt:
    relaxed = opt.run(batch)
```

## See also

- **Hooks**: The [Hooks guide](dynamics_hooks_guide) covers the hook protocol
  and how to write custom hooks.
- **Data loading**: The [Data Loading Pipeline](datapipes_guide) guide shows how
  to read Zarr stores back for training or analysis.
- **API**: {py:mod}`nvalchemi.dynamics` for the full sinks API reference.