.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/intermediate/02_trajectory_zarr_io.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_intermediate_02_trajectory_zarr_io.py: Writing and Replaying Trajectories with Zarr ============================================ `Zarr `_ is a chunked, compressed array format designed for large scientific datasets. nvalchemi uses it as the on-disk representation for trajectories: each dynamics snapshot is serialised into a CSR-style layout (pointer arrays + concatenated fields) so that random-access reads and incremental appends are both efficient. The data flow for a full simulation-to-training pipeline is: :: NVTLangevin + SnapshotHook │ (writes every N steps) ▼ ZarrData (DataSink backed by Zarr store on disk) │ ▼ AtomicDataZarrReader ──► Dataset ──► DataLoader (yields Batch objects) This example demonstrates each step: 1. Build a periodic argon NVT simulation using the Lennard-Jones potential. 2. Attach a :class:`~nvalchemi.dynamics.hooks.SnapshotHook` that writes every 10 steps into a :class:`~nvalchemi.dynamics.ZarrData` sink. 3. Run 100 steps, then read the trajectory back via :class:`~nvalchemi.data.datapipes.AtomicDataZarrReader` and :class:`~nvalchemi.data.datapipes.DataLoader`. 4. Validate round-trip shape correctness. .. GENERATED FROM PYTHON SOURCE LINES 48-63 .. code-block:: Python import logging import tempfile from pathlib import Path import torch from nvalchemi.data import AtomicData, Batch from nvalchemi.data.datapipes import AtomicDataZarrReader, DataLoader, Dataset from nvalchemi.dynamics import NVTLangevin, ZarrData from nvalchemi.dynamics.hooks import NeighborListHook, SnapshotHook, WrapPeriodicHook from nvalchemi.models.lj import LennardJonesModelWrapper logging.basicConfig(level=logging.INFO) .. GENERATED FROM PYTHON SOURCE LINES 64-75 Build an argon NVT simulation ------------------------------ Argon LJ parameters (Rappe & Casewit 1991): epsilon = 0.0104 eV, sigma = 3.40 Å. We use a small 3×3×3 simple-cubic lattice (27 atoms) with a 10.5 Å cubic box so that the nearest-neighbour distance (3.5 Å) is safely inside the 8.5 Å cutoff. The :class:`~nvalchemi.models.lj.LennardJonesModelWrapper` requires a :class:`~nvalchemi.dynamics.hooks.NeighborListHook` to be registered on the dynamics engine so that ``batch.neighbor_matrix`` is populated before each model forward pass. .. GENERATED FROM PYTHON SOURCE LINES 75-123 .. code-block:: Python torch.manual_seed(0) model = LennardJonesModelWrapper(epsilon=0.0104, sigma=3.40, cutoff=8.5) model.eval() # Build a 3x3x3 simple-cubic lattice of argon atoms. SPACING = 3.5 # Å — nearest-neighbour distance N_SIDE = 3 BOX = SPACING * N_SIDE # 10.5 Å coords = [] for ix in range(N_SIDE): for iy in range(N_SIDE): for iz in range(N_SIDE): coords.append([ix * SPACING, iy * SPACING, iz * SPACING]) # noqa: PERF401 n_atoms = len(coords) # 27 positions = torch.tensor(coords, dtype=torch.float32) # Add small random displacements to break perfect symmetry. g = torch.Generator() g.manual_seed(1) positions += torch.randn(n_atoms, 3, generator=g) * 0.05 cell = torch.eye(3, dtype=torch.float32).unsqueeze(0) * BOX # Temperature 50 K: Maxwell-Boltzmann velocities. KB_EV = 8.617333e-5 # eV/K kT = 50.0 * KB_EV mass_ar = 39.948 # atomic mass units (amu); forces in eV/Å, mass in amu g2 = torch.Generator() g2.manual_seed(2) velocities = torch.randn(n_atoms, 3, generator=g2) * (kT / mass_ar) ** 0.5 data = AtomicData( positions=positions, atomic_numbers=torch.full((n_atoms,), 18, dtype=torch.long), # Ar = 18 atomic_masses=torch.full((n_atoms,), mass_ar), forces=torch.zeros(n_atoms, 3), energies=torch.zeros(1, 1), cell=cell, pbc=torch.tensor([[True, True, True]]), ) data.add_node_property("velocities", velocities) batch = Batch.from_data_list([data]) .. GENERATED FROM PYTHON SOURCE LINES 124-130 Setting up the ZarrData sink and SnapshotHook ---------------------------------------------- :class:`~nvalchemi.dynamics.ZarrData` accepts any zarr-compatible store. Here we use a temporary directory on disk so the example is self-contained. :class:`~nvalchemi.dynamics.hooks.SnapshotHook` writes the **full** batch state to the sink every ``frequency`` steps. .. GENERATED FROM PYTHON SOURCE LINES 130-137 .. code-block:: Python zarr_dir = tempfile.mkdtemp(suffix="_argon_traj") zarr_path = Path(zarr_dir) / "trajectory.zarr" zarr_sink = ZarrData(store=str(zarr_path), capacity=10_000) snapshot_hook = SnapshotHook(sink=zarr_sink, frequency=10) .. GENERATED FROM PYTHON SOURCE LINES 138-146 Configuring the NVT integrator -------------------------------- Register: 1. ``NeighborListHook`` — builds ``neighbor_matrix`` before each force eval. 2. ``WrapPeriodicHook`` — folds coordinates back into the primary cell after each position update to prevent atoms drifting outside the box. 3. ``SnapshotHook`` — writes to the Zarr sink every 10 steps. .. GENERATED FROM PYTHON SOURCE LINES 146-160 .. code-block:: Python nl_hook = NeighborListHook(model.model_card.neighbor_config) wrap_hook = WrapPeriodicHook() nvt = NVTLangevin( model=model, dt=1.0, # fs temperature=50.0, friction=0.1, random_seed=42, n_steps=100, hooks=[nl_hook, wrap_hook, snapshot_hook], ) .. GENERATED FROM PYTHON SOURCE LINES 161-165 Running and collecting the trajectory --------------------------------------- After 100 steps with ``frequency=10``, the sink holds 10 snapshots (steps 10, 20, ..., 100). .. GENERATED FROM PYTHON SOURCE LINES 165-172 .. code-block:: Python logging.info("Running 100 NVT steps on %d-atom argon system...", n_atoms) batch = nvt.run(batch) n_snaps = len(zarr_sink) logging.info("Trajectory written: %d snapshots at %s", n_snaps, zarr_path) .. GENERATED FROM PYTHON SOURCE LINES 173-179 Reading back with DataLoader ----------------------------- The read path is: ``AtomicDataZarrReader`` provides random-access sample loading; ``Dataset`` wraps it and returns ``AtomicData`` objects with optional device transfer; ``DataLoader`` collates them into ``Batch`` objects of a given batch size. .. GENERATED FROM PYTHON SOURCE LINES 179-199 .. code-block:: Python reader = AtomicDataZarrReader(str(zarr_path)) ds = Dataset(reader, device="cpu", num_workers=1) loader = DataLoader(ds, batch_size=2) logging.info("Dataset length: %d samples", len(ds)) logging.info("DataLoader yields %d batches of size 2", len(loader)) # Iterate over all batches and collect. loaded_batches: list[Batch] = [] for loaded_batch in loader: loaded_batches.append(loaded_batch) logging.info( " batch: num_graphs=%d positions.shape=%s", loaded_batch.num_graphs, tuple(loaded_batch.positions.shape), ) ds.close() .. GENERATED FROM PYTHON SOURCE LINES 200-205 Round-trip validation ---------------------- Check that the loaded trajectory has the correct number of snapshots and that each snapshot contains the right number of atoms with the expected tensor shapes. .. GENERATED FROM PYTHON SOURCE LINES 205-220 .. code-block:: Python total_loaded = sum(b.num_graphs for b in loaded_batches) assert total_loaded == n_snaps, f"Expected {n_snaps} snapshots, got {total_loaded}" # Inspect the first snapshot from the first loaded batch. first_snap = loaded_batches[0] assert first_snap.positions.shape[-1] == 3, "positions must be (N, 3)" assert first_snap.atomic_numbers is not None, "atomic_numbers must be present" logging.info( "Round-trip OK: %d snapshots, each with %d atoms.", n_snaps, n_atoms, ) .. GENERATED FROM PYTHON SOURCE LINES 221-227 Summary -------- The Zarr store persists at ``zarr_path`` and can be reloaded in future sessions or used for training downstream ML models. For long simulations, ``ZarrData`` is preferred over :class:`~nvalchemi.dynamics.HostMemory` because it streams directly to disk rather than accumulating in RAM. .. GENERATED FROM PYTHON SOURCE LINES 227-229 .. code-block:: Python logging.info("Zarr store location: %s", zarr_path) .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.794 seconds) .. _sphx_glr_download_examples_intermediate_02_trajectory_zarr_io.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 02_trajectory_zarr_io.ipynb <02_trajectory_zarr_io.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 02_trajectory_zarr_io.py <02_trajectory_zarr_io.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: 02_trajectory_zarr_io.zip <02_trajectory_zarr_io.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_