nvalchemi.data.AtomicDataZarrWriter#
- class nvalchemi.data.AtomicDataZarrWriter(store)[source]#
Writer for serializing AtomicData into Zarr stores.
Writes AtomicData objects into a structured Zarr store with CSR-style pointer arrays for variable-size graph data. Supports single writes, batch writes, appending, custom fields, soft-delete, and defragmentation.
- The Zarr store layout is:
dataset.zarr/
├── meta/                  # Pointer arrays + masks
│   ├── atoms_ptr          # int64 [N+1] — cumulative node counts
│   ├── edges_ptr          # int64 [N+1] — cumulative edge counts
│   ├── samples_mask       # bool [N] — False = deleted sample
│   ├── atoms_mask         # bool [V_total] — False = deleted atom
│   └── edges_mask         # bool [E_total] — False = deleted edge
├── core/                  # AtomicData fields (auto-populated)
│   ├── atomic_numbers     # int64 [V_total]
│   ├── positions          # float32 [V_total, 3]
│   └── …
├── custom/                # User-defined arrays (optional)
│   └── <user_key>         # any dtype, any shape
└── .zattrs                # root metadata
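The CSR-style pointer arrays in meta/ can be sketched in plain Python (hypothetical counts and values, not the library's code): atoms_ptr has one more entry than there are samples, and sample i's atoms occupy the slice atoms_ptr[i]:atoms_ptr[i + 1] of every flat core/ array.

```python
# Conceptual sketch of CSR-style pointer indexing for variable-size samples.
# Hypothetical per-sample atom counts for N = 3 samples.
atom_counts = [2, 5, 3]

# atoms_ptr is the cumulative sum with a leading 0: int64 [N + 1].
atoms_ptr = [0]
for count in atom_counts:
    atoms_ptr.append(atoms_ptr[-1] + count)
assert atoms_ptr == [0, 2, 7, 10]

# A flat "atomic_numbers"-like array of length V_total = 10.
atomic_numbers = [1, 1, 6, 6, 8, 8, 1, 7, 7, 7]

def sample_slice(ptr, flat, i):
    """Return sample i's view of a flat concatenated array."""
    return flat[ptr[i]:ptr[i + 1]]

print(sample_slice(atoms_ptr, atomic_numbers, 1))  # → [6, 6, 8, 8, 1]
```

The same indexing applies to edges via edges_ptr; only the counts differ.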
- Parameters:
store (StoreLike) – Any zarr-compatible store: a filesystem path (str or Path), a zarr Store instance (LocalStore, MemoryStore, FsspecStore, etc.), a StorePath, or a dict for in-memory buffer storage.
- _store#
The zarr store used for I/O.
- Type:
StoreLike
- add_custom(key, data, level)[source]#
Add a custom array to the custom/ group.
- Parameters:
key (str) – Name for the custom array.
data (torch.Tensor) – Tensor data. First dimension must match:
- num_samples for “system” level
- total atoms for “atom” level
- total edges for “edge” level
level (str) – One of “atom”, “edge”, “system”.
- Raises:
ValueError – If level is invalid or data shape doesn’t match.
FileNotFoundError – If store does not exist.
- Return type:
None
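The level/shape rule above can be illustrated with a small stand-alone check (a sketch with assumed names, not the library's implementation): the expected first dimension is looked up from the level, and a mismatch raises ValueError.

```python
# Illustrative sketch of the validation add_custom is documented to perform.
def check_custom_shape(first_dim, level, num_samples, total_atoms, total_edges):
    """Raise ValueError if level is invalid or first_dim does not match it."""
    expected = {"system": num_samples, "atom": total_atoms, "edge": total_edges}
    if level not in expected:
        raise ValueError(f"invalid level: {level!r}")
    if first_dim != expected[level]:
        raise ValueError(
            f"first dimension {first_dim} does not match "
            f"{expected[level]} for level {level!r}"
        )

# A per-atom array for a store with 3 samples, 10 atoms, 24 edges: OK.
check_custom_shape(10, "atom", num_samples=3, total_atoms=10, total_edges=24)

# The same array declared at "system" level fails the check.
try:
    check_custom_shape(10, "system", num_samples=3, total_atoms=10, total_edges=24)
except ValueError as exc:
    print(exc)
```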
- append(data)[source]#
Append a single AtomicData to an existing Zarr store.
While this dispatch is available for convenience, we recommend amortizing I/O by packing multiple samples into a single write instead of appending one at a time. This can be achieved by passing either a Batch object or a list of AtomicData, which will automatically form a batch.
- Parameters:
data (AtomicData) – Single atomic data to append.
- Raises:
FileNotFoundError – If store does not exist.
- append(self, data: list[nvalchemi.data.atomic_data.AtomicData]) → None[source]
Append a list of AtomicData to an existing Zarr store.
- append(self, data: nvalchemi.data.batch.Batch) → None[source]
Append a Batch to an existing Zarr store.
This is the efficient bulk-append path. Since a Batch already has all tensors concatenated (node/edge level) or stacked (system level), each field is extended in a single I/O operation with no per-sample iteration.
- Raises:
FileNotFoundError – If store does not exist.
- Return type:
None
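The bulk-append path can be sketched with plain Python lists standing in for zarr arrays (illustrative values only, not the library's code): each flat field is extended once, and the incoming batch's own pointers are shifted by the store's current total before being appended.

```python
# Existing store state: 2 samples, 5 atoms total.
atoms_ptr = [0, 2, 5]
atomic_numbers = [1, 1, 8, 8, 8]

# Incoming batch: 2 samples already concatenated, with its own pointers.
batch_ptr = [0, 3, 4]
batch_numbers = [6, 6, 6, 7]

# One extend per field instead of a per-sample loop.
offset = atoms_ptr[-1]
atoms_ptr.extend(p + offset for p in batch_ptr[1:])
atomic_numbers.extend(batch_numbers)

print(atoms_ptr)        # → [0, 2, 5, 8, 9]
print(atomic_numbers)   # → [1, 1, 8, 8, 8, 6, 6, 6, 7]
```

Because the batch arrives pre-concatenated, no per-sample iteration is needed; this is the reason the Batch overload is the recommended append path.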
- defragment()[source]#
Rewrite store excluding deleted samples.
Rebuilds all arrays, pointer arrays, and resets all masks to True.
- Return type:
None
- delete(indices)[source]#
Soft-delete samples by index.
Sets masks to False and zeros out data slices in core/ and custom/. Pointer arrays are NOT modified.
- Parameters:
indices (list[int] | torch.Tensor) – Sample indices to delete.
- Return type:
None
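The interplay of delete() and defragment() can be sketched with plain lists (illustrative, not the library's code): delete() zeros a sample's slice and flips its mask while leaving pointers untouched; defragment() rebuilds the arrays from surviving samples and resets the masks.

```python
# Store state: 3 samples with 2, 3, and 3 atoms.
atoms_ptr = [0, 2, 5, 8]
atomic_numbers = [1, 1, 6, 6, 6, 8, 8, 8]
samples_mask = [True, True, True]

# Soft-delete sample 1: zero its slice, flip its mask, keep pointers.
lo, hi = atoms_ptr[1], atoms_ptr[2]
atomic_numbers[lo:hi] = [0] * (hi - lo)
samples_mask[1] = False

# Defragment: rebuild arrays and pointers from surviving samples,
# then reset all masks to True.
keep = [i for i, ok in enumerate(samples_mask) if ok]
new_numbers = []
new_ptr = [0]
for i in keep:
    new_numbers.extend(atomic_numbers[atoms_ptr[i]:atoms_ptr[i + 1]])
    new_ptr.append(len(new_numbers))
samples_mask = [True] * len(keep)

print(new_ptr)      # → [0, 2, 5]
print(new_numbers)  # → [1, 1, 8, 8, 8]
```

Soft-delete keeps deletion O(1) in layout terms; the O(V_total) compaction cost is paid only when defragment() is called explicitly.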
- write(data)[source]#
Write a single AtomicData to a new Zarr store.
- Parameters:
data (AtomicData) – Single atomic data to write.
- Return type:
None
- write(self, data: list[nvalchemi.data.atomic_data.AtomicData]) → None[source]
Write a list of AtomicData to a new Zarr store.
- Parameters:
data (list[AtomicData]) – Atomic data samples to write.
- Return type:
None
- write(self, data: nvalchemi.data.batch.Batch) → None[source]
Write a Batch to a new Zarr store.
This is the efficient bulk-write path. Since a Batch already has all tensors concatenated (node/edge level) or stacked (system level), each field is written to zarr in a single I/O operation with no per-sample iteration.
- Parameters:
data (Batch) – Batched atomic data to write.
- Raises:
FileExistsError – If store already exists.
ValueError – If batch is empty.
- Return type:
None