nvalchemi.data.AtomicDataZarrWriter#

class nvalchemi.data.AtomicDataZarrWriter(store)[source]#

Writer for serializing AtomicData into Zarr stores.

Writes AtomicData objects into a structured Zarr store with CSR-style pointer arrays for variable-size graph data. Supports single writes, batch writes, appending, custom fields, soft-delete, and defragmentation.

The Zarr store layout is:

dataset.zarr/
├── meta/                    # Pointer arrays + masks
│   ├── atoms_ptr            # int64 [N+1] — cumulative node counts
│   ├── edges_ptr            # int64 [N+1] — cumulative edge counts
│   ├── samples_mask         # bool [N] — False = deleted sample
│   ├── atoms_mask           # bool [V_total] — False = deleted atom
│   └── edges_mask           # bool [E_total] — False = deleted edge
│
├── core/                    # AtomicData fields (auto-populated)
│   ├── atomic_numbers       # int64 [V_total]
│   ├── positions            # float32 [V_total, 3]
│   └── …
│
├── custom/                  # User-defined arrays (optional)
│   └── <user_key>           # any dtype, any shape
│
└── .zattrs                  # root metadata
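The `meta/*_ptr` arrays follow the standard CSR convention: sample `i` owns the half-open slice `[ptr[i], ptr[i+1])` of each concatenated per-atom (or per-edge) array. A minimal sketch of that indexing, using plain Python lists in place of the stored int64 arrays (all values hypothetical):

```python
from itertools import accumulate

# Per-sample atom counts for N = 3 samples (hypothetical values).
atom_counts = [2, 5, 3]

# CSR-style pointer array: int64 [N+1], starting at 0.
atoms_ptr = [0] + list(accumulate(atom_counts))

# The flat core/ arrays hold V_total = atoms_ptr[-1] rows; sample i
# owns the half-open slice atoms_ptr[i] : atoms_ptr[i + 1].
atomic_numbers = [1, 8, 6, 6, 1, 1, 1, 8, 1, 1]  # int64 [V_total]

def sample_slice(flat, ptr, i):
    """Recover sample i's rows from a concatenated array."""
    return flat[ptr[i]:ptr[i + 1]]

print(atoms_ptr)                                   # [0, 2, 7, 10]
print(sample_slice(atomic_numbers, atoms_ptr, 1))  # [6, 6, 1, 1, 1]
```

The same convention applies to `edges_ptr` and the per-edge arrays.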

Parameters:

store (StoreLike) – Any zarr-compatible store: a filesystem path (str or Path), a zarr Store instance (LocalStore, MemoryStore, FsspecStore, etc.), a StorePath, or a dict for in-memory buffer storage.

_store#

The zarr store used for I/O.

Type:

StoreLike

add_custom(key, data, level)[source]#

Add a custom array to the custom/ group.

Parameters:
  • key (str) – Name for the custom array.

  • data (torch.Tensor) – Tensor data. First dimension must match:

    - num_samples for “system” level

    - total atoms for “atom” level

    - total edges for “edge” level

  • level (str) – One of “atom”, “edge”, “system”.

Raises:
  • ValueError – If level is invalid or data shape doesn’t match.

  • FileNotFoundError – If store does not exist.

Return type:

None
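The shape contract above can be summarized as a small lookup. This is an illustrative sketch of the validation rule only, not the library's implementation; the pointer values are hypothetical:

```python
def expected_rows(level, num_samples, atoms_ptr, edges_ptr):
    """First-dimension size a custom array must have at each level."""
    sizes = {
        "system": num_samples,   # one row per sample
        "atom": atoms_ptr[-1],   # V_total: total atoms across samples
        "edge": edges_ptr[-1],   # E_total: total edges across samples
    }
    if level not in sizes:
        raise ValueError(f"invalid level: {level!r}")
    return sizes[level]

atoms_ptr = [0, 2, 7, 10]    # hypothetical store with 3 samples
edges_ptr = [0, 4, 14, 20]

print(expected_rows("atom", 3, atoms_ptr, edges_ptr))    # 10
print(expected_rows("system", 3, atoms_ptr, edges_ptr))  # 3
```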

append(data)[source]#

Append a single AtomicData to an existing Zarr store.

While this dispatch is available for convenience, we recommend amortizing I/O by packing multiple samples into a single write rather than appending one at a time. This can be done by passing either a Batch object or a list of AtomicData, which is automatically collated into a batch.

Parameters:
data (AtomicData) – Single atomic data to append.

Raises:

FileNotFoundError – If store does not exist.

append(self, data: list[nvalchemi.data.atomic_data.AtomicData]) → None[source]

Append a list of AtomicData to an existing Zarr store.

append(self, data: nvalchemi.data.batch.Batch) → None[source]

Append a Batch to an existing Zarr store.

This is the efficient bulk-append path. Since a Batch already has all tensors concatenated (node/edge level) or stacked (system level), each field is extended in a single I/O operation with no per-sample iteration.

Raises:

FileNotFoundError – If store does not exist.

Return type:

None
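What the bulk path amortizes can be sketched with plain Python lists in place of zarr arrays (all values hypothetical): appending a batch shifts the incoming pointer offsets by the current totals and extends each flat array once, instead of once per sample.

```python
# Existing store state: 2 samples, 5 atoms total (hypothetical).
atoms_ptr = [0, 2, 5]
atomic_numbers = [1, 8, 6, 1, 1]

# Incoming batch of 2 samples with 3 and 1 atoms; its tensors are
# already concatenated, so each field extends in one operation.
batch_counts = [3, 1]
batch_atomic_numbers = [8, 1, 1, 6]

for n in batch_counts:
    atoms_ptr.append(atoms_ptr[-1] + n)      # extend pointer array
atomic_numbers.extend(batch_atomic_numbers)  # single extend per field

print(atoms_ptr)             # [0, 2, 5, 8, 9]
print(len(atomic_numbers))   # 9
```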

defragment()[source]#

Rewrite store excluding deleted samples.

Rebuilds all arrays, pointer arrays, and resets all masks to True.

Return type:

None
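The effect of compaction can be sketched with plain lists (hypothetical values): keep only samples whose mask is True, rebuild each flat array from the surviving slices, recompute the pointer array from the surviving counts, and reset the masks.

```python
from itertools import accumulate

atoms_ptr = [0, 2, 7, 10]
atomic_numbers = [1, 8, 0, 0, 0, 0, 0, 8, 1, 1]  # sample 1 zeroed out
samples_mask = [True, False, True]               # sample 1 soft-deleted

keep = [i for i, ok in enumerate(samples_mask) if ok]
new_flat = [x for i in keep
            for x in atomic_numbers[atoms_ptr[i]:atoms_ptr[i + 1]]]
new_counts = [atoms_ptr[i + 1] - atoms_ptr[i] for i in keep]
new_ptr = [0] + list(accumulate(new_counts))
new_mask = [True] * len(keep)    # masks reset to True

print(new_ptr)    # [0, 2, 5]
print(new_flat)   # [1, 8, 8, 1, 1]
```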

delete(indices)[source]#

Soft-delete samples by index.

Sets masks to False and zeros out data slices in core/ and custom/. Pointer arrays are NOT modified.

Parameters:

indices (list[int] | torch.Tensor) – Sample indices to delete.

Return type:

None
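Soft deletion leaves the layout intact, which the following sketch illustrates with plain lists (hypothetical values): the mask flips to False and the sample's data slice is zeroed, while the pointer array is untouched.

```python
atoms_ptr = [0, 2, 7, 10]             # NOT modified by delete
samples_mask = [True, True, True]
atomic_numbers = [1, 8, 6, 6, 1, 1, 1, 8, 1, 1]

def soft_delete(i):
    samples_mask[i] = False
    lo, hi = atoms_ptr[i], atoms_ptr[i + 1]
    atomic_numbers[lo:hi] = [0] * (hi - lo)   # zero the data slice

soft_delete(1)
print(samples_mask)    # [True, False, True]
print(atomic_numbers)  # [1, 8, 0, 0, 0, 0, 0, 8, 1, 1]
print(atoms_ptr)       # unchanged: [0, 2, 7, 10]
```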

write(data)[source]#

Write a single AtomicData to a new Zarr store.

Parameters:

data (AtomicData)

Return type:

None

write(self, data: list[nvalchemi.data.atomic_data.AtomicData]) → None[source]

Write a list of AtomicData to a new Zarr store.

Parameters:

data (list[AtomicData])

Return type:

None

write(self, data: nvalchemi.data.batch.Batch) → None[source]

Write a Batch to a new Zarr store.

This is the efficient bulk-write path. Since a Batch already has all tensors concatenated (node/edge level) or stacked (system level), each field is written to zarr in a single I/O operation with no per-sample iteration.

Parameters:

data (Batch)

Raises:
  • FileExistsError – If store already exists.

  • ValueError – If batch is empty.

Return type:

None