nvalchemi.data.DataLoader

class nvalchemi.data.DataLoader(dataset, *, batch_size=1, shuffle=False, drop_last=False, sampler=None, prefetch_factor=2, num_streams=4, use_streams=True)

Batch-iterating data loader that yields Batch objects.

Wraps a Dataset and yields Batch objects built via Batch.from_data_list(). CUDA-stream prefetching is supported for overlapping I/O with computation.
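The prefetching idea can be illustrated without a GPU. The following is a minimal CPU-only sketch, not the library's implementation: it uses a background thread and a bounded queue in place of CUDA streams, and the helper name `prefetch` is hypothetical. The queue bound plays the role that `prefetch_factor` is assumed to play in the loader.

```python
import queue
import threading


def prefetch(batches, prefetch_factor=2):
    """Yield items from `batches` while a worker thread loads ahead.

    Hypothetical helper for illustration only; not part of nvalchemi.
    The queue buffers at most `prefetch_factor` items at a time.
    """
    buffer = queue.Queue(maxsize=prefetch_factor)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            buffer.put(batch)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the producer has an item ready
        if item is done:
            return
        yield item


# The consumer sees the same items in the same order; only the timing changes.
result = list(prefetch(range(5), prefetch_factor=2))
```

The consumer can work on one batch while the producer loads the next, which is the overlap that stream-based prefetching aims for on the GPU.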

Parameters:
  • dataset (Dataset) – AtomicData-native dataset to load from.

  • batch_size (int, default=1) – Number of samples per batch.

  • shuffle (bool, default=False) – Randomize sample order each epoch.

  • drop_last (bool, default=False) – Drop the final batch if it contains fewer than batch_size samples.

  • sampler (torch.utils.data.Sampler | None, default=None) – Custom sampler (overrides shuffle).

  • prefetch_factor (int, default=2) – How many batches to prefetch ahead.

  • num_streams (int, default=4) – Number of CUDA streams for prefetching.

  • use_streams (bool, default=True) – Enable CUDA-stream prefetching.
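How `batch_size`, `shuffle`, and `drop_last` interact can be sketched in plain Python. This is an illustration of the usual data-loader semantics, not the library's code; the function `batch_indices` and its `seed` argument are hypothetical.

```python
import random


def batch_indices(n_samples, batch_size=1, shuffle=False, drop_last=False, seed=0):
    """Group sample indices into batches (hypothetical helper, not nvalchemi API)."""
    order = list(range(n_samples))
    if shuffle:
        # A seeded generator keeps the illustration reproducible.
        random.Random(seed).shuffle(order)
    batches = [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]
    # Discard the trailing batch only when it is shorter than batch_size.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


full = batch_indices(10, batch_size=4)                     # last batch has 2 samples
trimmed = batch_indices(10, batch_size=4, drop_last=True)  # short batch removed
```

With 10 samples and `batch_size=4`, the default keeps a final batch of 2 samples, while `drop_last=True` yields only the two full batches.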

Examples

>>> from nvalchemi.data.datapipes import AtomicDataZarrReader, Dataset, DataLoader
>>> reader = AtomicDataZarrReader("dataset.zarr")
>>> ds = Dataset(reader, device="cpu")
>>> loader = DataLoader(ds, batch_size=4)
>>> for batch in loader:
...     print(batch.positions.shape)

set_epoch(epoch)

Set the epoch for the sampler (used in distributed training).

Parameters:
  • epoch (int) – Current epoch number.

Return type:
  None
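Why the epoch matters can be shown with a small stand-in sampler. This sketch assumes the behaviour of PyTorch's DistributedSampler, which derives its shuffle seed from a base seed plus the epoch; whether this loader forwards `set_epoch` to such a sampler in exactly this way is an assumption. The class `EpochShuffler` is hypothetical.

```python
import random


class EpochShuffler:
    """Hypothetical stand-in for an epoch-aware sampler; not nvalchemi API."""

    def __init__(self, n_samples, seed=0):
        self.n_samples = n_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Record the epoch so the next iteration uses a new shuffle seed.
        self.epoch = epoch

    def __iter__(self):
        order = list(range(self.n_samples))
        # Seeding with seed + epoch gives every process the same order
        # within an epoch, and a fresh order when the epoch changes.
        random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)


sampler = EpochShuffler(8)
sampler.set_epoch(0)
first = list(sampler)
repeat = list(sampler)   # same epoch, so the order is reproduced
sampler.set_epoch(1)
second = list(sampler)   # new epoch, so the order is reshuffled
```

In a training loop this typically means calling `loader.set_epoch(epoch)` once at the start of each epoch, before iterating over the loader. If the call is skipped, a sampler of this kind would repeat the same order every epoch.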