nvalchemi.dynamics.SizeAwareSampler#

class nvalchemi.dynamics.SizeAwareSampler(dataset, max_atoms, max_edges, max_batch_size, bin_width=1, shuffle=False, max_gpu_memory_fraction=0.8)[source]#

Size-aware sampler for in-flight batching.

Manages dataset access, capacity budgets, and bin-packing logic for efficient GPU utilization during dynamics simulations. Ensures every replacement sample fits within the memory envelope of the graduated sample it replaces.

When CUDA is available, the sampler uses a heuristic to estimate the maximum number of atoms that fit in GPU memory. This estimate is combined with the user-specified max_atoms, and the more restrictive constraint wins. The GPU memory heuristic is best-effort and conservative; users who need tighter control should set max_atoms explicitly.
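The "more restrictive constraint wins" combination can be pictured as below. This is an illustrative sketch, not nvalchemi source code, and the function name is hypothetical:

```python
def effective_max_atoms(user_max_atoms, gpu_estimate):
    """Combine the user limit with a GPU-derived estimate.

    The smaller (more restrictive) value wins; None means "no limit"
    for that constraint, and None is returned only if both are None.
    """
    limits = [x for x in (user_max_atoms, gpu_estimate) if x is not None]
    return min(limits) if limits else None
```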

Parameters:
  • dataset (Any) – Dataset with __len__, __getitem__, and get_metadata(idx) methods. get_metadata must return (num_atoms, num_edges).

  • max_atoms (int | None) – Maximum total atoms across all samples in a batch. None disables the atom count constraint (GPU memory estimate may still apply).

  • max_edges (int | None) – Maximum total edges across all samples in a batch. None disables the edge count constraint.

  • max_batch_size (int) – Maximum number of samples (graphs) in a batch.

  • bin_width (int) – Atom-count bin width for grouping samples. Default 1.

  • shuffle (bool) – Whether to shuffle within bins. Default False.

  • max_gpu_memory_fraction (float) – Fraction of GPU memory to use when estimating atom capacity. Default 0.8 (80%), leaving 20% headroom for model parameters and CUDA context. Only used when CUDA is available.

Raises:
  • RuntimeError – If any sample in the dataset has num_atoms > max_atoms or num_edges > max_edges; such samples can never be placed into any batch and indicate a configuration error.

  • ValueError – If max_batch_size < 1, bin_width < 1, or max_gpu_memory_fraction is not in (0.0, 1.0].

Examples

>>> sampler = SizeAwareSampler(dataset, max_atoms=100, max_edges=500, max_batch_size=10)
>>> batch = sampler.build_initial_batch()
>>> replacement = sampler.request_replacement(num_atoms=5, num_edges=20)
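
The dataset passed to the sampler only needs __len__, __getitem__, and get_metadata(idx). A minimal sketch of a conforming dataset (the class name and toy metadata below are illustrative, not part of nvalchemi):

```python
class ToyGraphDataset:
    """Minimal dataset satisfying the SizeAwareSampler interface."""

    def __init__(self, sizes):
        # sizes: list of (num_atoms, num_edges) tuples, one per sample
        self._sizes = list(sizes)

    def __len__(self):
        return len(self._sizes)

    def __getitem__(self, idx):
        # A real dataset would return graph tensors; the index stands in here.
        return idx

    def get_metadata(self, idx):
        # Must return (num_atoms, num_edges) for sample idx.
        return self._sizes[idx]


dataset = ToyGraphDataset([(10, 40), (25, 90), (5, 12)])
```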
__init__(dataset, max_atoms, max_edges, max_batch_size, bin_width=1, shuffle=False, max_gpu_memory_fraction=0.8)[source]#

Initialize the size-aware sampler.

Parameters:
  • dataset (Any) – Dataset with __len__, __getitem__, and get_metadata(idx) methods. get_metadata must return (num_atoms, num_edges).

  • max_atoms (int | None) – Maximum total atoms across all samples in a batch. None disables the atom count constraint (GPU memory estimate may still apply).

  • max_edges (int | None) – Maximum total edges across all samples in a batch. None disables the edge count constraint.

  • max_batch_size (int) – Maximum number of samples (graphs) in a batch.

  • bin_width (int) – Atom-count bin width for grouping samples. Default 1.

  • shuffle (bool) – Whether to shuffle within bins. Default False.

  • max_gpu_memory_fraction (float) – Fraction of GPU memory to use when estimating atom capacity. Default 0.8 (80%), leaving 20% headroom for model parameters and CUDA context. Only used when CUDA is available.

Raises:
  • RuntimeError – If any sample exceeds max_atoms or max_edges constraints.

  • ValueError – If max_batch_size < 1, bin_width < 1, or max_gpu_memory_fraction is not in (0.0, 1.0].

  • TypeError – If dataset does not implement the required interface.

Return type:

None

Methods

__init__(dataset, max_atoms, max_edges, ...)

Initialize the size-aware sampler.

build_initial_batch()

Build an initial batch using greedy bin packing.

request_replacement(num_atoms, num_edges)

Request a replacement sample that fits within the given constraints.

request_replacements(node_counts, edge_counts)

Request replacement samples for multiple graduated systems using GPU-native constraint checking.

Attributes

exhausted

Check if all samples have been consumed.
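
The methods above suggest a typical in-flight loop: build an initial batch, then replace each graduated system with a sample that fits its memory envelope. The stand-in class below mimics the documented interface with a greedy first-fit policy so the loop is runnable; it is a hypothetical illustration, and the return conventions (sample index, or None when nothing fits) are assumptions, not the nvalchemi implementation:

```python
class GreedySamplerStub:
    """Hypothetical stand-in mimicking the SizeAwareSampler interface."""

    def __init__(self, sizes, max_atoms, max_edges, max_batch_size):
        self._sizes = list(sizes)            # (num_atoms, num_edges) per sample
        self._unused = list(range(len(sizes)))
        self.max_atoms = max_atoms
        self.max_edges = max_edges
        self.max_batch_size = max_batch_size

    @property
    def exhausted(self):
        # True once every sample has been handed out.
        return not self._unused

    def build_initial_batch(self):
        # Greedy first-fit packing under the atom, edge, and size budgets.
        batch, atoms, edges = [], 0, 0
        for idx in list(self._unused):
            a, e = self._sizes[idx]
            if (len(batch) < self.max_batch_size
                    and atoms + a <= self.max_atoms
                    and edges + e <= self.max_edges):
                batch.append(idx)
                atoms += a
                edges += e
                self._unused.remove(idx)
        return batch

    def request_replacement(self, num_atoms, num_edges):
        # Return an unused sample within the graduated sample's envelope,
        # or None when no remaining sample fits.
        for idx in list(self._unused):
            a, e = self._sizes[idx]
            if a <= num_atoms and e <= num_edges:
                self._unused.remove(idx)
                return idx
        return None


sampler = GreedySamplerStub(
    sizes=[(10, 40), (8, 30), (25, 90), (5, 12)],
    max_atoms=40, max_edges=150, max_batch_size=3,
)
batch = sampler.build_initial_batch()
# When a system graduates, request a replacement fitting its envelope.
replacement = sampler.request_replacement(num_atoms=10, num_edges=40)
```

With these toy sizes the initial batch packs samples 0, 1, and 3 (sample 2 exceeds the atom budget), and the replacement request returns None because the only remaining sample, (25, 90), is larger than the graduated envelope.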