nvalchemi.dynamics.SizeAwareSampler

class nvalchemi.dynamics.SizeAwareSampler(dataset, max_atoms=None, max_edges=None, max_batch_size=None, bin_width=1, shuffle=False, max_gpu_memory_fraction=0.8)

Size-aware sampler for inflight batching.

Manages dataset access, capacity budgets, and bin-packing logic for efficient GPU utilization during dynamics simulations. Ensures every replacement sample fits within the memory envelope of the graduated sample it replaces.
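The fit rule for replacements can be sketched as follows. This is an illustration only, not the library's implementation: pick_replacement and the remaining mapping are hypothetical names, and the sketch assumes that "fits" means the candidate needs no more atoms and no more edges than the graduated sample freed.

```python
def pick_replacement(remaining, num_atoms, num_edges):
    """Return the index of the first remaining sample that fits, or None.

    remaining: dict mapping sample index -> (num_atoms, num_edges).
    num_atoms, num_edges: size of the slot freed by the graduated sample.
    """
    for idx, (n_atoms, n_edges) in remaining.items():
        if n_atoms <= num_atoms and n_edges <= num_edges:
            # Consume the sample so it is not handed out twice.
            remaining.pop(idx)
            return idx
    return None


# Hypothetical pool of unconsumed samples.
remaining = {0: (8, 30), 1: (5, 18), 2: (4, 25)}
first = pick_replacement(remaining, num_atoms=5, num_edges=20)
second = pick_replacement(remaining, num_atoms=5, num_edges=20)
```

Here sample 1 is selected on the first call; on the second call nothing fits (sample 0 has too many atoms, sample 2 too many edges), so the slot would stay empty.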

When CUDA is available, the sampler uses a heuristic to estimate the maximum number of atoms that fit in GPU memory. This estimate is combined with the user-specified max_atoms, and the more restrictive constraint wins. The GPU memory heuristic is best-effort and conservative; users who need tighter control should set max_atoms explicitly.
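One way to read "the more restrictive constraint wins" is as a minimum over whichever limits are present. The helper below is a hypothetical sketch (effective_atom_cap and gpu_estimate are not library names), and it does not reproduce the GPU memory heuristic itself, which is not described here.

```python
from typing import Optional


def effective_atom_cap(user_max_atoms: Optional[int],
                       gpu_estimate: Optional[int]) -> Optional[int]:
    """Return the tighter of two optional atom caps (hypothetical helper).

    A value of None means "no limit from this source".
    """
    caps = [cap for cap in (user_max_atoms, gpu_estimate) if cap is not None]
    return min(caps) if caps else None
```

For example, effective_atom_cap(100, 250) yields 100, while effective_atom_cap(None, 250) falls back to the GPU estimate of 250.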

Parameters:
  • dataset (Any) – Dataset with __len__, __getitem__, and get_metadata(idx) methods. get_metadata must return (num_atoms, num_edges).

  • max_atoms (int | None) – Maximum total atoms across all samples in a batch. None disables the atom count constraint (GPU memory estimate may still apply). At least one of max_atoms or max_batch_size must be set.

  • max_edges (int | None) – Maximum total edges across all samples in a batch. None disables the edge count constraint.

  • max_batch_size (int | None) – Maximum number of samples (graphs) in a batch. None disables the graph-count constraint, letting max_atoms (and/or max_edges) alone control batch capacity. At least one of max_atoms or max_batch_size must be set.

  • bin_width (int) – Atom-count bin width for grouping samples. Default 1.

  • shuffle (bool) – Whether to shuffle within bins. Default False.

  • max_gpu_memory_fraction (float) – Fraction of GPU memory to use when estimating atom capacity. Default 0.8 (80%), leaving 20% headroom for model parameters and CUDA context. Only used when CUDA is available.
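A minimal sketch of a dataset that satisfies the interface described for the dataset parameter, together with one plausible reading of bin_width. ToyDataset and group_by_atom_bin are hypothetical, and the binning rule num_atoms // bin_width is an assumption; the sampler's actual grouping may differ.

```python
from collections import defaultdict


class ToyDataset:
    """Hypothetical in-memory dataset exposing the documented interface."""

    def __init__(self, sizes):
        # sizes: list of (num_atoms, num_edges) tuples, one per sample.
        self._sizes = list(sizes)

    def __len__(self):
        return len(self._sizes)

    def __getitem__(self, idx):
        # A real dataset would return a graph object; a dict stands in here.
        num_atoms, num_edges = self._sizes[idx]
        return {"num_atoms": num_atoms, "num_edges": num_edges}

    def get_metadata(self, idx):
        # Must return (num_atoms, num_edges) without loading the full sample.
        return self._sizes[idx]


def group_by_atom_bin(dataset, bin_width=1):
    """Group sample indices by atom-count bin (assumed: num_atoms // bin_width)."""
    bins = defaultdict(list)
    for idx in range(len(dataset)):
        num_atoms, _ = dataset.get_metadata(idx)
        bins[num_atoms // bin_width].append(idx)
    return dict(bins)


dataset = ToyDataset([(3, 6), (4, 10), (12, 40), (13, 44)])
bins = group_by_atom_bin(dataset, bin_width=5)
```

With bin_width=5, the two small systems share bin 0 and the two larger ones share bin 2; with the default bin_width=1, every distinct atom count gets its own bin.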

Raises:
  • RuntimeError – If any sample in the dataset has num_atoms > max_atoms or num_edges > max_edges. Such samples can never be placed into any batch and indicate a configuration error.

  • ValueError – If both max_atoms and max_batch_size are None, max_batch_size < 1, bin_width < 1, or max_gpu_memory_fraction is not in (0.0, 1.0].

  • TypeError – If dataset does not implement the required interface.
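The documented error conditions can be restated as explicit checks. The functions below are a hypothetical sketch for illustration (validate_sampler_args and find_oversized are not library names); the error messages and the order of checks are assumptions.

```python
def validate_sampler_args(max_atoms=None, max_batch_size=None, bin_width=1,
                          max_gpu_memory_fraction=0.8):
    """Raise ValueError for the argument combinations documented as invalid."""
    if max_atoms is None and max_batch_size is None:
        raise ValueError("Set at least one of max_atoms or max_batch_size.")
    if max_batch_size is not None and max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1.")
    if bin_width < 1:
        raise ValueError("bin_width must be >= 1.")
    if not (0.0 < max_gpu_memory_fraction <= 1.0):
        raise ValueError("max_gpu_memory_fraction must be in (0.0, 1.0].")


def find_oversized(metadata, max_atoms=None, max_edges=None):
    """Return indices of samples that could never fit in any batch.

    metadata: iterable of (num_atoms, num_edges) tuples.
    A non-empty result corresponds to the documented RuntimeError case.
    """
    oversized = []
    for idx, (num_atoms, num_edges) in enumerate(metadata):
        too_many_atoms = max_atoms is not None and num_atoms > max_atoms
        too_many_edges = max_edges is not None and num_edges > max_edges
        if too_many_atoms or too_many_edges:
            oversized.append(idx)
    return oversized
```

Checking find_oversized on the dataset metadata before constructing the sampler is one way to locate the offending samples when a RuntimeError is raised.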

Examples

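>>> from nvalchemi.dynamics import SizeAwareSampler
>>> # `dataset` must provide __len__, __getitem__, and get_metadata(idx)
>>> sampler = SizeAwareSampler(dataset, max_atoms=100, max_edges=500, max_batch_size=10)
>>> batch = sampler.build_initial_batch()
>>> # Request a replacement that fits within 5 atoms and 20 edges
>>> replacement = sampler.request_replacement(num_atoms=5, num_edges=20)

__init__(dataset, max_atoms=None, max_edges=None, max_batch_size=None, bin_width=1, shuffle=False, max_gpu_memory_fraction=0.8)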

Initialize the size-aware sampler.

Parameters:
  • dataset (Any) – Dataset with __len__, __getitem__, and get_metadata(idx) methods. get_metadata must return (num_atoms, num_edges).

  • max_atoms (int | None) – Maximum total atoms across all samples in a batch. None disables the atom count constraint (GPU memory estimate may still apply). At least one of max_atoms or max_batch_size must be set.

  • max_edges (int | None) – Maximum total edges across all samples in a batch. None disables the edge count constraint.

  • max_batch_size (int | None) – Maximum number of samples (graphs) in a batch. None disables the graph-count constraint. At least one of max_atoms or max_batch_size must be set.

  • bin_width (int) – Atom-count bin width for grouping samples. Default 1.

  • shuffle (bool) – Whether to shuffle within bins. Default False.

  • max_gpu_memory_fraction (float) – Fraction of GPU memory to use when estimating atom capacity. Default 0.8 (80%), leaving 20% headroom for model parameters and CUDA context. Only used when CUDA is available.

Raises:
  • RuntimeError – If any sample exceeds max_atoms or max_edges constraints.

  • ValueError – If both max_atoms and max_batch_size are None, max_batch_size < 1, bin_width < 1, or max_gpu_memory_fraction is not in (0.0, 1.0].

  • TypeError – If dataset does not implement the required interface.

Return type:

None

Methods

  • __init__(dataset[, max_atoms, max_edges, ...]) – Initialize the size-aware sampler.

  • build_initial_batch() – Build an initial batch using diverse round-robin bin packing.

  • request_replacement(num_atoms, num_edges) – Request a replacement sample that fits within the given constraints.

  • request_replacements(node_counts, edge_counts) – Request replacement samples for multiple graduated systems using GPU-native constraint checking.

  • request_replacements_budget([atom_budget, ...]) – Request replacement samples that fit within a total atom/edge budget.
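To make "round-robin bin packing" concrete, the sketch below visits atom-count bins in turn and takes one fitting sample from each per pass until the budgets are exhausted. It is an assumption-based illustration, not the algorithm used by build_initial_batch: round_robin_pack, its arguments, and the tie-breaking order are all hypothetical.

```python
def round_robin_pack(bins, sizes, max_atoms=None, max_edges=None,
                     max_batch_size=None):
    """Pick sample indices by cycling over bins (illustrative sketch).

    bins: dict mapping bin key -> list of sample indices.
    sizes: sequence mapping sample index -> (num_atoms, num_edges).
    """
    # Copy the bins so the caller's lists are not mutated.
    queues = {key: list(idxs) for key, idxs in sorted(bins.items())}
    chosen, atoms, edges = [], 0, 0
    progress = True
    while progress:
        progress = False
        for queue in queues.values():
            if max_batch_size is not None and len(chosen) >= max_batch_size:
                return chosen
            # Take the first sample in this bin that still fits the budgets.
            for pos, idx in enumerate(queue):
                n_atoms, n_edges = sizes[idx]
                fits_atoms = max_atoms is None or atoms + n_atoms <= max_atoms
                fits_edges = max_edges is None or edges + n_edges <= max_edges
                if fits_atoms and fits_edges:
                    chosen.append(queue.pop(pos))
                    atoms += n_atoms
                    edges += n_edges
                    progress = True
                    break
    return chosen


sizes = [(3, 6), (4, 10), (12, 40), (13, 44)]
bins = {0: [0, 1], 2: [2, 3]}
batch_indices = round_robin_pack(bins, sizes, max_atoms=20)
```

Cycling over bins, rather than draining one bin at a time, is what gives the batch a mix of system sizes; here the 20-atom budget admits samples 0, 2, and 1 (19 atoms in total) and rejects sample 3.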

Attributes

  • exhausted – Check if all samples have been consumed.

  • max_atoms – Maximum total atoms per batch (user-specified constraint).

  • max_batch_size – Maximum number of systems per batch (user-specified constraint).

  • max_edges – Maximum total edges per batch (user-specified constraint).