nvalchemi.dynamics.SizeAwareSampler
- class nvalchemi.dynamics.SizeAwareSampler(dataset, max_atoms=None, max_edges=None, max_batch_size=None, bin_width=1, shuffle=False, max_gpu_memory_fraction=0.8)
Size-aware sampler for inflight batching.
Manages dataset access, capacity budgets, and bin-packing logic for efficient GPU utilization during dynamics simulations. Ensures every replacement sample fits within the memory envelope of the graduated sample it replaces.
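The binning step can be illustrated with a small pure-Python sketch. It groups sample indices by atom count using only the `(num_atoms, num_edges)` tuples that `get_metadata` returns. The sketch assumes that the bin key is `num_atoms // bin_width`. This page does not document the sampler's actual keying or internal data structures, and `bin_by_atom_count` is a hypothetical helper.

```python
from collections import defaultdict


def bin_by_atom_count(metadata, bin_width=1):
    """Group sample indices into bins keyed by num_atoms // bin_width.

    metadata: list of (num_atoms, num_edges) tuples, one per sample, i.e.
    what get_metadata(idx) would return for idx in range(len(dataset)).
    The bin-key formula is an assumption made for illustration.
    """
    if bin_width < 1:
        raise ValueError("bin_width must be >= 1")
    bins = defaultdict(list)
    for idx, (num_atoms, _num_edges) in enumerate(metadata):
        bins[num_atoms // bin_width].append(idx)
    # Return bins ordered from smallest to largest systems.
    return dict(sorted(bins.items()))
```

With `bin_width=1`, every distinct atom count gets its own bin. A larger width gives fewer, fuller bins at the cost of less precise size matching.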
When CUDA is available, the sampler uses a heuristic to estimate the maximum number of atoms that fit in GPU memory. This estimate is combined with the user-specified max_atoms, and whichever constraint is more restrictive wins. The GPU memory heuristic is best-effort and conservative; users who need tighter control should set max_atoms explicitly.
- Parameters:
dataset (Any) – Dataset with __len__, __getitem__, and get_metadata(idx) methods. get_metadata must return (num_atoms, num_edges).
max_atoms (int | None) – Maximum total atoms across all samples in a batch. None disables the atom count constraint (the GPU memory estimate may still apply). At least one of max_atoms or max_batch_size must be set.
max_edges (int | None) – Maximum total edges across all samples in a batch. None disables the edge count constraint.
max_batch_size (int | None) – Maximum number of samples (graphs) in a batch. None disables the graph-count constraint, letting max_atoms (and/or max_edges) alone control batch capacity. At least one of max_atoms or max_batch_size must be set.
bin_width (int) – Atom-count bin width for grouping samples. Default 1.
shuffle (bool) – Whether to shuffle within bins. Default False.
max_gpu_memory_fraction (float) – Fraction of GPU memory to use when estimating atom capacity. Default 0.8 (80%), leaving 20% headroom for model parameters and CUDA context. Only used when CUDA is available.
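The rule that the more restrictive of max_atoms and the GPU estimate wins can be written as a small function. This is a sketch under stated assumptions. The per-atom memory cost (`bytes_per_atom`) and the free-memory figure are hypothetical inputs, because this page does not describe the library's actual heuristic. On a CUDA machine, a caller might read free memory from `torch.cuda.mem_get_info()`. The sketch instead takes it as a plain argument so that it runs without a GPU.

```python
def effective_max_atoms(max_atoms=None, free_gpu_bytes=None,
                        bytes_per_atom=None, max_gpu_memory_fraction=0.8):
    """Combine a user atom cap with a hypothetical GPU-memory estimate.

    free_gpu_bytes=None models "CUDA not available", so only max_atoms applies.
    bytes_per_atom is an assumed per-atom memory cost, not a library value.
    Returns the smaller of the two caps, or None if neither applies.
    """
    if not 0.0 < max_gpu_memory_fraction <= 1.0:
        raise ValueError("max_gpu_memory_fraction must be in (0.0, 1.0]")
    gpu_estimate = None
    if free_gpu_bytes is not None and bytes_per_atom:
        # Budget only a fraction of free memory; the rest is headroom.
        usable_bytes = int(free_gpu_bytes * max_gpu_memory_fraction)
        gpu_estimate = usable_bytes // bytes_per_atom
    caps = [cap for cap in (max_atoms, gpu_estimate) if cap is not None]
    # The more restrictive (smaller) constraint wins.
    return min(caps) if caps else None
```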
- Raises:
RuntimeError – If any sample in the dataset has num_atoms > max_atoms or num_edges > max_edges. Such samples can never be placed into any batch and indicate a configuration error.
ValueError – If both max_atoms and max_batch_size are None, max_batch_size < 1, bin_width < 1, or max_gpu_memory_fraction is not in (0.0, 1.0].
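The documented ValueError and RuntimeError conditions can be mirrored in plain Python to check a configuration before the sampler is constructed. `validate_sampler_config` is a hypothetical helper that takes a list of `(num_atoms, num_edges)` tuples in place of a dataset. The order of the checks and the error messages are assumptions, not the library's.

```python
def validate_sampler_config(metadata, max_atoms=None, max_edges=None,
                            max_batch_size=None, bin_width=1,
                            max_gpu_memory_fraction=0.8):
    """Raise the errors listed under Raises for an invalid configuration.

    metadata: list of (num_atoms, num_edges) tuples, one per sample.
    """
    # ValueError cases: invalid or missing capacity settings.
    if max_atoms is None and max_batch_size is None:
        raise ValueError("at least one of max_atoms or max_batch_size must be set")
    if max_batch_size is not None and max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    if bin_width < 1:
        raise ValueError("bin_width must be >= 1")
    if not 0.0 < max_gpu_memory_fraction <= 1.0:
        raise ValueError("max_gpu_memory_fraction must be in (0.0, 1.0]")
    # RuntimeError cases: a sample that could never fit into any batch.
    for idx, (num_atoms, num_edges) in enumerate(metadata):
        if max_atoms is not None and num_atoms > max_atoms:
            raise RuntimeError(
                f"sample {idx} has {num_atoms} atoms > max_atoms={max_atoms}")
        if max_edges is not None and num_edges > max_edges:
            raise RuntimeError(
                f"sample {idx} has {num_edges} edges > max_edges={max_edges}")
```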
Examples
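>>> from nvalchemi.dynamics import SizeAwareSampler
>>> # dataset must provide __len__, __getitem__ and get_metadata(idx)
>>> sampler = SizeAwareSampler(dataset, max_atoms=100, max_edges=500, max_batch_size=10)
>>> batch = sampler.build_initial_batch()
>>> replacement = sampler.request_replacement(num_atoms=5, num_edges=20)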
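The example assumes that a `dataset` object already exists. A minimal in-memory class that satisfies the documented interface might look like the following sketch. The interface consists of `__len__`, `__getitem__`, and `get_metadata(idx)` returning `(num_atoms, num_edges)`. `ToyDataset` and its dict-based sample format are hypothetical. This page does not say what type `__getitem__` must return for the rest of the dynamics pipeline.

```python
class ToyDataset:
    """Hypothetical in-memory dataset exposing the documented interface.

    Each sample is a dict with "positions" (one xyz triple per atom) and
    "edges" (a list of (i, j) index pairs). This format is illustrative only.
    """

    def __init__(self, systems):
        self._systems = list(systems)

    def __len__(self):
        return len(self._systems)

    def __getitem__(self, idx):
        return self._systems[idx]

    def get_metadata(self, idx):
        # Cheap size lookup so a sampler can plan without loading tensors.
        system = self._systems[idx]
        return len(system["positions"]), len(system["edges"])
```

With nvalchemi installed, an instance would be passed as the first argument, for example `SizeAwareSampler(ToyDataset(systems), max_atoms=100)`.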
- __init__(dataset, max_atoms=None, max_edges=None, max_batch_size=None, bin_width=1, shuffle=False, max_gpu_memory_fraction=0.8)
Initialize the size-aware sampler.
- Parameters:
dataset (Any) – Dataset with __len__, __getitem__, and get_metadata(idx) methods. get_metadata must return (num_atoms, num_edges).
max_atoms (int | None) – Maximum total atoms across all samples in a batch. None disables the atom count constraint (the GPU memory estimate may still apply). At least one of max_atoms or max_batch_size must be set.
max_edges (int | None) – Maximum total edges across all samples in a batch. None disables the edge count constraint.
max_batch_size (int | None) – Maximum number of samples (graphs) in a batch. None disables the graph-count constraint. At least one of max_atoms or max_batch_size must be set.
bin_width (int) – Atom-count bin width for grouping samples. Default 1.
shuffle (bool) – Whether to shuffle within bins. Default False.
max_gpu_memory_fraction (float) – Fraction of GPU memory to use when estimating atom capacity. Default 0.8 (80%), leaving 20% headroom for model parameters and CUDA context. Only used when CUDA is available.
- Raises:
RuntimeError – If any sample exceeds the max_atoms or max_edges constraints.
ValueError – If both max_atoms and max_batch_size are None, max_batch_size < 1, bin_width < 1, or max_gpu_memory_fraction is not in (0.0, 1.0].
TypeError – If dataset does not implement the required interface.
- Return type:
None
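`__init__` raises TypeError when the dataset lacks the required methods. One way to express that interface in Python is a runtime-checkable `typing.Protocol`. The sketch below is an assumption about how such a check could work, not the library's implementation, and `MetadataDataset` and `check_dataset` are hypothetical names. An `isinstance` check against a runtime protocol only verifies that the methods exist. It does not verify their signatures or return types.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class MetadataDataset(Protocol):
    """Structural type for datasets accepted by a size-aware sampler."""

    def __len__(self) -> int: ...

    def __getitem__(self, idx: int) -> Any: ...

    def get_metadata(self, idx: int) -> Tuple[int, int]: ...


def check_dataset(dataset: Any) -> None:
    """Raise TypeError naming any required method the dataset lacks."""
    if not isinstance(dataset, MetadataDataset):
        missing = [name for name in ("__len__", "__getitem__", "get_metadata")
                   if not callable(getattr(dataset, name, None))]
        raise TypeError(f"dataset is missing required methods: {missing}")
```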
Methods
__init__(dataset[, max_atoms, max_edges, ...]) – Initialize the size-aware sampler.
build_initial_batch() – Build an initial batch using diverse round-robin bin packing.
request_replacement(num_atoms, num_edges) – Request a replacement sample that fits within the given constraints.
request_replacements(node_counts, edge_counts) – Request replacement samples for multiple graduated systems using GPU-native constraint checking.
request_replacements_budget([atom_budget, ...]) – Request replacement samples that fit within a total atom/edge budget.
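The sketch below gives one concrete reading of "diverse round-robin bin packing". It cycles through the size bins and takes the next sample from each bin in turn, as long as that sample still fits the atom, edge, and batch-size limits. As a result, a batch mixes small and large systems instead of filling up from a single size class. The algorithm is an assumption inferred from the method summary, and `round_robin_pack` is a hypothetical helper that does not reproduce the library's build_initial_batch.

```python
from collections import deque


def round_robin_pack(bins, metadata, max_atoms=None, max_edges=None,
                     max_batch_size=None):
    """Greedy round-robin packing over size bins (illustrative only).

    bins: mapping bin_key -> list of sample indices (left unmodified).
    metadata: list of (num_atoms, num_edges) tuples indexed by sample.
    Returns the list of sample indices chosen for the batch.
    """
    queues = {key: deque(idxs) for key, idxs in bins.items() if idxs}
    batch, atoms, edges = [], 0, 0
    progress = True
    while queues and progress:
        progress = False
        # One pass takes at most one sample from each non-empty bin.
        for key in list(queues):
            if max_batch_size is not None and len(batch) >= max_batch_size:
                return batch
            idx = queues[key][0]
            num_atoms, num_edges = metadata[idx]
            # Skip this bin for now if its next sample would overflow a budget.
            if max_atoms is not None and atoms + num_atoms > max_atoms:
                continue
            if max_edges is not None and edges + num_edges > max_edges:
                continue
            queues[key].popleft()
            batch.append(idx)
            atoms += num_atoms
            edges += num_edges
            progress = True
            if not queues[key]:
                del queues[key]
    return batch
```

The loop stops once a full pass adds nothing, so it always terminates. Skipping a bin whose next sample does not fit is a simplification. With `bin_width > 1`, a smaller sample further down the same bin might still have fit.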
Attributes
exhausted – Check if all samples have been consumed.
max_atoms – Maximum total atoms per batch (user-specified constraint).
max_batch_size – Maximum number of systems per batch (user-specified constraint).
max_edges – Maximum total edges per batch (user-specified constraint).
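request_replacement and request_replacements_budget are described here only by one-line summaries. The sketch below shows one plausible selection policy for each, working on a set of not-yet-consumed sample indices. For a single replacement, it picks the largest sample that fits within a graduated system's (num_atoms, num_edges) envelope. For a budget, it fills the total atom/edge allowance greedily, largest sample first. The largest-first policy, the lowest-index tie-breaking, and the helper names `pick_replacement` and `pick_replacements_budget` are all assumptions made for illustration. In this model, an empty `available` set would correspond to the sampler being exhausted.

```python
import math


def pick_replacement(available, metadata, num_atoms, num_edges):
    """Remove and return the largest available sample that fits, or None.

    available: set of unconsumed sample indices (mutated in place).
    metadata: list of (num_atoms, num_edges) tuples indexed by sample.
    """
    best = None
    for idx in sorted(available):  # sorted so that ties go to the lowest index
        n_atoms, n_edges = metadata[idx]
        if n_atoms <= num_atoms and n_edges <= num_edges:
            # Tuple comparison: more atoms first, then more edges.
            if best is None or metadata[idx] > metadata[best]:
                best = idx
    if best is not None:
        available.discard(best)
    return best


def pick_replacements_budget(available, metadata, atom_budget=None,
                             edge_budget=None):
    """Greedily take samples, largest first, while they fit the budgets."""
    atoms_left = math.inf if atom_budget is None else atom_budget
    edges_left = math.inf if edge_budget is None else edge_budget
    chosen = []
    order = sorted(available,
                   key=lambda i: (-metadata[i][0], -metadata[i][1], i))
    for idx in order:
        n_atoms, n_edges = metadata[idx]
        if n_atoms <= atoms_left and n_edges <= edges_left:
            chosen.append(idx)
            atoms_left -= n_atoms
            edges_left -= n_edges
    available.difference_update(chosen)
    return chosen
```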