nvalchemi.dynamics.SizeAwareSampler
- class nvalchemi.dynamics.SizeAwareSampler(dataset, max_atoms, max_edges, max_batch_size, bin_width=1, shuffle=False, max_gpu_memory_fraction=0.8)
Size-aware sampler for inflight batching.
Manages dataset access, capacity budgets, and bin-packing logic for efficient GPU utilization during dynamics simulations. Ensures every replacement sample fits within the memory envelope of the graduated sample it replaces.
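The bin-packing logic is not spelled out on this page. The sketch below is only an illustration, under stated assumptions:

- Samples are grouped into bins of bin_width atoms, using the (num_atoms, num_edges) pairs that get_metadata(idx) returns.
- One batch is filled greedily, visiting bins from largest to smallest while the max_atoms, max_edges, and max_batch_size budgets allow. This ordering is first-fit decreasing, a choice made here and not confirmed by the source.
- The shuffle option is not modeled.

The helper names group_into_bins and greedy_pack are hypothetical and are not part of the nvalchemi API.

```python
from collections import defaultdict


def group_into_bins(metadata, bin_width=1):
    """Hypothetical helper: map bin key -> sample indices.

    metadata is a list of (num_atoms, num_edges) tuples, one per sample,
    i.e. what get_metadata(idx) would return for idx in range(len(dataset)).
    """
    bins = defaultdict(list)
    for idx, (num_atoms, _num_edges) in enumerate(metadata):
        # Samples whose atom counts fall in the same bin_width window share a bin.
        bins[num_atoms // bin_width].append(idx)
    return dict(bins)


def greedy_pack(metadata, max_atoms, max_edges, max_batch_size, bin_width=1):
    """Hypothetical helper: fill one batch greedily, largest bins first.

    A budget of None means "no limit" for that quantity.
    Returns the chosen sample indices in the order they were added.
    """
    bins = group_into_bins(metadata, bin_width)
    chosen, atoms, edges = [], 0, 0
    for key in sorted(bins, reverse=True):  # assumption: largest systems first
        for idx in bins[key]:
            if len(chosen) == max_batch_size:
                return chosen  # graph-count budget is full
            n_atoms, n_edges = metadata[idx]
            if max_atoms is not None and atoms + n_atoms > max_atoms:
                continue  # would exceed the atom budget; try a smaller sample
            if max_edges is not None and edges + n_edges > max_edges:
                continue  # would exceed the edge budget
            chosen.append(idx)
            atoms += n_atoms
            edges += n_edges
    return chosen
```

For example, with metadata [(5, 20), (40, 100), (30, 90), (10, 40), (60, 200)] and max_atoms=100, the sketch picks samples 4 and 1 (60 + 40 atoms). Every other sample would push the total past 100.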
When CUDA is available, the sampler uses a heuristic to estimate the maximum number of atoms that fit in GPU memory. This estimate is combined with the user-specified max_atoms, and the more restrictive of the two constraints wins. The GPU memory heuristic is best-effort and conservative; users who need tighter control should set max_atoms explicitly.
- Parameters:
dataset (Any) – Dataset with __len__, __getitem__, and get_metadata(idx) methods. get_metadata must return (num_atoms, num_edges).
max_atoms (int | None) – Maximum total atoms across all samples in a batch. None disables the atom count constraint (the GPU memory estimate may still apply).
max_edges (int | None) – Maximum total edges across all samples in a batch. None disables the edge count constraint.
max_batch_size (int) – Maximum number of samples (graphs) in a batch.
bin_width (int) – Atom-count bin width for grouping samples. Default 1.
shuffle (bool) – Whether to shuffle within bins. Default False.
max_gpu_memory_fraction (float) – Fraction of GPU memory to use when estimating atom capacity. Default 0.8 (80%), leaving 20% headroom for model parameters and the CUDA context. Only used when CUDA is available.
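The page states that a GPU-memory estimate scaled by max_gpu_memory_fraction is combined with max_atoms, and that the tighter limit wins. The sketch below shows that combination under stated assumptions.

- The real heuristic is not documented, so the per-atom memory cost (bytes_per_atom) is a hypothetical input.
- The function names estimate_atom_capacity and effective_max_atoms are illustrative only.
- In practice the total device memory could be read with torch.cuda.get_device_properties(0).total_memory. Here it is a plain argument, so the sketch runs without a GPU.

```python
def estimate_atom_capacity(total_gpu_bytes, bytes_per_atom, max_gpu_memory_fraction=0.8):
    """Hypothetical heuristic: usable GPU memory divided by an assumed per-atom cost."""
    if not 0.0 < max_gpu_memory_fraction <= 1.0:
        raise ValueError("max_gpu_memory_fraction must be in (0.0, 1.0]")
    # Only a fraction of device memory is budgeted; the rest is headroom
    # for model parameters and the CUDA context.
    usable_bytes = int(total_gpu_bytes * max_gpu_memory_fraction)
    return usable_bytes // bytes_per_atom


def effective_max_atoms(user_max_atoms, gpu_estimate):
    """The more restrictive limit wins; None means 'no limit from this source'."""
    limits = [limit for limit in (user_max_atoms, gpu_estimate) if limit is not None]
    return min(limits) if limits else None
```

With 10,000 bytes of device memory, an assumed 8 bytes per atom, and the default fraction of 0.8, the estimate is 1000 atoms. A user-specified max_atoms=100 therefore remains the binding limit.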
- Raises:
RuntimeError – If any sample in the dataset has num_atoms > max_atoms or num_edges > max_edges. Such samples can never be placed into any batch and indicate a configuration error.
ValueError – If max_batch_size < 1, bin_width < 1, or max_gpu_memory_fraction is not in (0.0, 1.0].
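The error conditions listed above can be written as explicit checks. The sketch below is not the library's code, and validate_config is a hypothetical name. It makes two assumptions: the per-sample metadata has already been collected as (num_atoms, num_edges) tuples, and a None budget skips the corresponding check, as the parameter descriptions state.

```python
def validate_config(metadata, max_atoms, max_edges, max_batch_size,
                    bin_width=1, max_gpu_memory_fraction=0.8):
    """Hypothetical validation mirroring the documented ValueError/RuntimeError cases."""
    if max_batch_size < 1:
        raise ValueError(f"max_batch_size must be >= 1, got {max_batch_size}")
    if bin_width < 1:
        raise ValueError(f"bin_width must be >= 1, got {bin_width}")
    if not 0.0 < max_gpu_memory_fraction <= 1.0:
        raise ValueError("max_gpu_memory_fraction must be in (0.0, 1.0]")
    # A single oversized sample can never fit any batch: fail early.
    for idx, (n_atoms, n_edges) in enumerate(metadata):
        if max_atoms is not None and n_atoms > max_atoms:
            raise RuntimeError(f"sample {idx}: {n_atoms} atoms > max_atoms={max_atoms}")
        if max_edges is not None and n_edges > max_edges:
            raise RuntimeError(f"sample {idx}: {n_edges} edges > max_edges={max_edges}")
```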
Examples
>>> sampler = SizeAwareSampler(dataset, max_atoms=100, max_edges=500, max_batch_size=10)
>>> batch = sampler.build_initial_batch()
>>> replacement = sampler.request_replacement(num_atoms=5, num_edges=20)
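The example assumes an existing dataset object. Below is a minimal, hypothetical dataset that satisfies the documented interface: __len__, __getitem__, and get_metadata(idx) returning (num_atoms, num_edges). The layout of each stored system (a dict with "positions" and "edges" keys) is an assumption made for illustration. The sampler's actual expectations for what __getitem__ returns are not documented on this page.

```python
class ToyDataset:
    """Hypothetical dataset exposing the interface SizeAwareSampler documents."""

    def __init__(self, systems):
        # Assumed layout: each system is a dict with "positions" (one entry
        # per atom) and "edges" (one (i, j) pair per directed edge).
        self._systems = systems

    def __len__(self):
        return len(self._systems)

    def __getitem__(self, idx):
        return self._systems[idx]

    def get_metadata(self, idx):
        # Must return (num_atoms, num_edges) without heavy loading work.
        system = self._systems[idx]
        return len(system["positions"]), len(system["edges"])
```

An instance would then be passed as the dataset argument, e.g. SizeAwareSampler(ToyDataset(systems), max_atoms=100, max_edges=500, max_batch_size=10).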
- __init__(dataset, max_atoms, max_edges, max_batch_size, bin_width=1, shuffle=False, max_gpu_memory_fraction=0.8)
Initialize the size-aware sampler.
- Parameters:
dataset (Any) – Dataset with __len__, __getitem__, and get_metadata(idx) methods. get_metadata must return (num_atoms, num_edges).
max_atoms (int | None) – Maximum total atoms across all samples in a batch. None disables the atom count constraint (the GPU memory estimate may still apply).
max_edges (int | None) – Maximum total edges across all samples in a batch. None disables the edge count constraint.
max_batch_size (int) – Maximum number of samples (graphs) in a batch.
bin_width (int) – Atom-count bin width for grouping samples. Default 1.
shuffle (bool) – Whether to shuffle within bins. Default False.
max_gpu_memory_fraction (float) – Fraction of GPU memory to use when estimating atom capacity. Default 0.8 (80%), leaving 20% headroom for model parameters and the CUDA context. Only used when CUDA is available.
- Raises:
RuntimeError – If any sample exceeds the max_atoms or max_edges constraint.
ValueError – If max_batch_size < 1, bin_width < 1, or max_gpu_memory_fraction is not in (0.0, 1.0].
TypeError – If dataset does not implement the required interface.
- Return type:
None
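The TypeError case above implies some check of the dataset's interface. The exact check the library performs is not documented. The sketch below assumes it amounts to verifying that the three required methods exist and are callable. check_dataset_interface and REQUIRED_METHODS are hypothetical names.

```python
REQUIRED_METHODS = ("__len__", "__getitem__", "get_metadata")


def check_dataset_interface(dataset):
    """Hypothetical duck-typing check; raises TypeError listing missing methods."""
    missing = [
        name for name in REQUIRED_METHODS
        if not callable(getattr(dataset, name, None))
    ]
    if missing:
        raise TypeError(
            f"dataset must implement {', '.join(REQUIRED_METHODS)}; "
            f"missing: {', '.join(missing)}"
        )
```

A plain list passes the __len__ and __getitem__ checks but fails on get_metadata, which is exactly the kind of configuration mistake this error is meant to surface.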
Methods
__init__(dataset, max_atoms, max_edges, ...) – Initialize the size-aware sampler.
build_initial_batch() – Build an initial batch using greedy bin packing.
request_replacement(num_atoms, num_edges) – Request a replacement sample that fits within the given constraints.
request_replacements(node_counts, edge_counts) – Request replacement samples for multiple graduated systems using GPU-native constraint checking.
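A replacement must fit within the atom and edge envelope of the graduated sample it replaces. The sketch below illustrates that contract for one or several graduated systems. It assumes the unconsumed samples are kept as a list of indices and that the first fitting sample is taken. Both choices are illustrative.

According to its description, the library's request_replacements performs the constraint check on the GPU. This sketch swaps in a plain sequential Python loop. The function names mirror the documented methods, but these are standalone hypothetical functions, not the nvalchemi implementation.

```python
def request_replacement(pool, metadata, num_atoms, num_edges):
    """Pop the first unconsumed sample fitting the freed envelope, or return None.

    pool: list of unconsumed sample indices (mutated in place).
    metadata: list of (num_atoms, num_edges) tuples indexed by sample.
    """
    for pos, idx in enumerate(pool):
        n_atoms, n_edges = metadata[idx]
        # The replacement may be smaller than the graduated sample, never larger.
        if n_atoms <= num_atoms and n_edges <= num_edges:
            return pool.pop(pos)
    return None


def request_replacements(pool, metadata, node_counts, edge_counts):
    """One replacement per graduated system; None where nothing fits.

    Sequential stand-in for the GPU-native batched check.
    """
    return [
        request_replacement(pool, metadata, n_atoms, n_edges)
        for n_atoms, n_edges in zip(node_counts, edge_counts)
    ]
```

In this sketch the pool is considered exhausted once it is empty (not pool). This is one plausible reading of the exhausted attribute.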
Attributes
exhausted – Check whether all samples have been consumed.