Dynamics: Optimization and Molecular Dynamics
The dynamics module provides a unified framework for running geometry optimizations and molecular dynamics simulations on GPUs. All simulation types share a common execution loop (hook dispatch, model evaluation, and convergence checking), so you learn the pattern once and can apply it to any integrator or optimizer.
Tip
nvalchemi follows a batch-first principle: design and reason about dynamics
workflows in terms of many structures processed simultaneously, rather than
individual structures processed one after another.
The execution loop
Every simulation is driven by BaseDynamics,
which defines a single step() that all integrators and optimizers follow. The
loop is broken into discrete stages, enumerated by
HookStageEnum:
| Stage | When it fires |
|---|---|
| BEFORE_STEP | At the very beginning of a step, before any operations |
| BEFORE_PRE_UPDATE | Just before the integrator’s first half-step |
| AFTER_PRE_UPDATE | After the first half-step completes |
| BEFORE_COMPUTE | Just before the model forward pass |
| AFTER_COMPUTE | After the model forward pass completes |
| BEFORE_POST_UPDATE | Just before the integrator’s second half-step |
| AFTER_POST_UPDATE | After the second half-step completes |
| AFTER_STEP | At the very end of a step, after all operations |
| ON_CONVERGE | When a convergence criterion is met |
A single call to step() proceeds through these stages in order:
1. BEFORE_STEP hooks fire.
2. pre_update(batch): the integrator’s first half-step (e.g. update velocities by half a timestep), bracketed by BEFORE_PRE_UPDATE and AFTER_PRE_UPDATE hooks.
3. compute(batch): the wrapped ML model evaluates forces (and stresses, if needed), bracketed by BEFORE_COMPUTE and AFTER_COMPUTE hooks.
4. post_update(batch): the integrator’s second half-step (e.g. complete the velocity update with the new forces), bracketed by BEFORE_POST_UPDATE and AFTER_POST_UPDATE hooks.
5. AFTER_STEP hooks fire (convergence checks, logging, …).
6. Convergence is evaluated: converged systems fire ON_CONVERGE hooks and (in multi-stage pipelines) migrate to the next stage.
run(batch, n_steps) calls step() in a loop until all systems converge or
n_steps is reached. Every hook declares the
HookStageEnum stage at which it fires and how
often it fires, so you have fine-grained control over when callbacks execute.
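The stage ordering and frequency gating described above can be modelled in plain Python. The sketch below is self-contained and does not use nvalchemi. Stage, ToyHook and ToyDynamics are hypothetical stand-ins for HookStageEnum, a hook and BaseDynamics. The "model" simply halves each system's force, and the assumed frequency rule is `step_count % frequency == 0`.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List


class Stage(Enum):
    """Hypothetical stand-in for HookStageEnum, reusing the documented stage names."""
    BEFORE_STEP = auto()
    BEFORE_PRE_UPDATE = auto()
    AFTER_PRE_UPDATE = auto()
    BEFORE_COMPUTE = auto()
    AFTER_COMPUTE = auto()
    BEFORE_POST_UPDATE = auto()
    AFTER_POST_UPDATE = auto()
    AFTER_STEP = auto()
    ON_CONVERGE = auto()


@dataclass
class ToyHook:
    stage: Stage                # stage at which the hook fires
    frequency: int              # assumed rule: fire when step_count % frequency == 0
    fn: Callable[[int], None]   # receives the current step index


@dataclass
class ToyDynamics:
    hooks: List[ToyHook] = field(default_factory=list)
    fmax: float = 0.05          # converged once every per-system force is below this
    step_count: int = 0
    trace: List[str] = field(default_factory=list)

    def _fire(self, stage: Stage) -> None:
        for hook in self.hooks:
            if hook.stage is stage and self.step_count % hook.frequency == 0:
                hook.fn(self.step_count)

    def converged(self, forces: List[float]) -> bool:
        return all(f < self.fmax for f in forces)

    def step(self, forces: List[float]) -> List[float]:
        self._fire(Stage.BEFORE_STEP)
        self._fire(Stage.BEFORE_PRE_UPDATE)
        self.trace.append("pre_update")     # first half-step would go here
        self._fire(Stage.AFTER_PRE_UPDATE)
        self._fire(Stage.BEFORE_COMPUTE)
        self.trace.append("compute")
        forces = [f / 2 for f in forces]    # toy "model": forces halve every step
        self._fire(Stage.AFTER_COMPUTE)
        self._fire(Stage.BEFORE_POST_UPDATE)
        self.trace.append("post_update")    # second half-step would go here
        self._fire(Stage.AFTER_POST_UPDATE)
        self._fire(Stage.AFTER_STEP)
        if self.converged(forces):
            self._fire(Stage.ON_CONVERGE)
        self.step_count += 1
        return forces

    def run(self, forces: List[float], n_steps: int) -> List[float]:
        # Stop when every system has converged or n_steps is exhausted.
        for _ in range(n_steps):
            forces = self.step(forces)
            if self.converged(forces):
                break
        return forces


logged, converged_at = [], []
dyn = ToyDynamics(hooks=[
    ToyHook(Stage.AFTER_STEP, 2, logged.append),       # every other step
    ToyHook(Stage.ON_CONVERGE, 1, converged_at.append),
])
final = dyn.run([0.3, 0.2], n_steps=100)
```

Here the AFTER_STEP hook fires on steps 0 and 2 only. The run stops after three steps, once both toy forces drop below fmax.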
Using dynamics as a context manager
All dynamics objects (optimizers, integrators, fused stages) support Python’s
context manager protocol. The with block manages a dedicated
torch.cuda.Stream for the simulation and ensures hooks are properly opened and
closed:
from nvalchemi.dynamics import FIRE, ConvergenceHook
with FIRE(model=model, dt=0.1, n_steps=500, hooks=[ConvergenceHook(fmax=0.05)]) as opt:
relaxed = opt.run(batch)
When you call run() without a with block, hook setup and teardown happen
automatically inside run(). The context manager form is useful when you need to
call step() manually or interleave dynamics with other operations while keeping
hook state (e.g. open log files) alive.
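The difference between the two forms can be sketched without nvalchemi. The sketch uses hypothetical classes: FileLikeHook stands in for a hook holding a resource open, and ToyOptimizer stands in for a dynamics object. CUDA stream management is omitted. Inside a with block, hooks stay open across manual step() calls. A bare run() opens and closes them itself.

```python
from typing import List


class FileLikeHook:
    """Hypothetical hook that keeps a resource (e.g. a log file) open."""

    def __init__(self) -> None:
        self.is_open = False
        self.lines: List[str] = []

    def open(self) -> None:
        self.is_open = True

    def close(self) -> None:
        self.is_open = False

    def write(self, msg: str) -> None:
        if not self.is_open:
            raise RuntimeError("hook used outside its open/close window")
        self.lines.append(msg)


class ToyOptimizer:
    """Sketch of the enter/exit behaviour described above, not nvalchemi's class."""

    def __init__(self, hooks: List[FileLikeHook]) -> None:
        self.hooks = hooks
        self._entered = False

    def __enter__(self) -> "ToyOptimizer":
        for hook in self.hooks:
            hook.open()
        self._entered = True
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        for hook in self.hooks:
            hook.close()
        self._entered = False
        return False  # never swallow exceptions

    def step(self, i: int) -> None:
        for hook in self.hooks:
            hook.write(f"step {i}")

    def _loop(self, n_steps: int) -> None:
        for i in range(n_steps):
            self.step(i)

    def run(self, n_steps: int) -> None:
        if self._entered:
            self._loop(n_steps)
        else:
            # Outside a `with` block, run() handles hook setup and teardown itself.
            with self:
                self._loop(n_steps)


hook = FileLikeHook()
opt = ToyOptimizer([hook])
with opt:
    opt.step(0)
    # other work can be interleaved here while the hook stays open
    opt.step(1)
opt.run(2)  # bare call: opens, runs, closes
```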
Multi-stage pipelines with FusedStage
Real workflows often chain multiple simulation phases: relax a structure, then run
MD at increasing temperatures, then relax again. The
FusedStage abstraction lets you compose stages
with the + operator:
from nvalchemi.dynamics import FIRE, NVTLangevin, ConvergenceHook
relax = FIRE(model=model, dt=0.1, n_steps=200, hooks=[ConvergenceHook(fmax=0.05)])
md = NVTLangevin(model=model, dt=1.0, temperature=300.0, n_steps=5000)
pipeline = relax + md
with pipeline:
pipeline.run(batch)
Systems start in the first stage (relaxation). As each system converges, it automatically migrates to the next stage (MD). Different systems can be in different stages simultaneously — the batch is partitioned internally, and a single model forward pass is shared across all active systems regardless of which stage they belong to.
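The partition-and-migrate behaviour can be illustrated with a small pure-Python sketch. It is not FusedStage's implementation. The function fused_step and its arguments are hypothetical. Each system carries a stage index. One shared "model" call covers every active system. A per-stage update rule is applied to each system, and a system moves to the next stage when that stage's convergence test passes.

```python
from typing import Callable, List


def fused_step(
    values: List[float],                      # toy per-system state
    stage_of: List[int],                      # current stage index of each system
    updates: List[Callable[[float], float]],  # one update rule per stage
    done: List[Callable[[float], bool]],      # one convergence test per stage
    model: Callable[[List[float]], List[float]],
) -> None:
    """Advance every active system by one step, in place."""
    n_stages = len(updates)
    # Systems whose index equals n_stages have finished every stage.
    active = [i for i, s in enumerate(stage_of) if s < n_stages]
    if not active:
        return
    # One shared forward pass over all active systems, whatever their stage.
    outputs = model([values[i] for i in active])
    for i, out in zip(active, outputs):
        s = stage_of[i]
        values[i] = updates[s](out)           # per-stage ("masked") update
        if done[s](values[i]):
            stage_of[i] = s + 1               # migrate to the next stage


calls: List[int] = []


def model(xs: List[float]) -> List[float]:
    calls.append(len(xs))                     # record the batch size of each forward pass
    return list(xs)


values, stage_of = [8.0, 1.0], [0, 0]
updates = [lambda x: x / 2, lambda x: x + 1]  # stage 0 "relax", stage 1 "md"
done = [lambda x: x < 1.0, lambda x: x >= 3.0]
for _ in range(2):
    fused_step(values, stage_of, updates, done, model)
mixed = list(stage_of)                        # [0, 1]: systems in different stages
for _ in range(2):
    fused_step(values, stage_of, updates, done, model)
```

After two steps the systems sit in different stages, yet each step still made a single model call covering both.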
Compiling with torch.compile
FusedStage can compile its entire step function
with torch.compile to reduce Python overhead and enable kernel fusion. Call
compile() after composing stages:
fused = (relax + md).compile(fullgraph=True)
with fused:
fused.run(batch)
compile() wraps the internal _step_impl method — which includes hook dispatch,
masked sub-stage updates, and the shared model forward pass — in a single compiled
graph. It returns the same instance, so you can chain it fluently.
You can also defer compilation by passing compile_step=True at construction time.
In that case, torch.compile is invoked lazily when the context manager is entered:
fused = relax + md # compile_step inherited from sub-stages or set explicitly
with fused: # compilation happens here
fused.run(batch)
Any keyword arguments accepted by torch.compile (e.g. fullgraph, mode,
backend) can be passed to .compile() or stored via compile_kwargs at
construction.
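The two compilation paths (eager .compile() that returns the same instance, and deferred compilation on entering the context manager) follow a common pattern. The sketch below shows that pattern under stated assumptions. ToyFused is hypothetical. The stand-in fake_compile replaces torch.compile so the example runs without PyTorch. Swapping in torch.compile(fn, **kwargs) would accept the same keyword arguments, such as fullgraph, mode and backend.

```python
from typing import Any, Callable, Dict, Optional


def fake_compile(fn: Callable, **kwargs: Any) -> Callable:
    """Stand-in for torch.compile: wraps fn and remembers the options."""
    def wrapped(*args: Any, **kw: Any) -> Any:
        return fn(*args, **kw)
    wrapped.compile_kwargs = kwargs  # type: ignore[attr-defined]
    return wrapped


class ToyFused:
    """Hypothetical fused stage illustrating eager vs. deferred compilation."""

    def __init__(self, compile_step: bool = False,
                 compile_kwargs: Optional[Dict[str, Any]] = None) -> None:
        self.compile_step = compile_step
        self.compile_kwargs = compile_kwargs or {}
        self._step_fn: Callable = self._step_impl
        self.compiled = False

    def _step_impl(self, x: int) -> int:
        return x + 1  # placeholder for hook dispatch, masked updates, forward pass

    def compile(self, **kwargs: Any) -> "ToyFused":
        self._step_fn = fake_compile(self._step_impl, **kwargs)
        self.compiled = True
        return self  # same instance, so calls can be chained

    def __enter__(self) -> "ToyFused":
        if self.compile_step and not self.compiled:
            self.compile(**self.compile_kwargs)  # deferred compilation happens here
        return self

    def __exit__(self, *exc: Any) -> bool:
        return False

    def step(self, x: int) -> int:
        return self._step_fn(x)


eager = ToyFused().compile(fullgraph=True)
lazy = ToyFused(compile_step=True, compile_kwargs={"mode": "reduce-overhead"})
compiled_before_enter = lazy.compiled
with lazy:
    compiled_inside = lazy.compiled
```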
Note
Not all hooks are graph-break-free under fullgraph=True. Hooks that perform
Python-side control flow (e.g. logging, I/O) will introduce graph breaks. If you
need an unbroken graph, ensure your hooks are written with torch-compatible
operations only.
Distributed pipelines
When a workflow needs more than one GPU — for example, relaxing structures on one
device and running MD on another — the
DistributedPipeline distributes stages across
ranks. Where + fuses stages onto a single GPU, the | operator (or a stages
dictionary) assigns one stage per rank and wires up inter-rank communication
automatically.
Configuring a pipeline
Each rank owns a BaseDynamics (or
FusedStage) instance. Stages are collected in a
dictionary keyed by global rank and handed to
DistributedPipeline:
from nvalchemi.dynamics import FIRE, NVTLangevin, DistributedPipeline
from nvalchemi.dynamics.base import BufferConfig
buffer_cfg = BufferConfig(num_systems=4, num_nodes=50, num_edges=0)
stages = {
0: FIRE(model=model, buffer_config=buffer_cfg, ...), # upstream — relaxation
1: NVTLangevin(model=model, buffer_config=buffer_cfg, ...), # downstream — MD
}
pipeline = DistributedPipeline(stages=stages, backend="nccl")
with pipeline:
pipeline.run()
By default, setup() (called automatically by the context manager) sorts stages by
rank and wires prior_rank / next_rank between adjacent stages as a simple linear
chain. For more sophisticated topologies — such as multiple independent
sub-pipelines running in the same job — set prior_rank and next_rank explicitly
on each stage:
stages = {
# Sub-pipeline A: rank 0 → rank 1
0: FIRE(model=model, buffer_config=buffer_cfg, prior_rank=None, next_rank=1, ...),
1: NVTLangevin(model=model, buffer_config=buffer_cfg, prior_rank=0, next_rank=None, ...),
# Sub-pipeline B: rank 2 → rank 3
2: FIRE(model=model, buffer_config=buffer_cfg, prior_rank=None, next_rank=3, ...),
3: NVTLangevin(model=model, buffer_config=buffer_cfg, prior_rank=2, next_rank=None, ...),
}
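The default linear wiring and the explicit multi-sub-pipeline layout can both be reasoned about as a mapping from rank to (prior_rank, next_rank). This is a minimal sketch under that assumption. wire_linear_chain mimics the documented default (sort by rank, link neighbours). check_links is a hypothetical consistency check for the one-to-one topology. Neither function is part of the nvalchemi API.

```python
from typing import Dict, Iterable, Optional, Tuple

# rank -> (prior_rank, next_rank)
Links = Dict[int, Tuple[Optional[int], Optional[int]]]


def wire_linear_chain(ranks: Iterable[int]) -> Links:
    """Default topology: sort ranks and link each one to its neighbours."""
    ordered = sorted(ranks)
    links: Links = {}
    for i, rank in enumerate(ordered):
        prior = ordered[i - 1] if i > 0 else None
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        links[rank] = (prior, nxt)
    return links


def check_links(links: Links) -> None:
    """One-to-one consistency: if a sends to b, b must receive from a."""
    for rank, (prior, nxt) in links.items():
        if nxt is not None and links[nxt][0] != rank:
            raise ValueError(f"rank {rank} sends to {nxt}, but {nxt} does not receive from {rank}")
        if prior is not None and links[prior][1] != rank:
            raise ValueError(f"rank {rank} receives from {prior}, but {prior} does not send to {rank}")


default = wire_linear_chain([1, 0, 2])
# Two independent sub-pipelines, as in the explicit example above.
two_pipelines: Links = {0: (None, 1), 1: (0, None), 2: (None, 3), 3: (2, None)}
check_links(two_pipelines)
```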
The first stage in each sub-pipeline typically owns a sampler that feeds new structures into the chain; the last stage owns one or more data sinks that collect converged results.
Note
Each rank currently communicates with at most one upstream and one downstream neighbour (one-to-one topology). Fan-out (one-to-many) and fan-in (many-to-one) patterns are planned for a future release.
Sizing the buffer
NCCL point-to-point transfers require fixed-size tensors, so each communicating stage
pre-allocates a send buffer and a receive buffer whose dimensions are set by
BufferConfig. The three fields control how much
data a single transfer can carry:
| Field | What it controls |
|---|---|
| num_systems | Maximum number of graphs (structures) per transfer. Determines throughput per step: higher values move more data but consume more GPU memory. |
| num_nodes | Total atom capacity across all graphs in the buffer. Must be large enough for the worst-case combination of systems. For example, transferring up to 4 structures of at most 50 atoms each requires num_nodes = 4 × 50 = 200. |
| num_edges | Total edge capacity. Set to 0 when the downstream model recomputes edges via its neighbor list (the common case). Only set a non-zero value if pre-computed edge attributes must be transferred. |
from nvalchemi.dynamics.base import BufferConfig
# 4 structures, up to 200 atoms total, edges recomputed downstream
buffer_cfg = BufferConfig(num_systems=4, num_nodes=200, num_edges=0)
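When structures vary in size, one way to satisfy the worst-case requirement is to sum the atom counts of the num_systems largest structures in the dataset. The helper below is a hypothetical sketch of that arithmetic, not part of nvalchemi. Its result could be passed as num_nodes.

```python
import heapq
from typing import Iterable


def worst_case_num_nodes(atom_counts: Iterable[int], num_systems: int) -> int:
    """Atom capacity needed so that any `num_systems` structures fit in one transfer."""
    return sum(heapq.nlargest(num_systems, atom_counts))


uniform = worst_case_num_nodes([50] * 10, num_systems=4)             # 4 x 50
mixed = worst_case_num_nodes([12, 50, 31, 48, 7], num_systems=4)     # 50 + 48 + 31 + 12
```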
When the upstream stage has more converged samples than num_systems allows in a
single transfer, the excess stays in the active batch as a no-op until the next
step — this is the back-pressure mechanism described below.
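The capacity check behind back-pressure can be sketched as filling the send buffer in order until either limit is hit. The sketch assumes first-in, first-out order: the first sample that does not fit, and everything after it, waits for the next step. The real packing order in nvalchemi may differ. pack_send_buffer is a hypothetical helper.

```python
from typing import List, Tuple


def pack_send_buffer(
    atom_counts: List[int], num_systems: int, num_nodes: int
) -> Tuple[List[int], List[int]]:
    """Return (indices sent this step, indices left in the active batch)."""
    sent: List[int] = []
    nodes_used = 0
    for idx, n_atoms in enumerate(atom_counts):
        full = len(sent) == num_systems or nodes_used + n_atoms > num_nodes
        if full:
            # Back-pressure: this sample and all later ones stay as no-ops until next step.
            return sent, list(range(idx, len(atom_counts)))
        sent.append(idx)
        nodes_used += n_atoms
    return sent, []


# Six converged 50-atom structures with BufferConfig(num_systems=4, num_nodes=200):
first = pack_send_buffer([50] * 6, num_systems=4, num_nodes=200)
# The two left behind go out on the following step.
second = pack_send_buffer([50] * 2, num_systems=4, num_nodes=200)
```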
Important
Every pair of communicating stages must share an identical
BufferConfig.
DistributedPipeline.setup() validates this and raises an error on mismatch.
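The requirement amounts to field-by-field equality between neighbouring stages. The sketch below uses a hypothetical frozen dataclass, BufferSpec, with the same three fields as BufferConfig, and a hypothetical validate_buffers function. It shows the kind of check described, not the library's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BufferSpec:
    """Hypothetical stand-in for BufferConfig with the same three fields."""
    num_systems: int
    num_nodes: int
    num_edges: int


def validate_buffers(configs: Dict[int, BufferSpec],
                     next_rank: Dict[int, Optional[int]]) -> None:
    """Raise if any sender/receiver pair disagrees on buffer dimensions."""
    for rank, nxt in next_rank.items():
        if nxt is not None and configs[rank] != configs[nxt]:
            raise ValueError(
                f"buffer mismatch between rank {rank} ({configs[rank]}) "
                f"and rank {nxt} ({configs[nxt]})"
            )


ok = {0: BufferSpec(4, 200, 0), 1: BufferSpec(4, 200, 0)}
validate_buffers(ok, {0: 1, 1: None})  # passes silently
```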
Buffer synchronization
The diagram below shows how two adjacent ranks exchange data through pre-allocated send and receive buffers during a single step. The upstream rank pushes converged samples; the downstream rank pulls them into its active batch.
[Diagram: On Rank 0 (upstream), the active_batch copies converged samples into the send_buffer; a dotted edge labelled "excess (back-pressure)" leads to overflow sinks. The send_buffer is connected to Rank 1's recv_buffer by NCCL isend/irecv. On Rank 1 (downstream), the recv_buffer feeds incoming samples into the active_batch, converged results go to the result sinks, and the sinks drain back into the active_batch when capacity is available.]
Buffer synchronization between two adjacent ranks in a DistributedPipeline.
A step proceeds as follows:
1. Pre-step: the downstream rank zeros its receive buffer and posts an asynchronous irecv from its prior_rank. In async_recv mode (the default), the wait is deferred until later in the step; in sync mode it blocks immediately.
2. Complete receive: the downstream rank waits on the pending receive, then routes incoming samples into its active batch (or into overflow sinks if the batch is full).
3. Step: both ranks execute their respective integrator or optimizer on their active batches.
4. Post-step: the upstream rank identifies converged samples, copies them into its send buffer (up to BufferConfig capacity), and issues an isend. A buffer is sent on every step, even when it is empty, so the matching irecv on the downstream rank always completes and the ranks cannot deadlock. The final stage routes converged samples to its sinks instead.
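The ordering of these phases can be simulated in a single process. In the sketch below, a deque stands in for the NCCL isend/irecv pair, so no communication actually overlaps with computation. simulate_pair and its arguments are hypothetical. Each sample is named and needs a fixed number of upstream iterations to converge. Upstream sends one message per step, possibly empty, and downstream completes one receive per step.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def simulate_pair(steps_to_converge: Dict[str, int], capacity: int,
                  n_steps: int) -> Tuple[List[List[str]], Dict[str, int]]:
    """Toy two-rank exchange: upstream relaxes samples, downstream receives them."""
    channel: Deque[List[str]] = deque()      # stand-in for isend/irecv over NCCL
    remaining = dict(steps_to_converge)      # upstream active batch: name -> steps left
    received: List[List[str]] = []           # what downstream got at each step
    for _ in range(n_steps):
        # Pre-step / complete receive (downstream): take the previous step's message.
        if channel:
            received.append(channel.popleft())
        # Step (upstream): advance every active sample by one iteration.
        for name in remaining:
            remaining[name] -= 1
        # Post-step (upstream): send up to `capacity` converged samples.
        converged = sorted(n for n, left in remaining.items() if left <= 0)
        batch = converged[:capacity]         # the excess stays behind (back-pressure)
        for name in batch:
            del remaining[name]
        channel.append(batch)                # sent even when empty
    return received, remaining


narrow, left_narrow = simulate_pair({"a": 1, "b": 1, "c": 2}, capacity=1, n_steps=4)
wide, left_wide = simulate_pair({"a": 1, "b": 1, "c": 2}, capacity=2, n_steps=4)
```

With capacity=1, sample b converges alongside a but waits one step. With capacity=2, the last exchange is an empty message, which keeps both sides in lockstep.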
Tip
Back-pressure: when the send buffer is full, excess converged samples remain in the upstream active batch as no-ops until buffer capacity opens up. This naturally throttles fast producers without dropping data.
Communication modes
The comm_mode parameter controls how aggressively communication overlaps with
computation:
| Mode | Behavior |
|---|---|
| sync | Blocks on … |
| async_recv (default) | Posts … |
| … | Also defers … |
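The difference between blocking immediately and deferring the wait can be illustrated with a future standing in for the pending irecv. This is a minimal sketch: pre_step is hypothetical, only the sync and async_recv behaviours from the step description are modelled, and the third mode is not covered. "other work" represents whatever the rank can do before it needs the incoming samples.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def pre_step(mode: str, pool: ThreadPoolExecutor, log: List[str]) -> None:
    if mode not in ("sync", "async_recv"):
        raise ValueError(f"unsupported mode in this sketch: {mode}")
    pending: Future = pool.submit(lambda: "payload")   # stands in for a posted irecv
    log.append("irecv posted")
    if mode == "sync":
        log.append(f"received {pending.result()}")     # block right away
    log.append("other work")                           # work not needing the incoming data
    if mode == "async_recv":
        log.append(f"received {pending.result()}")     # wait only when the data is needed


sync_log: List[str] = []
async_log: List[str] = []
with ThreadPoolExecutor(max_workers=1) as pool:
    pre_step("sync", pool, sync_log)
    pre_step("async_recv", pool, async_log)
```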
Launching
Distributed pipelines are launched with torchrun (or any torch.distributed
launcher):
torchrun --nproc_per_node=2 my_pipeline.py
DistributedPipeline calls init_distributed() on entry and coordinates
termination across ranks via an all_reduce on per-rank done flags.
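The termination rule reduces to "stop only when every rank reports done". The sketch simulates this in one process. all_reduce_sum is a hypothetical stand-in for torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.SUM), and summing the flags and comparing the total against the world size is one possible reduction, an assumption rather than the pipeline's documented choice. A MIN reduction would work equally well.

```python
from typing import List


def all_reduce_sum(values: List[int]) -> List[int]:
    """Single-process stand-in for a SUM all_reduce: every rank gets the total."""
    total = sum(values)
    return [total] * len(values)


def should_stop(done_flags: List[bool]) -> bool:
    """All ranks stop together once every per-rank done flag is set."""
    reduced = all_reduce_sum([int(flag) for flag in done_flags])
    return reduced[0] == len(done_flags)
```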
See also
The Distributed Pipeline Examples gallery contains end-to-end examples, including multi-pipeline topologies and monitoring with persistent storage.
What’s next
Optimization and Integrators — FIRE, NVE, NVT, NPT and their configuration.
Hooks — the hook protocol, built-in hooks, and writing custom hooks.
Data Sinks — recording trajectories and simulation results.
See also
Examples: 02_dynamics_example.py demonstrates a complete relaxation and MD workflow.
API: See the nvalchemi.dynamics module for the full reference, including the hook protocol and distributed pipeline documentation.
Data guide: The AtomicData and Batch guide covers the input data structures consumed by dynamics.