FlashDreams
A high-performance inference and serving library for interactive autoregressive video and world models.
Why FlashDreams
FlashDreams is built for the case where a diffusion video model has to respond in real time — a closed-loop world-model demo, a driving simulator, an interactive scene rollout. The optimisations needed for that case are different from those used by an offline, one-shot video generator, and FlashDreams organises them into three abstractions that every shipped recipe uses.
KV-cached transformers. Each autoregressive chunk re-uses prior context as a KV cache instead of recomputing it. Self-forcing and causal-forcing training regimes are first-class.
Ring attention. Context-parallel attention across ranks, so long-horizon generation scales out instead of OOM-ing on a single GPU.
CUDA-graph capture. The steady-state forward is captured into a CUDA graph after warmup, collapsing Python and launch overhead in the hot loop.
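The first two ideas can be illustrated with a minimal, single-process sketch. It is not FlashDreams code: `KVCache`, `project`, `attend`, and `merge` are hypothetical helpers, the "ranks" are plain list slices rather than GPUs, and no data is actually passed around a ring. It shows two things. With a cache, each token's keys and values are projected once instead of once per chunk. And attention computed separately over shards of the cache can be merged exactly by carrying each shard's log-sum-exp, which is the arithmetic that context-parallel attention relies on.

```python
import math
import random

def attend(q, keys, values):
    """Softmax attention of one query over the given keys and values.

    Returns (out, lse): the output vector and the log-sum-exp of the
    scaled scores, which is what allows partial results to be merged.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) / total
           for i in range(dim)]
    return out, m + math.log(total)

def merge(partials):
    """Combine per-shard (out, lse) pairs into the full-context output."""
    m = max(lse for _, lse in partials)
    coeffs = [math.exp(lse - m) for _, lse in partials]
    total = sum(coeffs)
    dim = len(partials[0][0])
    return [sum(c * out[i] for c, (out, _) in zip(coeffs, partials)) / total
            for i in range(dim)]

class KVCache:
    """Append-only store of per-token keys and values (hypothetical helper)."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, keys, values):
        self.keys.extend(keys)
        self.values.extend(values)

    def shards(self, world_size):
        """Split the cache into up to world_size contiguous slices."""
        size = max(1, math.ceil(len(self.keys) / world_size))
        return [(self.keys[i:i + size], self.values[i:i + size])
                for i in range(0, len(self.keys), size)]

rng = random.Random(0)
DIM, CHUNK, NUM_CHUNKS, WORLD_SIZE = 8, 4, 3, 4
projections = 0

def project(token):
    """Stand-in for the K/V projection; counts how often it runs."""
    global projections
    projections += 1
    return [2.0 * x for x in token], [x + 1.0 for x in token]

cache = KVCache()
max_error = 0.0
for _ in range(NUM_CHUNKS):
    chunk = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(CHUNK)]
    kv = [project(tok) for tok in chunk]  # only the new tokens are projected
    cache.append([k for k, _ in kv], [v for _, v in kv])
    query = chunk[-1]  # last token of the chunk sees the whole cache
    full, _ = attend(query, cache.keys, cache.values)
    sharded = merge([attend(query, k, v) for k, v in cache.shards(WORLD_SIZE)])
    max_error = max(max_error, max(abs(a - b) for a, b in zip(full, sharded)))

# Projections needed if every chunk re-projected all prior context instead.
recompute_cost = sum(CHUNK * (i + 1) for i in range(NUM_CHUNKS))
print(projections, recompute_cost, max_error < 1e-9)
```

In this toy run the cached path performs 12 projections where recomputing the context for every chunk would perform 24, and the gap widens as the rollout gets longer. The sharded result matches full attention to within floating-point error.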
The library is licensed under Apache-2.0 and developed in the open. The internals are covered in the documentation.
Performance
Each tile shows per-step latency at steady state — post-warmup, post-graph-capture — measured against the upstream library’s own runner on the same hardware and the same checkpoint. Full methodology lives on the benchmarks page.
Self-Forcing speedup: 2.12×. GB300, vs FastVideo baseline (362 ms → 171 ms per step).
LingBot-World speedup: 3.10×. H100, vs Official baseline (1950 ms → 629 ms per step).
Wan2.1 speedup: 1.40×. GB300, vs FastVideo baseline (534 ms → 382 ms per step).
Integrated models: 8. Streaming and bidirectional recipes, one CLI.
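As a rough illustration of how such figures are derived, the sketch below times a step function at steady state and recomputes the three speedups from the quoted latencies. It is not the project's benchmark harness: `steady_state_ms` and `speedup` are hypothetical names, and the warmup count, iteration count, and use of the median are assumptions rather than the documented methodology.

```python
import statistics
import time

def steady_state_ms(step, warmup=5, iters=20):
    """Median per-step latency in milliseconds, excluding warmup steps.

    For GPU work, `step` must block until the device has finished,
    otherwise only the asynchronous launch time is measured.
    """
    for _ in range(warmup):
        step()  # discarded: covers compilation, caching, graph capture
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)

def speedup(baseline_ms, optimised_ms):
    """Ratio of baseline latency to optimised latency."""
    return baseline_ms / optimised_ms

# Per-step latencies quoted above: (baseline ms, FlashDreams ms).
tiles = {
    "Self-Forcing": (362, 171),
    "LingBot-World": (1950, 629),
    "Wan2.1": (534, 382),
}
for name, (base, fast) in tiles.items():
    print(f"{name}: {speedup(base, fast):.2f}x")
# prints 2.12x, 3.10x and 1.40x, matching the tiles
```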
Try FlashDreams
The Get Started guide walks from a fresh checkout to a generated frame on a single GPU.
Supported models
Streaming and autoregressive recipes emit per-step output with sub-second latency once warm; bidirectional recipes are kept as full-block parity references. Each model page carries the canonical invocation, the checkpoint source, and the per-recipe knobs.
Streaming Wan 2.1 T2V via the Self-Forcing plugin. Sub-second steps after warmup on H100 / GB200.
Causal-forcing framewise T2V and I2V variants of Wan 2.1 via the Causal-Forcing plugin.
FastVideo Wan 2.2 14B causal T2V recipe.
Camera-controlled I2V with bundled prompt, first-frame, and camera arrays.
Single-view and multi-view streaming recipes against the OmniDreams checkpoints, including a diffusion-forcing AR variant.
Streaming video super-resolution for the FlashVSR checkpoint family.
Bidirectional reference model used for parity testing — T2V 1.3B / 480p and I2V 14B / 480p.
Bidirectional Cosmos-Predict2 reference recipes (T2V / I2V, 2B).