FlashDreams

A high-performance inference and serving library for interactive autoregressive video and world models.

Why FlashDreams

FlashDreams is built for the case where a diffusion video model has to respond in real time — a closed-loop world-model demo, a driving simulator, an interactive scene rollout. The optimisations needed for that case are different from those used by an offline, one-shot video generator, and FlashDreams organises them into three abstractions that every shipped recipe uses.

KV-cached transformers. Each autoregressive chunk re-uses prior context as a KV cache instead of recomputing it. Self-forcing and causal-forcing training regimes are first-class.
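
The saving from a KV cache can be sketched in plain Python. This is a toy illustration, not FlashDreams code: the names `project`, `attend`, `rollout_cached`, and `rollout_recompute`, the fixed 2x2 weights, and the choice to let every token in a chunk attend to the whole chunk plus all earlier chunks are assumptions made for the example.

```python
import math

# Hypothetical fixed weights standing in for learned K and V projections.
W_K = [[0.5, -0.2], [0.1, 0.3]]
W_V = [[1.0, 0.0], [0.4, -0.6]]

def project(token, weight):
    """Matrix-vector product standing in for a K or V projection."""
    return [sum(w * t for w, t in zip(row, token)) for row in weight]

def attend(query, keys, values):
    """Single-head softmax attention for one query over all keys."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]

def rollout_cached(chunks):
    """Project only the new chunk each step and append it to the cache."""
    keys, values, outputs, projections = [], [], [], 0
    for chunk in chunks:
        keys += [project(t, W_K) for t in chunk]
        values += [project(t, W_V) for t in chunk]
        projections += 2 * len(chunk)
        # The raw token is used as the query to keep the toy short.
        outputs.append([attend(t, keys, values) for t in chunk])
    return outputs, projections

def rollout_recompute(chunks):
    """Re-project the entire context at every step (no cache)."""
    context, outputs, projections = [], [], 0
    for chunk in chunks:
        context = context + chunk
        keys = [project(t, W_K) for t in context]
        values = [project(t, W_V) for t in context]
        projections += 2 * len(context)
        outputs.append([attend(t, keys, values) for t in chunk])
    return outputs, projections

chunks = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.5, 0.5], [-1.0, 2.0]],
          [[0.3, -0.7], [2.0, 1.0]]]
cached_out, cached_proj = rollout_cached(chunks)
full_out, full_proj = rollout_recompute(chunks)
print(cached_proj, full_proj)  # 12 24
```

Both rollouts produce the same outputs; the cached one does a constant amount of projection work per chunk, while the recomputing one grows with the length of the context.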

Ring attention. Context-parallel attention across ranks, so long-horizon generation scales out instead of OOM-ing on a single GPU.
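
The core of ring attention is that each rank can fold in one key/value block at a time with a running softmax, so no rank ever holds the full context. The sketch below simulates the ranks inside a single Python process and uses non-causal attention for brevity; the function names and the toy shards are assumptions for illustration, and a real implementation would pass blocks between GPUs with collective communication instead of indexing a list.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def full_attention(q, keys, values):
    """Reference: ordinary softmax attention for one query over every key."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]

def ring_attention(q_shards, k_shards, v_shards):
    """Simulate ring attention in one process; shard i belongs to 'rank' i."""
    world = len(q_shards)
    dim = len(v_shards[0][0])
    # Running softmax state per query: [max score, denominator, weighted sum].
    state = [[[-math.inf, 0.0, [0.0] * dim] for _ in shard] for shard in q_shards]
    for step in range(world):
        for rank in range(world):
            # After `step` hops around the ring, this rank holds the K/V block
            # that started on rank (rank - step) % world.
            src = (rank - step) % world
            for q, s in zip(q_shards[rank], state[rank]):
                scale = 1.0 / math.sqrt(len(q))
                scores = [dot(q, k) * scale for k in k_shards[src]]
                new_max = max(s[0], max(scores))
                rescale = math.exp(s[0] - new_max)  # 0.0 on the first block
                weights = [math.exp(x - new_max) for x in scores]
                s[1] = s[1] * rescale + sum(weights)
                s[2] = [acc * rescale
                        + sum(w * v[d] for w, v in zip(weights, v_shards[src]))
                        for d, acc in enumerate(s[2])]
                s[0] = new_max
    return [[[acc / s[1] for acc in s[2]] for s in rank_state]
            for rank_state in state]

q_shards = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [-1.0, 2.0]]]
k_shards = [[[0.2, 0.1], [0.4, -0.3]], [[-0.5, 0.9], [1.0, 1.0]]]
v_shards = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

ring_out = ring_attention(q_shards, k_shards, v_shards)
all_keys = [k for shard in k_shards for k in shard]
all_values = [v for shard in v_shards for v in shard]
reference = [[full_attention(q, all_keys, all_values) for q in shard]
             for shard in q_shards]
```

`ring_out` matches `reference` up to floating-point rounding, even though each simulated rank only ever sees one key/value block at a time.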

CUDA-graph capture. The steady-state forward is captured into a CUDA graph after warmup, collapsing Python and launch overhead in the hot loop.
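
The warmup-then-capture pattern can be sketched with PyTorch's public CUDA graph API (`torch.cuda.CUDAGraph` and `torch.cuda.graph`). This is a generic illustration, not FlashDreams' own capture code: `capture_step`, `demo`, and the toy `Linear` layer are assumptions, and the example does nothing on a machine without a CUDA device.

```python
import torch

def capture_step(step_fn, static_input, warmup_iters=3):
    """Warm up step_fn on a side stream, then capture one call as a CUDA graph."""
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side), torch.no_grad():
        for _ in range(warmup_iters):
            step_fn(static_input)  # lets lazy initialisation finish before capture
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.no_grad(), torch.cuda.graph(graph):
        static_output = step_fn(static_input)
    return graph, static_output

def demo():
    """Return True if a replayed graph matches eager execution, None without CUDA."""
    if not torch.cuda.is_available():
        return None
    model = torch.nn.Linear(16, 16).cuda().eval()  # toy stand-in for a model step
    static_input = torch.zeros(4, 16, device="cuda")
    graph, static_output = capture_step(model, static_input)

    new_input = torch.randn(4, 16, device="cuda")
    static_input.copy_(new_input)  # a graph replays on fixed buffers, so copy in place
    graph.replay()                 # one launch instead of per-kernel Python dispatch
    with torch.no_grad():
        expected = model(new_input)
    return torch.allclose(static_output, expected, atol=1e-5)

result = demo()
```

The key constraint is that a captured graph always reads from and writes to the same memory, which is why new inputs are copied into `static_input` rather than passed as fresh tensors.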

The library is Apache-2.0 and developed in the open. The internals are covered in the documentation.

Performance

Each tile shows per-step latency at steady state — post-warmup, post-graph-capture — measured against the upstream library’s own runner on the same hardware and the same checkpoint. Full methodology lives on the benchmarks page.
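
As a rough illustration of what "steady state" means here, the sketch below discards warmup steps before timing and reports the median per-step latency. It is not the project's benchmark harness: `steady_state_latency_ms`, `ToyStep`, and the step counts are assumptions, and timing real GPU work would additionally require synchronising the device before reading the clock.

```python
import statistics
import time

def steady_state_latency_ms(step_fn, warmup_steps=5, timed_steps=20):
    """Median wall-clock latency of step_fn in milliseconds, excluding warmup."""
    for _ in range(warmup_steps):
        step_fn()  # untimed: absorbs compilation, allocation, graph capture
    samples = []
    for _ in range(timed_steps):
        start = time.perf_counter()
        step_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

class ToyStep:
    """Fake model step whose first call is slow, mimicking one-off warmup cost."""
    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        time.sleep(0.2 if self.calls == 1 else 0.001)

step = ToyStep()
latency_ms = steady_state_latency_ms(step)
print(f"{latency_ms:.2f} ms per step after warmup")
```

Because the slow first call falls inside the warmup window, it does not affect the reported figure.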

2.12×

Self-Forcing speedup

GB300, vs FastVideo baseline (362 ms → 171 ms per step).

3.10×

LingBot-World speedup

H100, vs the official baseline (1950 ms → 629 ms per step).

1.40×

Wan2.1 speedup

GB300, vs FastVideo baseline (534 ms → 382 ms per step).

8

Integrated models

Streaming and bidirectional recipes, one CLI.
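
Each multiplier above is the baseline per-step latency divided by the FlashDreams per-step latency. The short check below reproduces the three figures from the quoted millisecond values; the dictionary layout and the `speedup` helper are only for illustration.

```python
# (baseline ms, FlashDreams ms) per step, copied from the tiles above.
tiles = {
    "Self-Forcing (GB300)": (362.0, 171.0),
    "LingBot-World (H100)": (1950.0, 629.0),
    "Wan2.1 (GB300)": (534.0, 382.0),
}

def speedup(baseline_ms, optimized_ms):
    """Speedup factor: how many times faster the optimized step is."""
    return baseline_ms / optimized_ms

report = {name: f"{speedup(b, o):.2f}x" for name, (b, o) in tiles.items()}
for name, value in report.items():
    print(f"{name}: {value}")
# Self-Forcing (GB300): 2.12x
# LingBot-World (H100): 3.10x
# Wan2.1 (GB300): 1.40x
```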

Try FlashDreams

The Get Started guide walks from a fresh checkout to a generated frame on a single GPU.

Supported models

Streaming and autoregressive recipes emit per-step output with sub-second latency once warm; bidirectional recipes are kept as full-block parity references. Each model page carries the canonical invocation, the checkpoint source, and the per-recipe knobs.

Self-Forcing

Streaming Wan 2.1 T2V via the Self-Forcing plugin. Sub-second steps after warmup on H100 / GB200.

Causal-Forcing

Causal-forcing framewise T2V and I2V variants of Wan 2.1 via the Causal-Forcing plugin.

Causal Wan 2.2

FastVideo Wan 2.2 14B causal T2V recipe.

LingBot-World

Camera-controlled I2V with bundled prompt, first-frame, and camera arrays.

OmniDreams

Single-view and multi-view streaming recipes against the NVIDIA OmniDreams checkpoints, including a diffusion-forcing AR variant.

FlashVSR

Streaming video super-resolution for the FlashVSR checkpoint family.

Wan 2.1 (bidirectional)

Bidirectional reference model used for parity testing — T2V 1.3B / 480p and I2V 14B / 480p.

Cosmos-Predict2.5 (bidirectional)

Bidirectional Cosmos-Predict2.5 reference recipes (T2V / I2V, 2B).
