Developer Guides

These guides cover how the system is structured underneath the CLI: the inference pipeline a recipe runs through, the configuration layer every recipe shares, the integration surface for adding a new method, common patterns for driving the pipeline from Python, and the shape of an interactive serving session. They are conceptual; for per-symbol documentation, see the API reference.

The pipeline overview anchors the rest. The config system is the layer every recipe shares, and new integrations build on top of both. The usage-patterns and interactive-serving guides describe how the pipeline is embedded in surrounding code.

Inference pipeline overview

The end-to-end computation flow: warmup, CUDA-graph capture, the autoregressive-step body, the ring-attention shard group, and finalize. The mental model the rest of the project assumes.


Config system

How every overridable field is surfaced as a CLI flag, how recipe defaults compose, and how to layer overrides on top.


Add a new method

The entry-point surface a new recipe ships against: what to subclass, what to register, and where the parity tests live.


Usage patterns

Common ways to drive FlashDreams as a developer: the CLI, the in-process Python runner API, and the pipeline-level surface for embedding.


Interactive serving

Keeping a streaming session alive: warmup, steady-state generation, and how the WebRTC and gRPC servers under integrations/ wire the pipeline up.

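The Config system guide is the authoritative description of how overrides work. As a rough mental model only, the layering the card summarizes (field defaults, then recipe defaults, then user overrides, with each overridable field exposed as a CLI flag) can be sketched with the standard library. Every name below (`ExampleConfig`, `RECIPE_DEFAULTS`, `build_parser`, `resolve`) is hypothetical and is not the FlashDreams API. The sketch also assumes that each later layer simply wins field by field.

```python
import argparse
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ExampleConfig:
    # Hypothetical config; field names are illustrative, not FlashDreams fields.
    num_steps: int = 4
    resolution: int = 512
    use_cuda_graph: bool = True


# Hypothetical recipe defaults, layered over the dataclass field defaults.
RECIPE_DEFAULTS = {"resolution": 768}


def build_parser(cls) -> argparse.ArgumentParser:
    """Surface every dataclass field as a --kebab-case CLI flag."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        flag = "--" + f.name.replace("_", "-")
        if f.type is bool:
            # Generates both --flag and --no-flag; None means "not passed".
            parser.add_argument(flag, action=argparse.BooleanOptionalAction, default=None)
        else:
            parser.add_argument(flag, type=f.type, default=None)
    return parser


def resolve(cls, recipe_defaults: dict, argv: list[str]):
    """Compose field defaults -> recipe defaults -> CLI overrides."""
    cfg = replace(cls(), **recipe_defaults)          # layers 1 and 2
    args = build_parser(cls).parse_args(argv)
    overrides = {k: v for k, v in vars(args).items() if v is not None}
    return replace(cfg, **overrides)                 # layer 3


if __name__ == "__main__":
    print(resolve(ExampleConfig, RECIPE_DEFAULTS, ["--num-steps", "8", "--no-use-cuda-graph"]))
```

Passing `["--num-steps", "8", "--no-use-cuda-graph"]` yields `num_steps=8`, the recipe's `resolution=768`, and `use_cuda_graph=False`. Flags default to `None` so the resolver can tell "not passed" apart from an explicit value; without that, every CLI default would silently clobber the recipe layer.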

Where these guides fit

When working forward from a recipe, start with the pipeline overview, then read the recipe’s per-model page under Models, then drop into the matching module under API reference for the implementation details. The Get Started guide covers the two-command path from install to a generated clip.