How to use FlashDreams as a developer

Pick the path that matches your goal. For the full runtime map, see Inference pipeline overview.

Run existing models: Clone the repo, install the runner extras, then open a model page for the exact slug and flags.
Use as a Python library: Install FlashDreams and import the runtime contracts from flashdreams.infra.
Add a standalone model: Ship configs and runners from your own package through the flashdreams.runner_configs entry-point group.
Build serving apps: Keep sessions alive and stream outputs through the serving path.

Run existing models from source

git clone https://github.com/NVIDIA/flashdreams.git
cd flashdreams
uv sync --extra dev --extra runners
uv run flashdreams-run --help

Then open a model page for the actual slugs and flags.

Programmatic access

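pip install flashdreams

Then build a runner from its config and run it:

from flashdreams.infra.pipeline import StreamInferencePipeline  # used for annotations below
from my_integration.config import MY_MODEL_RUNNER  # defined in your integration package

runner = MY_MODEL_RUNNER.setup()  # build the runner from its config
runner.run()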
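
The snippet above depends only on the config object exposing `setup()` and the returned runner exposing `run()`. As a rough illustration of that shape, here is a self-contained sketch of what a module such as `my_integration/config.py` might contain. `DemoRunnerConfig`, `DemoRunner`, and all of their fields are hypothetical stand-ins, not FlashDreams classes; a real integration would presumably build on the base classes FlashDreams provides.

```python
from dataclasses import dataclass


@dataclass
class DemoRunner:
    """Hypothetical runner: holds resolved settings and does its work in run()."""

    height: int
    width: int
    num_chunks: int

    def run(self) -> list[str]:
        # A real runner would drive a pipeline; this one only reports its plan.
        return [
            f"chunk {i}: {self.width}x{self.height}" for i in range(self.num_chunks)
        ]


@dataclass(frozen=True)
class DemoRunnerConfig:
    """Hypothetical config: cheap to import, builds the runner only in setup()."""

    height: int = 480
    width: int = 832
    num_chunks: int = 4

    def setup(self) -> DemoRunner:
        # Expensive work (loading weights, allocating buffers) would belong here,
        # so that importing the config module stays free of side effects.
        return DemoRunner(self.height, self.width, self.num_chunks)


# Module-level constant, mirroring how MY_MODEL_RUNNER is imported above.
MY_MODEL_RUNNER = DemoRunnerConfig()

if __name__ == "__main__":
    runner = MY_MODEL_RUNNER.setup()
    for line in runner.run():
        print(line)
```

Keeping the config a plain module-level value means it can be imported, listed, and registered without constructing the model; only `setup()` pays the construction cost.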

For lower-level experiments:

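from flashdreams.infra.pipeline import StreamInferencePipeline
from my_integration.config import MY_PIPELINE

# Build the pipeline from its config and allocate a cache for height 480, width 832.
pipeline: StreamInferencePipeline = MY_PIPELINE.setup()
cache = pipeline.initialize_cache(height=480, width=832)

# Run four steps, finalizing each one before starting the next.
for ar_idx in range(4):
    output = pipeline.generate(ar_idx, cache)  # consume or store `output` here
    pipeline.finalize(ar_idx, cache)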

The arguments to these calls are model-specific; treat the integration's runner as the source of truth.
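
To make the call order in the loop above concrete, the sketch below replays it against a tiny stand-in pipeline and collects each step's output. Only the method names `initialize_cache`, `generate`, and `finalize` and the `ar_idx` loop come from the snippet above. `ToyPipeline`, its dictionary cache, the returned strings, and the rule that a step must be finalized before the next one is generated are assumptions made for illustration, not FlashDreams behavior.

```python
from typing import Any


class ToyPipeline:
    """Hypothetical stand-in that mimics the three calls used above."""

    def initialize_cache(self, height: int, width: int) -> dict[str, Any]:
        # The cache carries state from one step to the next.
        return {"height": height, "width": width, "finalized": []}

    def generate(self, ar_idx: int, cache: dict[str, Any]) -> str:
        # Assumed rule: a step may only run once all earlier steps are finalized.
        if len(cache["finalized"]) != ar_idx:
            raise RuntimeError(f"step {ar_idx} called out of order")
        return f"chunk-{ar_idx} ({cache['width']}x{cache['height']})"

    def finalize(self, ar_idx: int, cache: dict[str, Any]) -> None:
        # Commit the step so the next generate() call can build on it.
        cache["finalized"].append(ar_idx)


def stream(pipeline: ToyPipeline, num_steps: int, height: int, width: int) -> list[str]:
    """Run generate/finalize in lockstep and collect each step's output."""
    cache = pipeline.initialize_cache(height=height, width=width)
    outputs = []
    for ar_idx in range(num_steps):
        outputs.append(pipeline.generate(ar_idx, cache))
        pipeline.finalize(ar_idx, cache)
    return outputs


if __name__ == "__main__":
    for chunk in stream(ToyPipeline(), num_steps=4, height=480, width=832):
        print(chunk)
```

Wrapping the loop in a helper like `stream` keeps the cache private to one run, which is one way to avoid reusing a cache across unrelated generations by accident.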

Add a new model from a standalone repo

Minimal entry point:

[project]
name = "my-flashdreams-model"
dependencies = ["flashdreams"]

[project.entry-points."flashdreams.runner_configs"]
my-model-fast = "my_integration.config:MY_MODEL_FAST_RUNNER"

Then:

pip install -e .
flashdreams-run my-model-fast --help

See Add a new method for the complete authoring guide.