How to use FlashDreams as a developer

Pick the path that matches your goal. For the full runtime map, see Inference pipeline overview.

Run existing models

Clone the repo, install runner extras, then jump to a model page for the exact slug and flags.

Use as a Python library

Install FlashDreams and import runtime contracts from flashdreams.infra.

Add a standalone model

Ship configs and runners from your own package through flashdreams.runner_configs.

Build serving apps

Keep sessions alive and stream outputs through the serving path.

Run existing models from source

git clone https://github.com/NVIDIA/flashdreams.git
cd flashdreams
uv sync --extra dev --extra runners
uv run flashdreams-run --help

Then pick a model page (listed under Models) for the actual slugs and flags.

Programmatic access

pip install flashdreams

from flashdreams.infra.pipeline import StreamInferencePipeline
from my_integration.config import MY_MODEL_RUNNER

runner = MY_MODEL_RUNNER.setup()
runner.run()

For lower-level experiments:

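from flashdreams.infra.pipeline import StreamInferencePipeline
from my_integration.config import MY_PIPELINE  # placeholder for your own integration package

pipeline: StreamInferencePipeline = MY_PIPELINE.setup()
cache = pipeline.initialize_cache(height=480, width=832)

# Run four generation steps, finalizing each one before starting the next.
for ar_idx in range(4):
    output = pipeline.generate(ar_idx, cache)
    pipeline.finalize(ar_idx, cache)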

Arguments are model-specific; use the integration runner as the source of truth.
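
To make the call order in the loop above concrete without a GPU or an installed integration, here is a minimal sketch built on a toy stand-in. `ToyPipeline`, `run_steps`, and the dict-based cache are hypothetical and not part of FlashDreams. Only the method names `initialize_cache`, `generate`, and `finalize` come from the snippet above, and what they do in a real pipeline is model-specific.

```python
from typing import Any


class ToyPipeline:
    """Hypothetical stand-in; NOT the FlashDreams StreamInferencePipeline."""

    def initialize_cache(self, height: int, width: int) -> dict[str, Any]:
        # Assumption: the cache is per-run state created once before the loop.
        return {"height": height, "width": width, "finalized": []}

    def generate(self, ar_idx: int, cache: dict[str, Any]) -> str:
        # Assumption: each step returns one output chunk for that index.
        return f"chunk-{ar_idx} ({cache['width']}x{cache['height']})"

    def finalize(self, ar_idx: int, cache: dict[str, Any]) -> None:
        # Assumption: finalize records the step in the cache as completed.
        cache["finalized"].append(ar_idx)


def run_steps(pipeline: ToyPipeline, num_steps: int) -> tuple[list[str], dict[str, Any]]:
    """Drive the pipeline in the same order as the loop above."""
    cache = pipeline.initialize_cache(height=480, width=832)
    outputs = []
    for ar_idx in range(num_steps):
        outputs.append(pipeline.generate(ar_idx, cache))  # keep each step's output
        pipeline.finalize(ar_idx, cache)  # finalize before moving to the next step
    return outputs, cache


outputs, cache = run_steps(ToyPipeline(), num_steps=4)
print(outputs)
```

Collecting each `output` as it is produced, rather than overwriting it on every iteration, is the one change from the original loop; a real integration might stream or write each chunk instead.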

Add a new model from a standalone repo

Minimal entry point:

[project]
name = "my-flashdreams-model"
dependencies = ["flashdreams"]

[project.entry-points."flashdreams.runner_configs"]
my-model-fast = "my_integration.config:MY_MODEL_FAST_RUNNER"

Then:

pip install -e .
flashdreams-run my-model-fast --help

Use Add a new method for the complete authoring guide.

Next links

Models for the list of supported models with launch commands.
Add a new method for integration authoring.
Interactive serving for serving concepts.