Launch your first model
This page gives a minimal path to two workflows:
Offline, batch-style long-rollout inference with Self-Forcing.
Online, interactive world-model serving with LingBot-World.
Prerequisites
Complete the setup in Installation first.
Run Self-Forcing T2V offline inference
Launch an offline inference run using the Self-Forcing model:
uv run --project integrations/self_forcing \
    flashdreams-run self-forcing-wan2.1-t2v-1.3b-taehv \
    --total-blocks 7
The first run takes several minutes because of Triton autotuning and
CUDA-graph warmup; subsequent runs finish in well under a minute. The
output is written to outputs/self-forcing-wan2.1-t2v-1.3b-taehv.mp4
(16 FPS, 480×832 by default). See Self-Forcing for details on
--total-blocks, measured runtimes, and multi-GPU guidance.
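To see the first-run and later-run timings on your own machine, and to confirm that the video was actually written, a small shell wrapper can help. The sketch below is not part of flashdreams: `run_and_check` is a hypothetical helper that runs any command, reports the elapsed wall-clock seconds, and checks that the expected file exists and is non-empty. It assumes a POSIX shell with `date +%s` available.

```shell
# Hypothetical helper, not shipped with flashdreams.
# Usage: run_and_check <expected-output-file> <command> [args...]
run_and_check() {
  expected_file="$1"
  shift
  start_time=$(date +%s)
  # Run the command exactly as given; stop early if it fails.
  "$@" || { echo "command failed: $*" >&2; return 1; }
  end_time=$(date +%s)
  # -s is true only when the file exists and has a size greater than zero.
  if [ -s "$expected_file" ]; then
    echo "ok: $expected_file written in $((end_time - start_time))s"
  else
    echo "missing or empty: $expected_file" >&2
    return 2
  fi
}

# Example invocation with the command from this page (left commented out
# because it needs the installed project and a GPU):
# run_and_check outputs/self-forcing-wan2.1-t2v-1.3b-taehv.mp4 \
#     uv run --project integrations/self_forcing \
#     flashdreams-run self-forcing-wan2.1-t2v-1.3b-taehv \
#     --total-blocks 7
```

The check only looks at the file's existence and size, so a file left over from an earlier run also passes. Delete the old output first if you want the check to reflect the current run.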
Run LingBot-World interactive server
Launch an interactive serving session using the LingBot-World model:
uv run --project integrations/lingbot \
    flashdreams-run lingbot-world-fast \
    --example-data True \
    --total-blocks 21
Next steps
Explore models:
Models - Browse all supported models, their specific launch commands, and configurations.
For developers:
Inference pipeline overview - Learn about the system architecture and generation loop.
Config system - Understand how to modify pipeline and runner configurations.
Add a new method - Guide to adding your own custom models and methods.
API reference - Check the Python API and CLI references.