Serving
Serving in FlashDreams is currently integration-driven: model-specific serving stacks wrap the same runner/pipeline abstractions used for offline inference.
Serving building blocks
Runner config defines serving-relevant I/O fields (prompts, control tensors, image paths, output transport).
Pipeline manages model lifecycle and cached state across steps.
Integration transport (for example WebRTC in LingBot-World) handles session I/O, request routing, and media responses.
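The layering above can be illustrated with a minimal sketch. Every name here (`RunnerConfig`, `Pipeline`, `Session`, and their fields and methods) is a hypothetical stand-in chosen for illustration, not the actual FlashDreams or LingBot-World API; the sketch only shows how a config, a stateful pipeline, and a transport-side session might relate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RunnerConfig:
    """Hypothetical stand-in for the serving-relevant I/O fields of a runner config."""
    prompt: str
    image_path: Optional[str] = None
    output_transport: str = "file"  # assumed field name, e.g. "file" or "webrtc"


class Pipeline:
    """Hypothetical pipeline: owns the model lifecycle and state cached across steps."""

    def __init__(self, config: RunnerConfig) -> None:
        self.config = config
        self.loaded = False
        self.cache: List[str] = []  # stands in for cached model state

    def load(self) -> None:
        # A real pipeline would load weights here; the sketch only flips a flag.
        self.loaded = True

    def step(self, control: str) -> str:
        if not self.loaded:
            raise RuntimeError("call load() before step()")
        self.cache.append(control)  # state persists between steps
        return f"frame {len(self.cache)} for {self.config.prompt!r} after {control!r}"


class Session:
    """Hypothetical transport-side session: routes each request to one pipeline."""

    def __init__(self, pipeline: Pipeline) -> None:
        self.pipeline = pipeline

    def handle(self, request: Dict[str, str]) -> Dict[str, str]:
        # Session I/O in, media response out; the pipeline keeps its cache.
        return {"media": self.pipeline.step(request["control"])}


if __name__ == "__main__":
    pipeline = Pipeline(RunnerConfig(prompt="a forest trail", output_transport="webrtc"))
    pipeline.load()
    session = Session(pipeline)
    for control in ("forward", "turn-left"):
        print(session.handle({"control": control})["media"])
```

The point of the split is that the same pipeline object can be driven by an offline runner or by an interactive session; only the outermost layer changes.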
Reference integration
LingBot-World provides the canonical serving integration:
runner and pipeline wiring under integrations/lingbot/lingbot/,
interactive transport under integrations/lingbot/lingbot/webrtc/.
Launch patterns
Single GPU:
uv run flashdreams-run \
lingbot-world-fast --example-data True --total-blocks 21
Multi GPU:
uv run torchrun --nproc_per_node=2 --no-python flashdreams-run \
lingbot-world-fast --example-data True --total-blocks 21
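The two commands differ only in the `torchrun` prefix, so a launcher script can derive both from a GPU count. The helper below is a hypothetical convenience, not part of FlashDreams; it uses only the flags shown above and returns an argument list suitable for `subprocess.run`.

```python
import shlex
from typing import List


def build_launch_command(nproc_per_node: int = 1, total_blocks: int = 21) -> List[str]:
    """Build the argv for a single-GPU or multi-GPU launch (hypothetical helper)."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")

    # Arguments shared by both launch patterns.
    model_args = [
        "lingbot-world-fast",
        "--example-data", "True",
        "--total-blocks", str(total_blocks),
    ]

    if nproc_per_node == 1:
        # Single GPU: invoke the entry point directly.
        return ["uv", "run", "flashdreams-run", *model_args]

    # Multi GPU: torchrun spawns one process per GPU; --no-python tells it
    # that flashdreams-run is an executable rather than a Python script.
    return [
        "uv", "run", "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "--no-python", "flashdreams-run",
        *model_args,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    print(shlex.join(build_launch_command(1)))
    print(shlex.join(build_launch_command(2)))
```

Passing the result to `subprocess.run(cmd, check=True)` would launch the process; printing it first makes it easy to confirm that it matches the commands above.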