Serving

Serving in FlashDreams is currently integration-driven: model-specific serving stacks wrap the same runner/pipeline abstractions used for offline inference.

Serving building blocks

Runner config defines serving-relevant I/O fields (prompts, control tensors, image paths, output transport).
Pipeline manages model lifecycle and cached state across steps.
Integration transport (for example WebRTC in LingBot-World) handles session I/O, request routing, and media responses.
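The way these three blocks compose can be sketched in a few lines of Python. This is only an illustration of the layering described above, not the FlashDreams API: `RunnerConfig`, `Pipeline`, `Transport`, and all of their fields and methods are hypothetical names chosen for this sketch, and the "model" is a stub that counts steps.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RunnerConfig:
    """Hypothetical stand-in for the serving-relevant I/O fields of a runner config."""
    prompt: str
    image_path: Optional[str] = None
    output_transport: str = "webrtc"


@dataclass
class Pipeline:
    """Hypothetical pipeline: owns model lifecycle and state cached across steps."""
    config: RunnerConfig
    loaded: bool = False
    cache: Dict[str, Any] = field(default_factory=dict)

    def load(self) -> None:
        # A real pipeline would load model weights here; the stub only flips a flag.
        self.loaded = True

    def step(self, control: List[float]) -> Dict[str, Any]:
        if not self.loaded:
            raise RuntimeError("call load() before step()")
        # The cache persists between calls, so each step continues the same session.
        index = self.cache.get("step", 0)
        self.cache["step"] = index + 1
        return {"step": index, "prompt": self.config.prompt, "control": control}


class Transport:
    """Hypothetical transport: routes session requests to a pipeline and wraps responses."""

    def __init__(self, pipeline: Pipeline) -> None:
        self.pipeline = pipeline

    def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        frame = self.pipeline.step(request["control"])
        return {"session": request["session"], "frame": frame}
```

In this sketch the transport never touches model state directly. It only forwards control inputs to the pipeline, which mirrors the idea that a serving stack wraps the same abstractions used for offline inference.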

Reference integration

LingBot-World provides the canonical serving integration:

runner and pipeline wiring under integrations/lingbot/lingbot/,
interactive transport under integrations/lingbot/lingbot/webrtc/.

Launch patterns

Single GPU:

uv run flashdreams-run \
    lingbot-world-fast --example-data True --total-blocks 21

Multi-GPU (here two processes launched via torchrun):

uv run torchrun --nproc_per_node=2 --no-python flashdreams-run \
    lingbot-world-fast --example-data True --total-blocks 21
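When launches are scripted, the two patterns above differ only in the torchrun prefix. The following Python sketch assembles either command from a GPU count. The helper name `build_launch_command` is hypothetical and not part of FlashDreams; it uses only the commands and flags shown above and does not execute anything.

```python
import shlex
from typing import List


def build_launch_command(num_gpus: int, total_blocks: int = 21) -> List[str]:
    """Return the argv list for a single-GPU or multi-GPU launch (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    runner_args = [
        "flashdreams-run",
        "lingbot-world-fast",
        "--example-data", "True",
        "--total-blocks", str(total_blocks),
    ]
    if num_gpus == 1:
        # Single GPU: run the entry point directly under uv.
        return ["uv", "run"] + runner_args
    # Multi-GPU: torchrun starts one process per GPU; --no-python tells it
    # that flashdreams-run is an executable rather than a Python script.
    return ["uv", "run", "torchrun", f"--nproc_per_node={num_gpus}", "--no-python"] + runner_args


# Print the two-GPU command as a copy-pasteable shell line.
print(shlex.join(build_launch_command(2)))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues.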

See also

LingBot-World