LingBot-World¶
Introduced by Robbyant, LingBot-World is a camera-controllable image-to-video (I2V) world model with streaming inference and context-parallel runtime support.
Teaser video source: LingBot-World project page.
Requirements¶
Minimum VRAM: ~120 GB.
PyTorch: >= 2.9.
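A quick preflight check can catch an unsuitable host before any large download starts. The following is a minimal sketch, not part of FlashDreams: the helper names `parse_version` and `meets_requirements` are hypothetical, and comparing the VRAM figure against the total across all visible GPUs is an assumption, since the requirement above does not say whether ~120 GB is per device or in total.

```python
import re

MIN_TORCH = (2, 9)   # from the Requirements list above
MIN_VRAM_GB = 120    # approximate figure from the Requirements list above


def parse_version(text):
    """Return (major, minor) from a version string such as '2.9.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirements(torch_version, vram_gb_per_gpu):
    """Check a PyTorch version string and a list of per-GPU memory sizes in GB.

    Assumption: the VRAM requirement is compared against the sum over GPUs.
    Returns a (torch_ok, vram_ok) pair of booleans.
    """
    torch_ok = parse_version(torch_version) >= MIN_TORCH
    vram_ok = sum(vram_gb_per_gpu) >= MIN_VRAM_GB
    return torch_ok, vram_ok


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        # total_memory is reported in bytes; convert to GB.
        sizes = [
            torch.cuda.get_device_properties(i).total_memory / 1e9
            for i in range(torch.cuda.device_count())
        ]
        print(meets_requirements(torch.__version__, sizes))
```

Run it inside the project environment (for example through `uv run --project integrations/lingbot python`) so that the reported PyTorch version is the one the runner will actually use.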
Installation¶
# from the repo root
uv sync --project integrations/lingbot
Running the method¶
To run LingBot-World, launch one of the registered runner slugs via
flashdreams-run. For example:
uv run --project integrations/lingbot \
flashdreams-run \
lingbot-world-fast \
--example-data True \
--example-idx 0 \
--pixel-height 464 --pixel-width 832 \
--total-blocks 21
Sample data is downloaded from the
LingBot-World repository.
Valid --example-idx values are 0, 1, 2, and 5. Note that the single-GPU command may run
out of memory for large --total-blocks values.
For multi-GPU inference, use torchrun on top of uv run flashdreams-run
(taking 4 GPUs as an example):
uv run --project integrations/lingbot \
torchrun --nproc_per_node=4 --no-python flashdreams-run \
lingbot-world-fast \
--example-data True \
--example-idx 0 \
--pixel-height 464 --pixel-width 832 \
--total-blocks 21
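When scripting several runs, it can help to assemble the command line programmatically and reject an invalid example index before anything launches. The sketch below only builds the argument list; `build_command` is a hypothetical helper, and every flag in it is copied from the single-GPU and multi-GPU commands above rather than taken from the FlashDreams API.

```python
import shlex

VALID_EXAMPLE_IDX = (0, 1, 2, 5)  # the values listed above


def build_command(example_idx, total_blocks=21, nproc=1,
                  slug="lingbot-world-fast"):
    """Return the argv list for a run; does not execute anything."""
    if example_idx not in VALID_EXAMPLE_IDX:
        raise ValueError(
            f"--example-idx must be one of {VALID_EXAMPLE_IDX}, got {example_idx}"
        )
    cmd = ["uv", "run", "--project", "integrations/lingbot"]
    if nproc > 1:
        # Multi-GPU form: wrap flashdreams-run in torchrun, as shown above.
        cmd += ["torchrun", f"--nproc_per_node={nproc}", "--no-python"]
    cmd += [
        "flashdreams-run", slug,
        "--example-data", "True",
        "--example-idx", str(example_idx),
        "--pixel-height", "464", "--pixel-width", "832",
        "--total-blocks", str(total_blocks),
    ]
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable command for a 4-GPU run of example 0.
    print(shlex.join(build_command(0, nproc=4)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repo root.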
We provide the following variants:
| Method | Description |
|---|---|
| | Official camera-control I2V (Wan VAE decoder, full KV-cache). |
| | Efficient streaming configuration: TAEHV decoder, |
To inspect all supported CLI arguments and their default values, run:
uv run --project integrations/lingbot \
flashdreams-run \
lingbot-world-fast \
--help
What to expect¶
Example data: --example-data True downloads image.jpg, intrinsics.npy, poses.npy, and prompt.txt from the upstream examples folder into assets/example_data/lingbot_world/<NN>/ (<NN> matches --example-idx). Cached after first run; no credentials needed.
Model checkpoint: ~70 GB pulled from huggingface.co/robbyant/lingbot-world-fast on first run, cached under $HF_HOME. Export HF_TOKEN first.
Disk: keep ~200 GB free for the model + HF cache. Hosts under ~100 GB have been seen to run out mid-load.
First launch: a few minutes (download + Triton autotuning + CUDA-graph warmup). Subsequent launches reuse the caches.
Outputs: outputs/<runner-slug>.mp4 (16 FPS, 464×832 by default) and outputs/stats_<runner-slug>.json. Override with --output-dir / --pixel-height / --pixel-width / --fps.
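Because the checkpoint download is large, checking free disk space first avoids a failure partway through loading. The following is a rough sketch using only the standard library; the function names and the three status labels are hypothetical, the 200 GB and 100 GB thresholds are the approximate figures quoted above, and the fallback cache path assumes the usual Hugging Face default when `HF_HOME` is unset.

```python
import os
import shutil

RECOMMENDED_FREE_GB = 200  # "keep ~200 GB free" from the notes above
OBSERVED_FAILURE_GB = 100  # hosts below roughly this have run out mid-load


def hf_cache_dir():
    """Return $HF_HOME if set, else ~/.cache/huggingface.

    Assumption: XDG_CACHE_HOME is not used to relocate the cache.
    """
    return os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))


def free_gb(path):
    """Free space in GB on the filesystem that holds (or will hold) path."""
    path = os.path.abspath(path)
    # Walk up to the nearest existing directory so this works before
    # the cache directory has been created.
    while not os.path.exists(path):
        path = os.path.dirname(path)
    return shutil.disk_usage(path).free / 1e9


def disk_status(free):
    """Classify a free-space figure (GB) against the thresholds above."""
    if free >= RECOMMENDED_FREE_GB:
        return "ok"
    if free >= OBSERVED_FAILURE_GB:
        return "below recommendation"
    return "likely insufficient"


if __name__ == "__main__":
    cache = hf_cache_dir()
    available = free_gb(cache)
    print(f"{cache}: {available:.0f} GB free ({disk_status(available)})")
```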
See Inference pipeline overview for what one autoregressive chunk does end-to-end.
Some generated samples from the above commands:
Launch the interactive server¶
Spin up the interactive LingBot-World server via WebRTC:
# from the repo root
uv run --package flashdreams-lingbot torchrun --nproc_per_node 4 \
-m lingbot.webrtc.server \
--host 0.0.0.0 --port 8089 \
--config_name lingbot-world-fast-taehv-window15-sink3 \
--example-idx 0
--example-idx selects which example to download
(0, 1, 2, 5); assets auto-download on first launch.
The HTTP port opens only after model load and warmup, which takes a few minutes on
first launch and is much faster afterwards. When ready, the server prints
Connect via http://<server-ip>:8089/request_session (use
localhost when running locally).
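Since the port stays closed until warmup finishes, a script that opens the UI automatically needs to wait for it. The sketch below polls the URL with the standard library until any HTTP response arrives; `wait_for_server` is a hypothetical helper, and treating any HTTP status (including an error status) as "ready" is an assumption about what readiness means here, not documented server behavior.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout_s=600.0, interval_s=5.0):
    """Poll url until the server answers or timeout_s elapses.

    Returns True once any HTTP response is received, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the port is not open yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Use a short timeout here; model load on first launch can take minutes.
    ready = wait_for_server("http://localhost:8089/request_session",
                            timeout_s=10.0, interval_s=2.0)
    print("server ready" if ready else "server not reachable yet")
```

On a remote instance, run this against `localhost` only after the port has been forwarded to the local machine.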
Note
On a remote or cloud GPU instance (e.g. Brev),
the server port is usually not reachable at the host IP directly.
Forward it to your local machine first, then open
http://localhost:8089/request_session:
# Brev
brev port-forward <instance> -p 8089:8089
# or plain SSH
ssh -L 8089:localhost:8089 <user>@<host>
When successfully connected, the browser-based UI looks like this:
Profiling benchmark¶
Here is the profiling benchmark on total DiT runtime for FlashDreams LingBot-World compared to the official LingBot-World implementation and LightX2V under matched settings.
This chart shows total DiT runtime (4 diffusion steps) in milliseconds at the 6th autoregressive rollout on 4 GPUs. For an apples-to-apples comparison, all implementations are forced to use the cuDNN attention backend under matched runtime settings, and all runs use Ulysses sequence parallelism for multi-GPU inference. For the official LingBot-World implementation, see this instruction. For the LightX2V baseline, see this instruction.
Citation¶
If you use LingBot-World, please cite the original work:
@article{lingbot-world,
title={Advancing Open-source World Models},
author={Robbyant Team and Zelin Gao and Qiuyu Wang and Yanhong Zeng and Jiapeng Zhu and Ka Leong Cheng and Yixuan Li and Hanlin Wang and Yinghao Xu and Shuailei Ma and Yihang Chen and Jie Liu and Yansong Cheng and Yao Yao and Jiayi Zhu and Yihao Meng and Kecheng Zheng and Qingyan Bai and Jingye Chen and Zehong Shen and Yue Yu and Xing Zhu and Yujun Shen and Hao Ouyang},
journal={arXiv preprint arXiv:2601.20540},
year={2026}
}