Causal Wan2.2
CausalWan2.2 is a 14B mixture-of-experts (MoE) causal-diffusion variant of Wan 2.2, released by FastVideo, with 8-step inference.
Requirements
- Minimum VRAM: ~112 GB.
- PyTorch: >= 2.9.
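You can check the requirements above before installing the integration. The following is a minimal preflight sketch and is not part of the integration. The helper names `parse_version`, `torch_version_ok` and `report` are hypothetical. This page does not say whether the ~112 GB minimum applies to a single GPU or to all GPUs combined, so the script reports both the largest device and the total across visible devices (in decimal GB) and does not enforce either reading.

```python
"""Preflight sketch for the Requirements above; helper names are hypothetical."""
from __future__ import annotations

import re

MIN_TORCH = (2, 9)  # "PyTorch: >= 2.9"
MIN_VRAM_GB = 112   # "Minimum VRAM: ~112 GB" (per-GPU vs. total is not specified)


def parse_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.9.0+cu128' or '2.10.0.dev20250101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_version_ok(version: str) -> bool:
    # Compare integer tuples, not strings, so that (2, 10) >= (2, 9) holds.
    return parse_version(version) >= MIN_TORCH


def report() -> None:
    """Print the installed PyTorch version and the memory of each visible CUDA device."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    status = "ok" if torch_version_ok(torch.__version__) else "older than 2.9"
    print(f"torch {torch.__version__}: {status}")
    if not torch.cuda.is_available():
        print("no CUDA device visible")
        return
    sizes_gb = [torch.cuda.get_device_properties(i).total_memory / 1e9
                for i in range(torch.cuda.device_count())]
    # Report both readings because the page does not say which one ~112 GB refers to.
    print(f"{len(sizes_gb)} GPU(s): largest {max(sizes_gb):.0f} GB, "
          f"total {sum(sizes_gb):.0f} GB (documented minimum ~{MIN_VRAM_GB} GB)")


if __name__ == "__main__":
    report()
```

The version check parses only the leading `major.minor` pair. This means local build tags such as `+cu128` and pre-release suffixes do not affect the comparison.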
Installation
```shell
# from the repo root
uv sync --project integrations/fastvideo_causal_wan22
```
Running the method
To run Causal Wan2.2, pass its registered runner slug, `fastvideo-causal-wan2.2-t2v-14b`, to `flashdreams-run`. For example:
```shell
uv run --project integrations/fastvideo_causal_wan22 \
  flashdreams-run \
  fastvideo-causal-wan2.2-t2v-14b \
  --prompt "A stylish woman strolls down a bustling Tokyo street, the warm glow of neon lights and animated city signs casting vibrant reflections. She wears a sleek black leather jacket paired with a flowing red dress and black boots, her black purse slung over her shoulder. Sunglasses perched on her nose and a bold red lipstick add to her confident, casual demeanor. The street is damp and reflective, creating a mirror-like effect that enhances the colorful lights and shadows. Pedestrians move about, adding to the lively atmosphere. The scene is captured in a dynamic medium shot with the woman walking slightly to one side, highlighting her graceful strides." \
  --pixel-height 480 --pixel-width 832 \
  --total-blocks 7

uv run --project integrations/fastvideo_causal_wan22 \
  flashdreams-run \
  fastvideo-causal-wan2.2-t2v-14b \
  --prompt "A playful raccoon is seen playing an electronic guitar, strumming the strings with its front paws. The raccoon has distinctive black facial markings and a bushy tail. It sits comfortably on a small stool, its body slightly tilted as it focuses intently on the instrument. The setting is a cozy, dimly lit room with vintage posters on the walls, adding a retro vibe. The raccoon's expressive eyes convey a sense of joy and concentration. Medium close-up shot, focusing on the raccoon's face and hands interacting with the guitar." \
  --pixel-height 480 --pixel-width 832 \
  --total-blocks 7
```
For multi-GPU inference, launch `flashdreams-run` through `torchrun` inside `uv run`. For example, with 4 GPUs:
```shell
uv run --project integrations/fastvideo_causal_wan22 \
  torchrun --nproc_per_node=4 --no-python flashdreams-run \
  fastvideo-causal-wan2.2-t2v-14b \
  --prompt "A stylish woman strolls down a bustling Tokyo street, the warm glow of neon lights and animated city signs casting vibrant reflections. She wears a sleek black leather jacket paired with a flowing red dress and black boots, her black purse slung over her shoulder. Sunglasses perched on her nose and a bold red lipstick add to her confident, casual demeanor. The street is damp and reflective, creating a mirror-like effect that enhances the colorful lights and shadows. Pedestrians move about, adding to the lively atmosphere. The scene is captured in a dynamic medium shot with the woman walking slightly to one side, highlighting her graceful strides." \
  --pixel-height 480 --pixel-width 832 \
  --total-blocks 21
```
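To generate several prompts in one session, you can script the documented commands. The sketch below is a convenience wrapper built on assumptions. It is not a FastVideo or flashdreams API. `build_command` and `run_all` are hypothetical helpers that reproduce only the flags shown in the commands above (`--prompt`, `--pixel-height`, `--pixel-width`, `--total-blocks`) and the `torchrun` wrapping used for multi-GPU runs. By default it performs a dry run and prints each command without executing it.

```python
"""Batch-run sketch for the documented flashdreams-run commands (hypothetical helpers)."""
from __future__ import annotations

import shlex
import subprocess
from typing import Optional

PROJECT = "integrations/fastvideo_causal_wan22"
RUNNER = "fastvideo-causal-wan2.2-t2v-14b"


def build_command(prompt: str, height: int = 480, width: int = 832,
                  total_blocks: int = 7, nproc: Optional[int] = None) -> list[str]:
    """Return an argv list matching the commands above; set nproc to launch via torchrun."""
    cmd = ["uv", "run", "--project", PROJECT]
    if nproc is None:
        cmd.append("flashdreams-run")
    else:
        # Same wrapping as the multi-GPU example: torchrun runs the console script directly.
        cmd += ["torchrun", f"--nproc_per_node={nproc}", "--no-python", "flashdreams-run"]
    cmd += [
        RUNNER,
        "--prompt", prompt,
        "--pixel-height", str(height),
        "--pixel-width", str(width),
        "--total-blocks", str(total_blocks),
    ]
    return cmd


def run_all(prompts: list[str], dry_run: bool = True, **kwargs) -> None:
    """Generate one video per prompt, one run at a time; only print the commands if dry_run."""
    for prompt in prompts:
        cmd = build_command(prompt, **kwargs)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a run fails


if __name__ == "__main__":
    prompts = [
        "A red fox trots through fresh snow at dawn.",
        "A paper boat drifts down a rain-soaked street gutter.",
    ]
    run_all(prompts, dry_run=True)                            # single GPU, 7 blocks
    run_all(prompts, dry_run=True, nproc=4, total_blocks=21)  # 4 GPUs, as in the torchrun example
```

The helper passes the command to `subprocess.run` as an argument list instead of a single shell string. Long prompts that contain quotes or apostrophes, such as the raccoon example, then need no shell escaping. Runs execute sequentially, on the assumption that each run occupies all of the GPUs it is given.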
We provide the following variant:

| Method | Description |
|---|---|
| `fastvideo-causal-wan2.2-t2v-14b` | FastVideo CausalWan 2.2 14B MoE T2V (Wan VAE decoder, 8-step). |
To inspect all supported CLI arguments and their default values, run:
```shell
uv run --project integrations/fastvideo_causal_wan22 \
  flashdreams-run \
  fastvideo-causal-wan2.2-t2v-14b \
  --help
```
Citation
If you use Causal Wan2.2, please cite the original work:
```bibtex
@article{zhang2025fast,
  title={Fast video generation with sliding tile attention},
  author={Zhang, Peiyuan and Chen, Yongqi and Su, Runlong and Ding, Hangliang and Stoica, Ion and Liu, Zhengzhong and Zhang, Hao},
  journal={arXiv preprint arXiv:2502.04507},
  year={2025}
}
```