Wan2.1

Wan2.1 is a bidirectional video generation model, supporting both text-to-video (T2V) and image-to-video (I2V) tasks.

Requirements

  • Minimum VRAM: ~46 GB.

  • PyTorch: >= 2.9.
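
Both requirements can be checked before starting a long run. The following is a minimal sketch and not part of FlashDreams: the helper names (version_ge, vram_ok, preflight) are hypothetical, and the 46000 MiB threshold is an assumed stand-in for the approximate "~46 GB" figure. It also assumes that nvidia-smi is on PATH, that sort supports -V, and that the environment from the Installation section has already been synced.

```shell
# Succeeds when version $1 is at least version $2 (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when $1 (MiB) meets the threshold $2 (MiB, default 46000).
# 46000 MiB is an assumed approximation of the "~46 GB" requirement.
vram_ok() {
    [ "$1" -ge "${2:-46000}" ]
}

# Report each GPU's total memory and the PyTorch version in the project env.
preflight() {
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits |
    while IFS=, read -r name mem; do
        # nvidia-smi pads the CSV fields with a space; strip it.
        mem="$(printf '%s' "$mem" | tr -d ' ')"
        if vram_ok "$mem"; then status="ok"; else status="below ~46 GB"; fi
        printf '%s: %s MiB (%s)\n' "$name" "$mem" "$status"
    done

    torch_version="$(uv run --project integrations/wan21 \
        python -c 'import torch; print(torch.__version__)')"
    # Drop a local build suffix such as "+cu128" before comparing.
    if version_ge "${torch_version%%+*}" 2.9; then
        echo "PyTorch $torch_version: ok"
    else
        echo "PyTorch $torch_version: older than 2.9"
    fi
}
```

To use it, source the snippet and call preflight from the repo root.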

Installation

# from the repo root
uv sync --project integrations/wan21

Running the method

To run Wan2.1, pass one of the registered runner slugs to flashdreams-run. For example:

uv run --project integrations/wan21 \
    flashdreams-run \
    wan21-t2v-1.3b-480p \
    --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
    --pixel-height 832 --pixel-width 480
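
Long prompts are awkward to quote inline, so it can help to keep them in a file and reuse the command above once per prompt. This is a minimal sketch under stated assumptions: run_prompts and the prompt file are hypothetical names, and only the flags already shown above are used.

```shell
# Run the documented T2V command once per non-empty line of a prompt file.
# `run_prompts` is a hypothetical helper, not part of FlashDreams.
run_prompts() {
    prompt_file="$1"
    # The `|| [ -n "$prompt" ]` clause keeps a last line with no newline.
    while IFS= read -r prompt || [ -n "$prompt" ]; do
        # Skip blank lines so an empty prompt is never submitted.
        [ -n "$prompt" ] || continue
        # Redirect stdin so the runner cannot consume the remaining prompts.
        uv run --project integrations/wan21 \
            flashdreams-run \
            wan21-t2v-1.3b-480p \
            --prompt "$prompt" \
            --pixel-height 832 --pixel-width 480 < /dev/null
    done < "$prompt_file"
}
```

For example, run_prompts prompts.txt runs one generation per line. The sketch does not set an output location, so check --help for how outputs are named before running several prompts in a row.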

For multi-GPU inference, launch flashdreams-run through torchrun inside uv run (4 GPUs in this example):

uv run --project integrations/wan21 \
    torchrun --nproc_per_node=4 --no-python flashdreams-run \
    wan21-t2v-1.3b-480p \
    --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
    --pixel-height 832 --pixel-width 480
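
Rather than hard-coding the process count, it can be derived from the GPUs visible on the machine. The sketch below is illustrative only: gpu_count and run_multi_gpu are hypothetical helpers, nvidia-smi is assumed to be on PATH, and CUDA_VISIBLE_DEVICES is assumed to hold a comma-separated device list when set. Only the 4-GPU case is documented above, so other counts are untested here.

```shell
# Count usable GPUs: honor CUDA_VISIBLE_DEVICES when set, else ask nvidia-smi.
gpu_count() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        # One non-empty entry per comma-separated device.
        printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
    else
        # `nvidia-smi -L` prints one "GPU n: ..." line per physical GPU.
        nvidia-smi -L | grep -c '^GPU '
    fi
}

# Launch the documented multi-GPU command with one process per visible GPU.
# $1 is the prompt text.
run_multi_gpu() {
    uv run --project integrations/wan21 \
        torchrun --nproc_per_node="$(gpu_count)" --no-python flashdreams-run \
        wan21-t2v-1.3b-480p \
        --prompt "$1" \
        --pixel-height 832 --pixel-width 480
}
```

torchrun also accepts --nproc_per_node=gpu, which starts one process per detected GPU without a helper. The explicit count is useful mainly when the value should be logged or validated first.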

For I2V, supply the first-frame image with --image-path:

uv run --project integrations/wan21 \
    flashdreams-run \
    wan21-i2v-14b-480p \
    --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
    --image-path https://raw.githubusercontent.com/Wan-Video/Wan2.1/main/examples/i2v_input.JPG \
    --pixel-height 832 --pixel-width 480

We provide the following variants:

Method                 Description
wan21-t2v-1.3b-480p    Wan 2.1 T2V 1.3B at 480p (single AR step, prompt-only).
wan21-i2v-14b-480p     Wan 2.1 I2V 14B at 480p (single AR step, prompt + first-frame).

To inspect all supported CLI arguments and their default values, run:

uv run --project integrations/wan21 \
    flashdreams-run \
    wan21-t2v-1.3b-480p \
    --help

Some generated samples from the above commands:

prompt: "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
prompt: "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
image: https://raw.githubusercontent.com/Wan-Video/Wan2.1/main/examples/i2v_input.JPG

Profiling benchmark

This benchmark compares the per-step diffusion transformer (DiT) runtime of FlashDreams Wan2.1 with the official Wan2.1 implementation and the FastVideo baseline under matched settings.

The chart shows per-diffusion-step DiT runtime in milliseconds, with classifier-free guidance (CFG), at 480p (81 frames) on a single GPU. For an apples-to-apples comparison, all implementations are forced to use the cuDNN attention backend. For the official Wan2.1 implementation, see this instruction. For the FastVideo baseline, see this instruction.

Citation

If you use Wan2.1, please cite the original work:

@article{wan2025,
      title={Wan: Open and Advanced Large-Scale Video Generative Models},
      author={Team Wan and Ang Wang and Baole Ai and Bin Wen and Chaojie Mao and Chen-Wei Xie and Di Chen and Feiwu Yu and Haiming Zhao and Jianxiao Yang and Jianyuan Zeng and Jiayu Wang and Jingfeng Zhang and Jingren Zhou and Jinkai Wang and Jixuan Chen and Kai Zhu and Kang Zhao and Keyu Yan and Lianghua Huang and Mengyang Feng and Ningyi Zhang and Pandeng Li and Pingyu Wu and Ruihang Chu and Ruili Feng and Shiwei Zhang and Siyang Sun and Tao Fang and Tianxing Wang and Tianyi Gui and Tingyu Weng and Tong Shen and Wei Lin and Wei Wang and Wei Wang and Wenmeng Zhou and Wente Wang and Wenting Shen and Wenyuan Yu and Xianzhong Shi and Xiaoming Huang and Xin Xu and Yan Kou and Yangyu Lv and Yifei Li and Yijing Liu and Yiming Wang and Yingya Zhang and Yitong Huang and Yong Li and You Wu and Yu Liu and Yulin Pan and Yun Zheng and Yuntao Hong and Yupeng Shi and Yutong Feng and Zeyinzi Jiang and Zhen Han and Zhi-Fan Wu and Ziyu Liu},
      journal={arXiv preprint arXiv:2503.20314},
      year={2025}
}