FlashVSR
FlashVSR is a one-step streaming diffusion framework for real-time video super-resolution (VSR). It combines a train-friendly three-stage distillation pipeline, locality-constrained sparse attention that bridges the gap between training and test resolutions, and a tiny conditional decoder for fast reconstruction.
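Locality-constrained attention limits each query token to a spatial neighborhood of keys, so the span a token attends over at inference stays the same as in training even when the input frame is larger. The NumPy sketch below illustrates only that general idea with a dense mask over a square (Chebyshev-distance) window. The function name, the window shape, and the single-head layout are illustrative assumptions; this is not FlashVSR's block-sparse Triton kernel.

```python
import numpy as np


def local_window_attention(q, k, v, grid_hw, radius):
    """Single-head attention where each query only sees keys within `radius`
    (Chebyshev distance) on an H x W token grid.

    q, k, v: arrays of shape (H*W, d) with tokens in row-major order.
    Illustrative sketch only, not the FlashVSR kernel.
    """
    h, w = grid_hw
    n, d = q.shape
    assert n == h * w and k.shape == (n, d) and v.shape[0] == n
    # (row, col) grid coordinate of every token
    rows, cols = np.divmod(np.arange(n), w)
    # Chebyshev distance between every query/key pair, shape (n, n)
    dist = np.maximum(np.abs(rows[:, None] - rows[None, :]),
                      np.abs(cols[:, None] - cols[None, :]))
    allowed = dist <= radius
    scores = (q @ k.T) / np.sqrt(d)
    scores = np.where(allowed, scores, -np.inf)  # block keys outside the window
    # Numerically stable softmax; the diagonal is always allowed, so each row max is finite
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, D = 4, 4, 8
    q, k, v = (rng.standard_normal((H * W, D)) for _ in range(3))
    print(local_window_attention(q, k, v, (H, W), radius=1).shape)
```

The dense (n, n) mask costs quadratic memory, which is why practical implementations skip masked regions with block-sparse kernels instead of materializing them.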
Teaser image source: FlashVSR official repository.
Requirements
Minimum VRAM: ~24 GB.
PyTorch: >= 2.9.
Installation
```shell
# from the repo root
uv sync --project integrations/flashvsr
```
Running the method
To run FlashVSR, provide an input video path and launch one of the registered runner slugs with flashdreams-run. For example:
```shell
uv run --project integrations/flashvsr \
  flashdreams-run \
  flashvsr-v1.1-sparse-ratio-2.0 \
  --input-path https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/inputs/example1.mp4 \
  --chunk-size 8
```
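--chunk-size sets how many input frames are upsampled per streaming step. The plain-Python sketch below shows that bookkeeping under two assumptions: frames are consumed in order in fixed-size chunks, and a trailing partial chunk is kept as-is. How flashdreams-run actually treats a short final chunk is not covered here, and iter_chunks and upscaled_size are hypothetical names.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def iter_chunks(frames: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive chunks of `chunk_size` frames; the last one may be shorter."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


def upscaled_size(height: int, width: int, scale: int = 2) -> tuple[int, int]:
    """Output resolution of a `scale`x super-resolution model."""
    return height * scale, width * scale


if __name__ == "__main__":
    frames = list(range(20))  # stand-ins for 20 decoded frames
    print([len(chunk) for chunk in iter_chunks(frames, chunk_size=8)])
    print(upscaled_size(540, 960))
```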
For multi-GPU inference, use the dense full-attention preset and launch flashdreams-run through torchrun inside uv run. For example, on 4 GPUs:
```shell
uv run --project integrations/flashvsr \
  torchrun --nproc_per_node=4 --no-python flashdreams-run \
  flashvsr-v1.1-full-attn \
  --input-path https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/inputs/example1.mp4 \
  --chunk-size 8
```
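If you launch runs from a Python script, the two commands above can be assembled as argument lists and handed to subprocess.run. This is a sketch: build_flashvsr_command is a hypothetical helper, and it only uses the flags that appear in the commands above.

```python
import shlex

PROJECT = "integrations/flashvsr"


def build_flashvsr_command(preset: str, input_path: str, chunk_size: int = 8,
                           num_gpus: int = 1) -> list[str]:
    """Return the argv for a single-GPU run, or a torchrun run when num_gpus > 1."""
    cmd = ["uv", "run", "--project", PROJECT]
    if num_gpus > 1:
        # torchrun starts one process per GPU and, with --no-python,
        # executes flashdreams-run directly instead of via `python`.
        cmd += ["torchrun", f"--nproc_per_node={num_gpus}", "--no-python"]
    cmd += ["flashdreams-run", preset,
            "--input-path", input_path,
            "--chunk-size", str(chunk_size)]
    return cmd


if __name__ == "__main__":
    argv = build_flashvsr_command("flashvsr-v1.1-full-attn", "example1.mp4", num_gpus=4)
    print(shlex.join(argv))
    # To launch for real from the repo root: subprocess.run(argv, check=True)
```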
Note: Multi-GPU inference is supported only by the dense flashvsr-v1.1-full-attn preset. The flashvsr-v1.1-sparse-ratio-* presets are single-GPU only because their Triton sparse-attention backend is not context-parallel aware.
We provide the following variants:

| Method | Description |
|---|---|
|  | Streaming 2x video super-resolution with the stable sparse-attention preset. |
|  | Streaming 2x video super-resolution with the faster sparse-attention preset. |
| flashvsr-v1.1-full-attn | Dense full-attention preset with multi-GPU context-parallel support. |
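Because only the dense preset supports context parallelism, a launcher script can reject invalid preset and GPU-count combinations before spawning processes. A minimal sketch: check_preset is a hypothetical helper, and the prefix match assumes every sparse preset follows the flashvsr-v1.1-sparse-ratio-* naming used on this page.

```python
FULL_ATTN_PRESET = "flashvsr-v1.1-full-attn"
SPARSE_PREFIX = "flashvsr-v1.1-sparse-ratio-"  # assumed naming of all sparse presets


def check_preset(preset: str, num_gpus: int) -> None:
    """Raise ValueError if a sparse-attention preset is combined with more than one GPU."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if num_gpus > 1 and preset.startswith(SPARSE_PREFIX):
        raise ValueError(
            f"{preset} is single-GPU only; use {FULL_ATTN_PRESET} for multi-GPU runs"
        )


if __name__ == "__main__":
    check_preset("flashvsr-v1.1-sparse-ratio-2.0", num_gpus=1)  # accepted
    check_preset(FULL_ATTN_PRESET, num_gpus=4)                  # accepted
```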
To inspect all supported CLI arguments and their default values, run:
```shell
uv run --project integrations/flashvsr \
  flashdreams-run \
  flashvsr-v1.1-sparse-ratio-2.0 \
  --help
```
A generated sample from the above commands:
Profiling benchmark
The profiling benchmark compares the per-chunk 2x upsampling time of FlashDreams FlashVSR with that of the official FlashVSR implementation under matched settings.
This chart shows per-chunk 2x upsampling time in milliseconds on a single GB200 GPU with a chunk size of 8 frames. For how to run the official FlashVSR implementation, see these instructions.
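To read a per-chunk latency as throughput, divide the chunk size by the latency in seconds: at a chunk size of 8 frames, a hypothetical 500 ms per chunk would correspond to 16 frames/s. The sketch below adds a simple wall-clock timing loop. It is not the harness used to produce the chart, and the function names are hypothetical. For GPU work, the timed callable should end with torch.cuda.synchronize(), because CUDA kernels run asynchronously and would otherwise finish outside the timed window.

```python
import time
from collections.abc import Callable


def time_per_chunk_ms(fn: Callable[[], object], warmup: int = 2,
                      iters: int = 10) -> list[float]:
    """Wall-clock time of `fn` in milliseconds over `iters` runs, after `warmup` untimed runs."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time costs such as compilation and caching
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()  # for GPU code, fn should call torch.cuda.synchronize() before returning
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def frames_per_second(per_chunk_ms: float, chunk_size: int = 8) -> float:
    """Throughput implied by a per-chunk latency."""
    return chunk_size / (per_chunk_ms / 1000.0)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms
```

Reporting the median of the samples (for example with statistics.median) is less sensitive to occasional outliers than the mean.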
Citation
If you use FlashVSR, please cite the original work:
```bibtex
@inproceedings{zhuang2026flashvsr,
  title={FlashVSR: Towards Real-time Diffusion-Based Streaming Video Super Resolution},
  author={Zhuang, Junhao and Guo, Shi and Cai, Xin and Li, Xiaohui and Liu, Yihao and Yuan, Chun and Xue, Tianfan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={43482--43493},
  year={2026}
}
```