FlashVSR

FlashVSR is a one-step, streaming diffusion framework for real-time video super-resolution (VSR). It combines three components: a train-friendly three-stage distillation pipeline, locality-constrained sparse attention that bridges the train-test resolution gap, and a tiny conditional decoder for fast reconstruction.

Teaser image source: FlashVSR official repository.

Requirements

Minimum VRAM: ~24 GB.
PyTorch: >= 2.9.
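
Both requirements can be checked before installing. The following is a minimal sketch, not part of the project's tooling: `version_ge` and `vram_ok` are hypothetical helper names, the 24000 MiB threshold is an assumed reading of "~24 GB", and `version_ge` relies on GNU `sort -V`.

```shell
# Hypothetical pre-flight helpers; not part of FlashVSR or flashdreams-run.

# Succeeds when version $1 is >= version $2 (relies on GNU sort -V).
version_ge() {
    have="${1%%+*}"   # drop a local build suffix such as "+cu128"
    want="$2"
    [ "$(printf '%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Succeeds when the reported memory in MiB ($1) meets the assumed threshold.
vram_ok() {
    [ "$1" -ge 24000 ]   # assumption: "~24 GB" read as roughly 24000 MiB
}

# Example usage on a machine with an NVIDIA driver and a synced environment:
#   vram_ok "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)"
#   version_ge "$(uv run --project integrations/flashvsr python -c 'import torch; print(torch.__version__)')" 2.9
```

The helpers only compare values, so they can be sourced without a GPU; the commented commands show one way to obtain the real values.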

Installation

# from the repo root
uv sync --project integrations/flashvsr

Running the method

To run FlashVSR, provide an input video path and launch one of the registered runner slugs. For example:

uv run --project integrations/flashvsr \
    flashdreams-run \
    flashvsr-v1.1-sparse-ratio-2.0 \
    --input-path https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/inputs/example1.mp4 \
    --chunk-size 8

For multi-GPU inference, run the dense full-attention preset under torchrun (4 GPUs in this example):

uv run --project integrations/flashvsr \
    torchrun --nproc_per_node=4 --no-python flashdreams-run \
    flashvsr-v1.1-full-attn \
    --input-path https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/inputs/example1.mp4 \
    --chunk-size 8

Note

Multi-GPU is supported only by the dense flashvsr-v1.1-full-attn preset. The flashvsr-v1.1-sparse-ratio-* presets are single-GPU only because their Triton sparse-attention backend is not context-parallel aware.
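
The rule in this note can be encoded in a small wrapper. The sketch below is illustrative only: `pick_preset` and `launch_command` are hypothetical helper names, and the choice of `flashvsr-v1.1-sparse-ratio-2.0` as the single-GPU default is an assumption. `launch_command` prints the command rather than running it, using only the flags shown in the examples above.

```shell
# Hypothetical helper: print the preset slug suited to a GPU count ($1).
pick_preset() {
    if [ "$1" -gt 1 ]; then
        echo "flashvsr-v1.1-full-attn"          # dense preset, the only multi-GPU option
    else
        echo "flashvsr-v1.1-sparse-ratio-2.0"   # sparse preset, single-GPU only
    fi
}

# Hypothetical helper: print (not run) the launch command for $1 GPUs and input $2.
launch_command() {
    preset="$(pick_preset "$1")"
    if [ "$1" -gt 1 ]; then
        launcher="torchrun --nproc_per_node=$1 --no-python flashdreams-run"
    else
        launcher="flashdreams-run"
    fi
    echo "uv run --project integrations/flashvsr $launcher $preset --input-path $2 --chunk-size 8"
}
```

For example, `launch_command 4 input.mp4` prints the torchrun form with `flashvsr-v1.1-full-attn`, while `launch_command 1 input.mp4` prints the plain single-GPU form.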

We provide the following variants:

Method	Description
`flashvsr-v1.1-sparse-ratio-2.0`	Streaming 2x video super-resolution with the stable sparse-attention preset.
`flashvsr-v1.1-sparse-ratio-1.5`	Streaming 2x video super-resolution with the faster sparse-attention preset.
`flashvsr-v1.1-full-attn`	Dense full-attention preset with multi-GPU context-parallel support.

To inspect all supported CLI arguments and their default values, run:

uv run --project integrations/flashvsr \
    flashdreams-run \
    flashvsr-v1.1-sparse-ratio-2.0 \
    --help

A generated sample from the above commands:

FlashVSR 2x output (1280x768) from flashvsr-v1.1-sparse-ratio-2.0; low-resolution input (672x384) inset at bottom-left. Input from the FlashVSR examples.

Profiling benchmark

This benchmark compares per-chunk 2x upsampling time for the FlashDreams FlashVSR integration against the official FlashVSR implementation under matched settings.

The chart shows per-chunk 2x upsampling time in milliseconds on a single GB200 GPU with a chunk size of 8 frames. For the official FlashVSR implementation, see these instructions.
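
A per-chunk time converts to throughput as frames per second = chunk size × 1000 / time in milliseconds. The sketch below shows that arithmetic; `chunk_fps` is a hypothetical helper name, and the result covers only the upsampling step, not video decoding or encoding.

```shell
# Hypothetical helper: frames per second from chunk size ($1) and per-chunk time in ms ($2).
chunk_fps() {
    awk -v frames="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", frames * 1000 / ms }'
}
```

For instance, `chunk_fps 8 400` prints `20.0`; the 400 ms figure is an illustrative input, not a measured result.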

Citation

If you use FlashVSR, please cite the original work:

@inproceedings{zhuang2026flashvsr,
  title={FlashVSR: Towards Real-time Diffusion-Based Streaming Video Super Resolution},
  author={Zhuang, Junhao and Guo, Shi and Cai, Xin and Li, Xiaohui and Liu, Yihao and Yuan, Chun and Xue, Tianfan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={43482--43493},
  year={2026}
}