FlashVSR

FlashVSR is a streaming diffusion framework that performs real-time video super-resolution (VSR) in a single diffusion step. It combines a train-friendly three-stage distillation pipeline, locality-constrained sparse attention that bridges the train-test resolution gap, and a tiny conditional decoder for fast reconstruction.

[Figure: FlashVSR teaser. Image source: FlashVSR official repository.]

Requirements

  • Minimum VRAM: ~24 GB.

  • PyTorch: >= 2.9.
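The two requirements above can be checked before installing with a small shell sketch. The helper names (version_ge, torch_ok, vram_ok) are hypothetical and not part of the repository, and treating "~24 GB" as 24000 MiB is an assumption made only for this illustration.

```shell
# Minimum PyTorch version, taken from the Requirements list.
MIN_TORCH="2.9"
# Assumption: "~24 GB" is treated as 24000 MiB, the unit nvidia-smi reports.
MIN_VRAM_MIB=24000

# Succeeds when version $1 >= version $2 (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when a PyTorch version string such as "2.9.0+cu128" meets MIN_TORCH.
# The local build suffix after "+" is stripped before comparing.
torch_ok() {
    version_ge "${1%%+*}" "$MIN_TORCH"
}

# Succeeds when a memory size given in MiB meets MIN_VRAM_MIB.
vram_ok() {
    [ "$1" -ge "$MIN_VRAM_MIB" ]
}

# Example use on a GPU machine (not executed here):
#   vram_ok "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)"
#   torch_ok "$(uv run --project integrations/flashvsr python -c 'import torch; print(torch.__version__)')"
```

Both checks only compare numbers, so they can also be run against values copied from another machine.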

Installation

# from the repo root
uv sync --project integrations/flashvsr

Running the method

To run FlashVSR, provide an input video path and launch one of the registered runner slugs via flashdreams-run. For example:

uv run --project integrations/flashvsr \
    flashdreams-run \
    flashvsr-v1.1-sparse-ratio-2.0 \
    --input-path https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/inputs/example1.mp4 \
    --chunk-size 8
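When upscaling several videos, the single-GPU command above can be wrapped in a small function. This is a minimal sketch: run_flashvsr and the DRY_RUN switch are hypothetical conveniences, not part of flashdreams-run, and the chunk size defaults to 8 only to match the example (it is not necessarily the CLI default).

```shell
# Hypothetical helper that builds the single-GPU command shown above.
# Arguments: <preset> <input-path> [chunk-size, default 8 as in the example]
# Set DRY_RUN=1 to print the command instead of executing it.
run_flashvsr() {
    preset="$1"
    input="$2"
    chunk="${3:-8}"
    # Rebuild the positional parameters as the full command line.
    set -- uv run --project integrations/flashvsr \
        flashdreams-run "$preset" \
        --input-path "$input" \
        --chunk-size "$chunk"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 run_flashvsr flashvsr-v1.1-sparse-ratio-2.0 in.mp4` prints the assembled command, which is a cheap way to review it before launching a long job.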

For multi-GPU inference, use the dense full-attention preset and launch flashdreams-run through torchrun inside uv run (4 GPUs in this example):

uv run --project integrations/flashvsr \
    torchrun --nproc_per_node=4 --no-python flashdreams-run \
    flashvsr-v1.1-full-attn \
    --input-path https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/inputs/example1.mp4 \
    --chunk-size 8

Note

Multi-GPU is supported only by the dense flashvsr-v1.1-full-attn preset. The flashvsr-v1.1-sparse-ratio-* presets are single-GPU only because their Triton sparse-attention backend is not context-parallel aware.
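The restriction in the note can be enforced in a launch script before any GPU work starts. The function below is a hypothetical guard written for illustration. It encodes only the rule stated above and does not query flashdreams-run itself.

```shell
# Succeeds when <preset> may be launched on <num-gpus> GPUs, per the note above.
# Arguments: <preset> <num-gpus>
gpu_count_ok() {
    preset="$1"
    ngpu="$2"
    # A single GPU is fine for every preset.
    if [ "$ngpu" -le 1 ]; then
        return 0
    fi
    case "$preset" in
        flashvsr-v1.1-full-attn)
            return 0
            ;;
        *)
            echo "error: $preset is single-GPU only; use flashvsr-v1.1-full-attn for $ngpu GPUs" >&2
            return 1
            ;;
    esac
}
```

A script might call `gpu_count_ok "$PRESET" "$NGPU" || exit 1` before invoking torchrun, so a mismatched preset is rejected immediately with a readable message.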

We provide the following variants:

  • flashvsr-v1.1-sparse-ratio-2.0: Streaming 2x video super-resolution with the stable sparse-attention preset.

  • flashvsr-v1.1-sparse-ratio-1.5: Streaming 2x video super-resolution with the faster sparse-attention preset.

  • flashvsr-v1.1-full-attn: Dense full-attention preset with multi-GPU context-parallel support.

To inspect all supported CLI arguments and their default values, run:

uv run --project integrations/flashvsr \
    flashdreams-run \
    flashvsr-v1.1-sparse-ratio-2.0 \
    --help

A generated sample from the above commands:

FlashVSR 2x output (1280x768) from flashvsr-v1.1-sparse-ratio-2.0; low-resolution input (672x384) inset at bottom-left. Input from the FlashVSR examples.

Profiling benchmark

The benchmark below compares the per-chunk 2x upsampling time of FlashDreams FlashVSR with that of the official FlashVSR implementation under matched settings.

The chart reports per-chunk 2x upsampling time in milliseconds, measured on a single GB200 GPU with a chunk size of 8 frames. For the official FlashVSR implementation, see the linked instructions.
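A per-chunk time can be converted into an approximate throughput to judge whether a setup keeps up with a given source frame rate. The sketch below assumes chunks are processed back to back and ignores startup and I/O overhead. chunk_fps is a hypothetical helper, and the numbers in the usage note are illustrative, not measurements.

```shell
# Frames per second implied by a per-chunk processing time.
# Arguments: <chunk size in frames> <per-chunk time in milliseconds>
chunk_fps() {
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", n * 1000 / ms }'
}
```

For instance, `chunk_fps 8 200` prints 40.0, meaning a made-up 200 ms per 8-frame chunk would correspond to roughly 40 frames per second.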

Citation

If you use FlashVSR, please cite the original work:

@inproceedings{zhuang2026flashvsr,
  title={FlashVSR: Towards Real-time Diffusion-Based Streaming Video Super Resolution},
  author={Zhuang, Junhao and Guo, Shi and Cai, Xin and Li, Xiaohui and Liu, Yihao and Yuan, Chun and Xue, Tianfan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={43482--43493},
  year={2026}
}