Introduction

The accvlab.on_demand_video_decoder package provides hardware-accelerated, on-demand video decoding on NVIDIA GPUs.

Internally, the package builds on the core C/C++ video decode APIs of the NVIDIA Video Codec SDK and exposes them through user-friendly Python interfaces for efficient and convenient video frame extraction.

accvlab.on_demand_video_decoder Overview

Target Use Cases

This package is designed for scenarios that demand high video decoding throughput, including:

Autonomous Driving: Process large volumes of video data for training perception models
Multimodal Large Language Models (MLLMs): Efficiently extract video frames for vision-language training
Video Understanding: Enable high-throughput video analysis for inference and training workloads
Other Scenarios: Any additional workload that is sensitive to video-decode throughput

Key Benefits

1. Massive Storage Savings

Traditional workflows require extracting video frames and storing them as individual images on disk before training. This package removes that step by decoding frames on demand directly from the video files, which saves approximately 90% of disk storage with negligible performance overhead.

2. Reduced CPU Overhead

By offloading video decoding to NVDEC, NVIDIA's dedicated hardware decoder, this package frees up approximately 20% of CPU resources for other critical training pipeline operations, improving overall system performance.

3. Flexible Decoding Methods

accvlab.on_demand_video_decoder provides multiple decoding modes optimized for different workload patterns:

Random Decode: Optimized for large-batch scenarios with random video selection and random frame sampling
Stream Decode: Optimized for large-batch sequential frame decoding from videos
Separate Decode: Decouples demuxer and decoder components, enabling flexible configuration of the video decoding pipeline
Demuxer-Free Decode: Optimized for direct GOP (Group of Pictures) reading scenarios, balancing latency and storage efficiency
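
To see why random and sequential access are treated as different workloads, it can help to count how many frames a decoder has to process before it can return a requested frame. The sketch below is only an illustration under simplifying assumptions: a fixed GOP length, and decoding that always starts at the keyframe opening the GOP. The function names are hypothetical and are not part of the accvlab.on_demand_video_decoder API.

```python
def frames_to_decode(target, gop_length):
    """Frames that must be decoded to reach `target`, starting at its GOP's keyframe.

    Assumes every GOP has exactly `gop_length` frames (a simplification).
    """
    keyframe = (target // gop_length) * gop_length
    return target - keyframe + 1


def random_access_cost(targets, gop_length):
    """Total decode work when each requested frame is served independently."""
    return sum(frames_to_decode(t, gop_length) for t in targets)


def sequential_cost(first, count, gop_length):
    """Total decode work for `count` consecutive frames starting at `first`.

    Only the first frame pays the seek cost; each following frame
    needs one additional decode because decoder state is kept.
    """
    return frames_to_decode(first, gop_length) + (count - 1)


# Three scattered frames versus three consecutive frames, GOP length 30.
scattered = random_access_cost([5, 47, 89], gop_length=30)   # 6 + 18 + 30 = 54
consecutive = sequential_cost(60, count=3, gop_length=30)    # 1 + 2 = 3
print(scattered, consecutive)
```

Under these assumptions, scattered requests cost far more decode work per returned frame than consecutive ones, which is one plausible reason for offering separate modes for the two access patterns.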

4. Seamless Integration with PyTorch

accvlab.on_demand_video_decoder integrates efficiently with PyTorch. Detailed examples are provided to help users adopt the package quickly and boost workload performance.

Features

Functional Features

Codecs: H.264, HEVC, AV1.
Surface formats: NV12 (8 bit), YUV 4:2:0 (10 bit), YUV 4:4:4 (8 and 10 bit).
Video container formats: MP4, MOV, FLV, etc.
DLPack support to facilitate data exchange with popular DL frameworks like PyTorch and TensorRT.
Python sample applications demonstrating API usage.
Flexible decode methods: random, stream, separate (demuxer-decoder separation), and demuxer-free (decoder-only).

High-Performance Features

Caching: Re-use of demuxers, decoders, packets, and internally used data
Map-free: Avoid memory mapping for unneeded frames
GPU Memory Pool: Avoid frequent memory re-allocation (re-allocate only if the total memory needed increases)
Producer-Consumer Model: Demuxer as producer, decoder as consumer, GOPs as the product
NVDEC Pipeline: Pipeline utilizes all NVDEC units while balancing load across GOPs of non-uniform length
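
The pipeline ideas in the list above (a demuxer producing GOPs, decoders consuming them, work spread across several decode units, and buffers that grow only when needed) can be sketched with plain Python threads. This is a conceptual illustration, not the package's implementation: Python threads and `queue.Queue` stand in for the internal pipeline, a `bytearray` stands in for GPU memory, and every name here (`GrowOnlyBuffer`, `demux`, `decode`, `run_pipeline`) is hypothetical.

```python
import queue
import threading


class GrowOnlyBuffer:
    """Scratch buffer that is re-allocated only when a larger size is requested."""

    def __init__(self):
        self._buf = bytearray()
        self.reallocations = 0

    def view(self, nbytes):
        if nbytes > len(self._buf):
            self._buf = bytearray(nbytes)  # grow; smaller requests reuse the buffer
            self.reallocations += 1
        return memoryview(self._buf)[:nbytes]


def demux(gop_lengths, gop_queue, num_decoders):
    """Producer: put one (gop_id, frame_count) item on the queue per GOP."""
    for gop_id, num_frames in enumerate(gop_lengths):
        gop_queue.put((gop_id, num_frames))
    for _ in range(num_decoders):
        gop_queue.put(None)  # one stop marker per consumer


def decode(worker_id, gop_queue, results, lock, bytes_per_frame=16):
    """Consumer: pull GOPs until a stop marker arrives, then report totals."""
    buf = GrowOnlyBuffer()
    frames_done = 0
    while True:
        item = gop_queue.get()
        if item is None:
            break
        gop_id, num_frames = item
        out = buf.view(num_frames * bytes_per_frame)
        out[:] = bytes([gop_id % 256]) * len(out)  # placeholder for real decoding
        frames_done += num_frames
    with lock:
        results[worker_id] = (frames_done, buf.reallocations)


def run_pipeline(gop_lengths, num_decoders=2):
    """Run one producer and `num_decoders` consumers over a shared bounded queue."""
    gop_queue = queue.Queue(maxsize=4)
    results, lock = {}, threading.Lock()
    workers = [
        threading.Thread(target=decode, args=(i, gop_queue, results, lock))
        for i in range(num_decoders)
    ]
    for worker in workers:
        worker.start()
    demux(gop_lengths, gop_queue, num_decoders)
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # GOPs of non-uniform length; which worker gets which GOP varies per run.
    print(run_pipeline([30, 12, 45, 30, 8, 60], num_decoders=2))
```

Because all consumers pull from one shared queue, a worker that finishes a short GOP immediately takes the next one, so uneven GOP lengths do not leave a worker idle while another holds a backlog. The bounded queue also keeps the producer from running arbitrarily far ahead of the consumers.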

Getting Started

Please refer to the following sections to get started:

Acknowledgements

The accvlab.on_demand_video_decoder package builds upon and integrates with several key technologies:

FFmpeg: A complete, cross-platform solution for video and audio processing that provides essential multimedia framework capabilities
PyNvVideoCodec: NVIDIA’s Python bindings for video codec operations
NVIDIA Video Codec SDK: The underlying SDK that provides hardware-accelerated video encode and decode capabilities on NVIDIA GPUs

We are grateful to these projects and the open-source community for making high-performance video processing accessible.