Sample Code Documentation

This document provides comprehensive guidance on using the sample codes in packages/on_demand_video_decoder/samples/. The samples demonstrate various decoding modes and advanced features of the accvlab.on_demand_video_decoder package.

1. Overview

The On-Demand Video Decoder package provides multiple decoding modes optimized for different use cases. This section helps you quickly locate the sample code that matches your requirements.

1.1 Sample Code Quick Reference

Note

The sample files mentioned in the tabled below are all located in the packages/on_demand_video_decoder/samples/ directory inside the ACCV-Lab repository.

Sample File	Use Case	Key APIs
SampleRandomAccess.py	Random frame sampling for training	`CreateGopDecoder()`, `DecodeN12ToRGB()`
SampleRandomAccessWithFastInit.py	Multi-clip batch processing with optimization	`GetFastInitInfo()`
SampleStreamAccess.py	Sequential frame decoding	`CreateSampleReader()`
SampleSeparationAccess.py	Demuxer/decoder separation with GOP caching	`GetGOP()`, `DecodeFromGOPRGB()`, `isCacheHit()`
SampleSeparationAccessGOPListAPI.py	Per-video GOP management with caching	`GetGOPList()`, `DecodeFromGOPListRGB()`, `isCacheHit()`
SampleDecodeFromGopFiles.py	GOP data persistence to disk	`SavePacketsToFile()`, `LoadGops()`
SampleDecodeFromGopFilesToListAPI.py	Selective GOP loading	`LoadGopsToList()`, `DecodeFromGOPListRGB()`
SampleDecodeFromGopList.py	Batch decode from multiple demux results (N demux → 1 decode)	`DecodeFromGOPListRGB()`
SampleStreamAsyncAccess.py	Async stream decoding with prefetching	`CreateSampleReader()`, `DecodeN12ToRGBAsync()`, `DecodeN12ToRGBAsyncGetBuffer()`

For details on the Key APIs, please refer to the API documentation of the corresponding functions and classes.

1.2 Choosing the Right Sample

Use this decision tree to select the appropriate sample for your use case:

Decoding Mode Selection:

If you need random frame access:
    If the input video resolution, color information, and other parameters remain unchanged:
        → Use SampleRandomAccessWithFastInit
    Otherwise:
        → Use SampleRandomAccess

If you need sequential frame decoding:
    If you need async decoding with prefetching for lower latency:
        → Use SampleStreamAsyncAccess
    Otherwise:
        → Use SampleStreamAccess

If you need to separate demuxing and decoding:
    If per-video GOP management is required (i.e., use of separate per-video GOP data):
        → Use SampleSeparationAccessGOPListAPI
    Otherwise:
        → Use SampleSeparationAccess

If you need to save GOP data to disk:
    → Use SampleDecodeFromGopFiles

If you need to batch decode from multiple separate demux operations:
    (e.g., DataLoader workers demux in parallel, main process batch decode)
    → Use SampleDecodeFromGopList

1.3 Core Concepts

Before diving into the samples, understanding these concepts will be helpful:

GOP (Group of Pictures): A sequence of video frames starting with a keyframe (I-frame). GOP structure is essential for video compression and random access.
Decoding Modes: accvlab.on_demand_video_decoder supports four primary modes:
- Random Access: Direct access to any frame without sequential decoding
- Stream Access: Optimized for sequential frame processing with caching
- Separation Access: Separate demuxing and decoding stages
- Demuxer-Free: Decode directly from pre-extracted GOP data
FastInit: An optimization technique that caches stream metadata to accelerate decoder initialization for multiple clips with similar properties.
GOP Caching: A Python-side caching mechanism that stores extracted GOP data in memory. When the same video file is requested with a frame_id that falls within an already cached GOP range, the cached data is returned directly without re-demuxing from the video file.

2. Quick Start

This section walks you through running your first sample in 5 minutes.

2.1 Running Your First Sample

The simplest example is SampleRandomAccess.py. Here’s how to run it:

Step 1: Prepare video files

Edit the file paths in the sample code (also see the Dataset Preparation section):

file_path_list = [
    "/path/to/your/video1.mp4",
    "/path/to/your/video2.mp4",
    # Add more video paths as needed
]

Step 2: Run the sample

cd packages/on_demand_video_decoder/samples
python SampleRandomAccess.py

Step 3: Verify the output

Expected output:

NVIDIA accvlab.on_demand_video_decoder - Random Access Video Decoding Sample
================================================================

Initializing NVIDIA GPU video decoder...
Decoder initialized successfully on GPU 0 with support for 6 concurrent files
Processing 6 video files from multi-camera setup

--- Iteration 1/5 ---
Target frame indices: [45, 23, 78, 12, 56, 89]
Initiating GPU decoding...
Successfully decoded 6 frames
Converting frames to PyTorch tensors...
Tensor shape: torch.Size([1, 900, 1600, 3])
Tensor dtype: torch.uint8

2.2 Understanding the Basic Code Structure

All samples follow a similar structure:

import accvlab.on_demand_video_decoder as nvc
import torch

# 1. Initialize decoder
decoder = nvc.CreateGopDecoder(
    maxfiles=6,  # Maximum concurrent files
    iGpu=0       # GPU device ID
)

# 2. Specify video files and frame IDs
file_path_list = ["/path/to/video1.mp4", "/path/to/video2.mp4"]
frame_id_list = [10, 25]  # Frame ID for each video

# 3. Decode frames
decoded_frames = decoder.DecodeN12ToRGB(
    file_path_list, 
    frame_id_list, 
    as_bgr=True  # Output BGR format
)

# 4. Convert to PyTorch tensors (optional)
tensors = [torch.as_tensor(frame) for frame in decoded_frames]

3. Decoding Modes

This section provides detailed documentation for each decoding mode with corresponding sample codes.

3.1 Random Access Decoding

Random Access mode allows direct access to any frame in a video without sequential decoding. The decoder automatically finds the GOP containing the target frame and decodes from the nearest keyframe.

3.1.1 Use Cases

Training with random frame sampling
Processing single video clips
Random switching between different videos
Non-sequential frame access patterns

3.1.2 Sample: Basic Random Access

File: packages/on_demand_video_decoder/samples/SampleRandomAccess.py

Core APIs

CreateGopDecoder(): Initialize the GOP decoder
DecodeN12ToRGB(): Decode frames to RGB/BGR format

Code Walkthrough

Initialize the decoder:

import accvlab.on_demand_video_decoder as nvc

nv_gop_dec = nvc.CreateGopDecoder(
    maxfiles=6,  # Maximum number of concurrent files
    iGpu=0       # Target GPU device ID
)

Prepare video files and frame indices:

# Multi-camera setup from nuScenes dataset (example for sequence named `n008-2018-08-30-15-16-55-0400`)
file_path_list = [
    "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_BACK_LEFT.mp4",
    "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_BACK.mp4",
    "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_BACK_RIGHT.mp4",
    "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_FRONT_LEFT.mp4",
    "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_FRONT.mp4",
    "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_FRONT_RIGHT.mp4",
]

# Random frame indices (one per video)
frame_id_list = [random.randint(0, 100) for _ in range(len(file_path_list))]

Decode frames:

decoded_frames = nv_gop_dec.DecodeN12ToRGB(
    file_path_list,  # List of video file paths
    frame_id_list,   # List of target frame indices
    True             # Output in BGR format (OpenCV compatible)
)

Convert to PyTorch tensors:

import torch

tensor_list = [torch.unsqueeze(torch.as_tensor(frame), 0) 
               for frame in decoded_frames]

Performance Characteristics

Memory usage: Scales with concurrent file count and video resolution
GPU utilization: 70-90% depending on video codec complexity
Throughput: Approximately 500-1500 FPS on modern GPUs (e.g., A100)

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleRandomAccess.py

Note: Modify the file_path_list in the code to point to your video files.

3.1.3 Sample: Random Access with FastInit

File: packages/on_demand_video_decoder/samples/SampleRandomAccessWithFastInit.py

When to Use

FastInit optimization is beneficial when:

Processing multiple video clips from the same dataset
All clips have similar properties (resolution, codec, GOP size)
Initialization latency is a bottleneck
Batch processing scenarios

Performance Improvement

FastInit can reduce decoder initialization time by 40-70% for subsequent clips after the first one.

Core APIs

GetFastInitInfo(): Extract stream metadata for fast initialization
DecodeN12ToRGB() with fastStreamInfos parameter

Code Walkthrough

Initialize decoder (one-time setup):

nv_gop_dec = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

Get fast initialization info from sample files:

# Extract metadata from first clip
sample_files = [os.path.join(path_bases[0], f) for f in os.listdir(path_bases[0])]
fast_stream_infos = nvc.GetFastInitInfo(sample_files)

Note

GetFastInitInfo() only needs to be called once for clips with similar properties.

Warmup (skip first-time hardware initialization overhead):

decoded_frames = nv_gop_dec.DecodeN12ToRGB(
    sample_files, 
    [0] * len(sample_files), 
    as_bgr=True,
    fastStreamInfos=fast_stream_infos
)

Process multiple clips with FastInit:

for clip_path in clip_paths:
    file_path_list = [os.path.join(clip_path, f) for f in os.listdir(clip_path)]
    frame_id_list = [random.randint(0, 100) for _ in range(len(file_path_list))]
    
    # Use fastStreamInfos for optimized initialization
    decoded_frames = nv_gop_dec.DecodeN12ToRGB(
        file_path_list,
        frame_id_list,
        as_bgr=True,
        fastStreamInfos=fast_stream_infos  # Reuse cached stream info
    )

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleRandomAccessWithFastInit.py

3.2 Stream Access Decoding

Stream Access mode is optimized for sequential frame processing with intelligent caching. It is particularly useful for temporal models and sequential video analysis.

3.2.1 Use Cases

Sequential frame decoding from videos
Temporal models (e.g., StreamPETR, BEVFormer)
Time-series video analysis
Scenarios where frames are accessed in order

3.2.2 Sample: Stream Access

File: packages/on_demand_video_decoder/samples/SampleStreamAccess.py

Core APIs

CreateSampleReader(): Initialize the sample reader (different from CreateGopDecoder())
DecodeN12ToRGB(): Decode frames with caching optimization

Key Difference from Random Access

Stream Access uses CreateSampleReader() instead of CreateGopDecoder(). The key advantage is the use of caching-based optimizations. There is also the ability to iterate over individual sets of video file sets, each set being accessed sequentially (with the number of sets being controlled by the num_of_set parameter).

Code Walkthrough

Initialize the sample reader:

nv_gop_dec = nvc.CreateSampleReader(
    num_of_set=1,              # Cache for this many video sets
    num_of_file=6,             # Maximum number of files per set
    iGpu=0
)

Understanding num_of_set

The num_of_set parameter controls caching behavior:

Set to 1 for simple sequential access
Set to batch_size for StreamPETR-like access patterns (iterating over the samples inside a batch, accessing the same video files in every batch_size-th call to the decoder)

Example: If batch_size==4, set num_of_set=4 to cache 4 different video clips.

Process frames sequentially:

file_path_list = [
    "/data/videos/scene_CAM_BACK_LEFT.mp4",
    "/data/videos/scene_CAM_BACK.mp4",
    # ... more files
]

# Start from frame 0
frame_id_list = [0] * len(file_path_list)

for iteration in range(num_iterations):
    # Increment frame indices (sequential access)
    frame_id_list = [fid + 7 for fid in frame_id_list]
    
    decoded_frames = nv_gop_dec.DecodeN12ToRGB(
        file_path_list,
        frame_id_list,
        True
    )

Caching Behavior

Stream Access mode caches:

Demuxer state
Decoder state
Recently accessed GOPs

This reduces overhead for sequential access patterns compared to Random Access mode.

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleStreamAccess.py

3.2.3 Sample: Async Stream Access

File: packages/on_demand_video_decoder/samples/SampleStreamAsyncAccess.py

When to Use

Async Stream Access is beneficial when:

Lower latency is required for streaming applications
Prefetching next frame while processing current frame improves latency
Labeling task model need high-performance inference
GPU utilization needs to be maximized through overlapped operations

Key Advantages Over Basic Stream Access

Feature	Stream Access	Async Stream Access
Decode mode	Synchronous	Asynchronous with prefetching
Latency	Standard	Lower (prefetched frames ready)
GPU utilization	Standard	Better (decode/process overlap)

Core APIs

CreateSampleReader(): Initialize the sample reader
DecodeN12ToRGBAsync(): Start asynchronous decoding
DecodeN12ToRGBAsyncGetBuffer(): Retrieve decoded frames from buffer

Code Walkthrough

Initialize the sample reader:

import accvlab.on_demand_video_decoder as nvc

nv_stream_dec = nvc.CreateSampleReader(
    num_of_set=1,              # Cache for this many video sets
    num_of_file=6,             # Maximum number of files per set
    iGpu=0                     # Target GPU device ID
)

Async Decoding Pattern

The async pattern consists of two main operations:

DecodeN12ToRGBAsync: Start asynchronous decoding (non-blocking)
DecodeN12ToRGBAsyncGetBuffer: Get decoded frames (waits if not ready)

First iteration - start async decode and get result:

# Start async decode
nv_stream_dec.DecodeN12ToRGBAsync(
    file_path_list,
    frame_id_list,
    False,  # Output in RGB format (False=RGB, True=BGR)
)

# Get the result (will wait for async decode to complete)
decoded_frames = nv_stream_dec.DecodeN12ToRGBAsyncGetBuffer(
    file_path_list,
    frame_id_list,
    False,  # Output in RGB format
)

Subsequent iterations - get prefetched result:

# Get prefetched result from buffer (already decoded in background)
decoded_frames = nv_stream_dec.DecodeN12ToRGBAsyncGetBuffer(
    file_path_list,
    frame_id_list,
    False,  # Output in RGB format
)

Prefetching Pattern

The key optimization is prefetching the next frame while processing the current one:

# Process current frame
tensor_list = [torch.as_tensor(frame, device='cuda') for frame in decoded_frames]
rgb_batch = torch.stack(tensor_list, dim=0)

# Prefetch next frame (non-blocking, happens in background)
if idx < len(frames_to_decode) - 1:
    next_frame = frames_to_decode[idx + 1]
    next_frame_id_list = [next_frame] * len(file_path_list)
    nv_stream_dec.DecodeN12ToRGBAsync(
        file_path_list,
        next_frame_id_list,
        False,
    )

# Continue processing current frame...
# Next iteration will get prefetched frame immediately

Important: Zero-Copy Frame Management

⚠️ Warning: The decoded frames returned by DecodeN12ToRGBAsyncGetBuffer are zero-copy references to internal buffers. You must deep copy the frames before calling DecodeN12ToRGBAsync again, otherwise the data will be overwritten.

# CORRECT: Deep copy frames before next async call
tensor_list = [torch.as_tensor(frame, device='cuda').clone() for frame in decoded_frames]
# or
rgb_batch = torch.stack([torch.as_tensor(frame, device='cuda') for frame in decoded_frames], dim=0)

# Now safe to call DecodeN12ToRGBAsync for next frame
nv_stream_dec.DecodeN12ToRGBAsync(...)

Complete Async Workflow

Iteration 1:
  DecodeN12ToRGBAsync(frame_0)     → Start decode
  DecodeN12ToRGBAsyncGetBuffer()   → Wait & get frame_0
  Process frame_0
  DecodeN12ToRGBAsync(frame_1)     → Prefetch frame_1

Iteration 2:
  DecodeN12ToRGBAsyncGetBuffer()   → Get prefetched frame_1 (fast!)
  Process frame_1
  DecodeN12ToRGBAsync(frame_2)     → Prefetch frame_2

Iteration N:
  DecodeN12ToRGBAsyncGetBuffer()   → Get prefetched frame_N
  Process frame_N
  (No prefetch for last frame)

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleStreamAsyncAccess.py

3.3 Separation Access Decoding

Separation Access mode decouples demuxing and decoding into two separate stages. This provides fine-grained control over the video processing pipeline and enables advanced optimization strategies.

3.3.1 Use Cases

Need separate control over demuxing and decoding
One-time demuxing, multiple decoding operations
Inspection or processing of intermediate packet data
Custom processing pipelines

3.3.2 Two-Stage Architecture

Stage 1 (Demuxing):
Video File → GetGOP() → Packet Data (GOP)
                         ├─ packets
                         ├─ first_frame_ids
                         └─ gop_lens

Stage 2 (Decoding):
Packet Data → DecodeFromGOPRGB() → Decoded Frames

3.3.3 Sample: Basic Separation Access

File: packages/on_demand_video_decoder/samples/SampleSeparationAccess.py

Core APIs

GetGOP(): Extract packet data (demuxing only)
DecodeFromGOPRGB(): Decode from packet data (decoding only)

Code Walkthrough

Initialize two separate decoders:

# Stage 1 decoder: for packet extraction
nv_gop_dec1 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

# Stage 2 decoder: for packet decoding
nv_gop_dec2 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

Note

Using separate decoder instances allows independent configuration and resource management.

Stage 1 - Extract packet data:

file_path_list = [
    "/data/videos/scene_CAM_BACK_LEFT.mp4",
    "/data/videos/scene_CAM_BACK.mp4",
    # ... more files
]

# Extract GOP data containing frame 77 for all videos
packets, first_frame_ids, gop_lens = nv_gop_dec1.GetGOP(
    file_path_list,
    [77] * len(file_path_list)
)

Understanding the return values:

packets: Compressed packet data (numpy array)
first_frame_ids: First frame ID in each extracted GOP
gop_lens: Number of frames in each GOP

Stage 2 - Decode from packet data:

# Generate frame IDs within the GOP range
frame_id_list = [
    random.randint(first_frame_ids[i], first_frame_ids[i] + gop_lens[i] - 1)
    for i in range(len(file_path_list))
]

# Decode frames directly from packet data
decoded_frames = nv_gop_dec2.DecodeFromGOPRGB(
    packets,           # Packet data from Stage 1
    file_path_list,    # Original file paths (for reference)
    frame_id_list,     # Target frame indices
    True               # BGR output
)

Validation

Always validate that frame IDs are within GOP range:

if frame_id < first_frame_ids[i] or frame_id >= first_frame_ids[i] + gop_lens[i]:
    print(f"Frame {frame_id} is out of range for GOP starting at {first_frame_ids[i]}")

Advantages of Separation

Demux once, decode multiple times with different frame selections
Ability to inspect or process packet data
Separate optimization of demuxing and decoding stages
Foundation for more advanced processing pipelines

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleSeparationAccess.py

3.3.4 Sample: Separation Access with GetGOPList API

File: packages/on_demand_video_decoder/samples/SampleSeparationAccessGOPListAPI.py

When to Use

GetGOPList() is preferred over GetGOP() when:

Processing large video collections
Per-video cache management is needed
Selective video loading is required
Distributed storage and processing

Core Difference: GetGOP() vs GetGOPList()

Feature	`GetGOP()`	`GetGOPList()`
Return type	Single merged bundle	List of per-video bundles
Data structure	`(packets, ids, lens)`	`[(packets1, ids1, lens1), (packets2, ids2, lens2), ...]`
Memory management	Load all or nothing	Load selectively
Decoding API	DecodeFromGOPRGB	DecodeFromGOPListRGB
Best for	Batch processing all videos	Per-video management

Core APIs

GetGOPList(): Extract packet data per video (not merged)
DecodeFromGOPListRGB(): Decode from list of packet data

Code Walkthrough

Stage 1 - Extract per-video GOP data:

file_path_list = [
    "/data/videos/CAM_BACK_LEFT.mp4",
    "/data/videos/CAM_BACK.mp4",
    "/data/videos/CAM_BACK_RIGHT.mp4",
    "/data/videos/CAM_FRONT_LEFT.mp4",
    "/data/videos/CAM_FRONT.mp4",
    "/data/videos/CAM_FRONT_RIGHT.mp4",
]

# Extract GOP data, returns list of tuples
gop_list = nv_gop_dec1.GetGOPList(
    file_path_list,
    [77] * len(file_path_list)
)

# gop_list structure:
# [
#   (packets_video1, first_frame_ids_video1, gop_lens_video1),
#   (packets_video2, first_frame_ids_video2, gop_lens_video2),
#   ...
# ]

Per-video GOP data inspection:

for i, (gop_data, first_frame_ids, gop_lens) in enumerate(gop_list):
    print(f"Video {i}:")
    print(f"  GOP data size: {len(gop_data)} bytes")
    print(f"  First frame ID: {first_frame_ids[0]}")
    print(f"  GOP length: {gop_lens[0]}")

Simulating per-video caching:

# Cache GOP data per video
gop_cache = {}
for i, (gop_data, first_frame_ids, gop_lens) in enumerate(gop_list):
    cache_key = f"video_{i}_frame_77"
    gop_cache[cache_key] = {
        'gop_data': gop_data,
        'first_frame_ids': first_frame_ids,
        'gop_lens': gop_lens,
        'filepath': file_path_list[i]
    }

Stage 2 - Selective decoding:

# Select only specific videos to decode (e.g., front cameras only)
selected_indices = [3, 4, 5]  # Front-left, front, front-right

selected_gop_data_list = []
selected_filepaths = []
selected_frame_ids = []

for idx in selected_indices:
    cache_key = f"video_{idx}_frame_77"
    cached_item = gop_cache[cache_key]
    
    # Generate random frame within GOP range
    first_frame_id = cached_item['first_frame_ids'][0]
    gop_len = cached_item['gop_lens'][0]
    random_frame = random.randint(first_frame_id, first_frame_id + gop_len - 1)
    
    selected_gop_data_list.append(cached_item['gop_data'])
    selected_filepaths.append(cached_item['filepath'])
    selected_frame_ids.append(random_frame)

# Decode only selected videos
decoded_frames = nv_gop_dec2.DecodeFromGOPListRGB(
    selected_gop_data_list,  # List of GOP data for selected videos
    selected_filepaths,      # Corresponding file paths
    selected_frame_ids,      # Frame IDs to decode
    True                     # BGR output
)

Key Advantages

Load only required videos from cache (memory efficient)
Per-video cache management (independent expiration, priority)
Better suited for distributed systems
Reduced inter-video dependencies

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleSeparationAccessGOPListAPI.py

3.3.5 GOP Caching Feature

The GOP caching feature automatically stores extracted GOP data in Python memory, eliminating the need for manual cache management by the user. When enabled, subsequent calls to GetGOP() or GetGOPList() with the same video file and a frame_id within the cached GOP range will return cached data without re-demuxing.

Why Use GOP Caching?

In training scenarios, especially with video datasets:

The same video file may be accessed multiple times with different frame indices
Multiple frame indices often fall within the same GOP (Group of Pictures)
Re-demuxing for each access wastes I/O and CPU resources

Without caching, users would need to manually track GOP ranges and manage cache dictionaries. With the useGOPCache parameter, this is handled automatically.

Enabling GOP Caching

Set useGOPCache=True when calling GetGOP() or GetGOPList():

import accvlab.on_demand_video_decoder as nvc

decoder = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

# First call - fetches GOP data from video files
packets, first_ids, gop_lens = decoder.GetGOP(
    file_path_list, 
    [77] * len(file_path_list), 
    useGOPCache=True
)

# Second call with frame_id=80 (within the same GOP range) - returns from cache
packets, first_ids, gop_lens = decoder.GetGOP(
    file_path_list, 
    [80] * len(file_path_list), 
    useGOPCache=True
)

Cache Hit Condition

A cache hit occurs when:

The requested filepath matches a cached entry
The requested frame_id satisfies: first_frame_id <= frame_id < first_frame_id + gop_len

If the frame_id is outside the cached GOP range, a new GOP is fetched and the cache is updated.

Checking Cache Hit Status

Use the isCacheHit() method to check whether the last GetGOP() or GetGOPList() call hit the cache:

# Call GetGOP with caching
packets, first_ids, gop_lens = decoder.GetGOP(file_path_list, frame_ids, useGOPCache=True)

# Check cache hit status for each video
cache_hits = decoder.isCacheHit()
print(cache_hits)  # [True, False, True, True, False] - per-video cache hit status

The return value is a list of booleans, one for each video in the request, indicating whether the cached data was used (True) or new data was fetched (False).

Cache Management Methods

The decoder provides methods to manage the cache:

Method	Description
`get_cache_info()`	Returns a dictionary with cache statistics
`clear_cache()`	Clears all cached GOP data

Example:

# Get cache information
cache_info = decoder.get_cache_info()
print(f"Cached files: {cache_info['cached_files_count']}")
print(f"File paths: {cache_info['cached_files']}")

# Clear all cache when done
decoder.clear_cache()

GOP Caching with GetGOPList

The caching feature works identically with GetGOPList():

# First call - all videos are fetched
gop_list = decoder.GetGOPList(file_path_list, [77, 77, 77], useGOPCache=True)
print(decoder.isCacheHit())  # [False, False, False]

# Second call with some frame_ids in range, some out of range
gop_list = decoder.GetGOPList(file_path_list, [80, 80, 150], useGOPCache=True)
print(decoder.isCacheHit())  # [True, True, False] - partial cache hit

Shared Cache Between GetGOP and GetGOPList

The cache is shared between GetGOP() and GetGOPList() calls on the same decoder instance:

# Cache populated via GetGOP
packets, _, _ = decoder.GetGOP(["/path/to/video.mp4"], [50], useGOPCache=True)

# Cache hit via GetGOPList (same file, frame_id in range)
gop_list = decoder.GetGOPList(["/path/to/video.mp4"], [55], useGOPCache=True)
print(decoder.isCacheHit())  # [True]

⚠️ Note: The cache is stored in Python memory. Each video file caches only one GOP (the most recently accessed). For long-running processes with many different videos, use clear_cache() to release memory when needed.

When to Use GOP Caching

✓ Training loops with random frame sampling from the same video ✓ Multi-camera setups where cameras are often accessed with similar frame indices ✓ Scenarios where the same GOP is likely to be accessed multiple times ✓ Reducing I/O overhead in data loading pipelines

✗ One-time video processing (no repeated access) ✗ Memory-constrained environments with large video collections ✗ Scenarios where each frame access targets a different GOP

3.4 Demuxer-Free Decoding

Demuxer-Free mode allows decoding directly from pre-extracted GOP data, either stored on disk or in memory. This approach is ideal for scenarios requiring repeated access to the same video segments.

3.4.1 Use Cases

Pre-processing video datasets for training
Repeated access to same video segments
Disk storage for GOP data caching
Eliminating demuxing overhead in production
PyTorch DataLoader integration with worker processes

3.4.2 Sample: GOP File Storage and Decoding

File: packages/on_demand_video_decoder/samples/SampleDecodeFromGopFiles.py

Two-Phase Workflow

Phase 1: GOP Data Preparation
Video Files → GetGOP() → SavePacketsToFile() → .bin files on disk

Phase 2: Decoding from Files
.bin files → LoadGops() → DecodeFromGOPRGB() → Decoded Frames

Core APIs

GetGOP(): Extract GOP packet data
SavePacketsToFile(): Save packets to binary file
LoadGops(): Load packets from binary files (merged)
DecodeFromGOPRGB(): Decode from loaded packets

Code Walkthrough

Initialize decoders:

# Decoder for packet extraction
nv_gop_dec1 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

# Decoder for GOP file decoding
nv_gop_dec2 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

Phase 1 - Extract and save GOP data:

file_list = [
    "/data/videos/CAM_BACK_LEFT.mp4",
    "/data/videos/CAM_BACK.mp4",
    # ... more files
]

frames = [random.randint(0, 200) for _ in range(len(file_list))]
packet_files = []

for i in range(len(file_list)):
    # Extract packet data for single file
    numpy_data, first_frame_ids, gop_lens = nv_gop_dec1.GetGOP(
        file_list[i:i+1],
        frames[i:i+1]
    )
    
    # Save to binary file
    packet_file = f"./gop_packets_{i:02d}.bin"
    nvc.SavePacketsToFile(numpy_data, packet_file)
    packet_files.append(packet_file)
    
    print(f"Saved GOP data: {os.path.getsize(packet_file)} bytes")

Phase 2 - Load and decode from GOP files:

# Load stored GOP data
merged_numpy_data = nv_gop_dec2.LoadGops(packet_files)

print(f"Loaded GOP data: {merged_numpy_data.size} bytes")

# Decode frames from loaded data
decoded_frames = nv_gop_dec2.DecodeFromGOPRGB(
    merged_numpy_data,  # Merged packet data from LoadGops
    file_list,          # Original video file paths
    frames,             # Target frame indices
    as_bgr=True
)

Cleanup temporary files:

for packet_file in packet_files:
    if os.path.exists(packet_file):
        os.remove(packet_file)

File Format

GOP files are binary files containing raw packet data. The format is:

Binary format (no header)
Direct memory dump of packet data
File extension: .bin (recommended)

Storage Considerations

GOP file size: Typically 5-15% of original video size
Storage savings: ~85-95% compared to extracted frames
I/O performance: SSD recommended for best performance

When to Use

Use GOP file storage when:

Same video segments accessed repeatedly
Training multiple epochs on the same dataset
Storage is cheaper than compute
Want to eliminate demuxing overhead

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleDecodeFromGopFiles.py

3.4.3 Sample: GOP File List API

File: packages/on_demand_video_decoder/samples/SampleDecodeFromGopFilesToListAPI.py

When to Use

LoadGopsToList() is preferred over LoadGops() when:

Large video collections (>10 videos)
Need selective loading of specific videos
Per-video cache management
Distributed caching systems

Core Difference: LoadGops() vs LoadGopsToList()

Feature	`LoadGops()`	`LoadGopsToList()`
Return type	Single merged numpy array	List of numpy arrays (one per video)
Loading	All or nothing	Selective loading possible
Memory usage	Load all GOP data at once	Load only needed videos
Decoding API	DecodeFromGOPRGB	DecodeFromGOPListRGB
Best for	Small video sets	Large video collections

Core APIs

SavePacketsToFile(): Save per-video GOP data
LoadGopsToList(): Load GOP files as list (not merged)
DecodeFromGOPListRGB(): Decode from list of GOP data

Code Walkthrough

Phase 1 - Save per-video GOP files:

file_list = [
    "/data/videos/CAM_BACK_LEFT.mp4",
    "/data/videos/CAM_BACK.mp4",
    "/data/videos/CAM_BACK_RIGHT.mp4",
    "/data/videos/CAM_FRONT_LEFT.mp4",
    "/data/videos/CAM_FRONT.mp4",
    "/data/videos/CAM_FRONT_RIGHT.mp4",
]

camera_names = ["CAM_BACK_LEFT", "CAM_BACK", "CAM_BACK_RIGHT",
                "CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT"]

packet_files = []
frames = [random.randint(0, 200) for _ in range(len(file_list))]

for i in range(len(file_list)):
    # Extract GOP data for single video
    numpy_data, first_frame_ids, gop_lens = nv_gop_dec1.GetGOP(
        file_list[i:i+1],
        frames[i:i+1]
    )
    
    # Create unique filename per video
    packet_file = f"./gop_{camera_names[i]}.bin"
    nvc.SavePacketsToFile(numpy_data, packet_file)
    packet_files.append(packet_file)

Phase 2 - Load all GOP files as list:

# Load GOP files as separate bundles (not merged)
gop_data_list = nv_gop_dec2.LoadGopsToList(packet_files)

# gop_data_list is a list of numpy arrays, one per video
print(f"Loaded {len(gop_data_list)} GOP bundles")
for i, gop_data in enumerate(gop_data_list):
    print(f"  Bundle {i} ({camera_names[i]}): {len(gop_data)} bytes")

Decode from GOP list:

# Decode all videos
decoded_frames = nv_gop_dec2.DecodeFromGOPListRGB(
    gop_data_list,  # List of GOP data
    file_list,      # List of file paths
    frames,         # List of frame IDs
    as_bgr=True
)

Phase 3 - Selective loading demonstration:

# Select only front cameras (indices 3, 4, 5)
selected_indices = [3, 4, 5]
selected_files = [packet_files[i] for i in selected_indices]
selected_video_paths = [file_list[i] for i in selected_indices]
selected_frames = [frames[i] for i in selected_indices]

# Load only selected GOP files
selected_gop_list = nv_gop_dec2.LoadGopsToList(selected_files)

# Decode only selected videos
decoded_frames = nv_gop_dec2.DecodeFromGOPListRGB(
    selected_gop_list,
    selected_video_paths,
    selected_frames,
    as_bgr=True
)

print(f"Loaded and decoded only {len(selected_indices)} out of {len(packet_files)} videos")

Key Advantages

Memory efficiency: Load only needed videos
Flexible loading: Different subsets for different batches
Distributed caching: Store videos on different machines
Per-video cache management: Independent expiration policies

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleDecodeFromGopFilesToListAPI.py

3.4.4 Sample: Batch Decode from Multiple Demux Results

File: packages/on_demand_video_decoder/samples/SampleDecodeFromGopList.py

When to Use

This sample demonstrates the pattern of multiple demuxing operations followed by a single batch decode:

Demux executed N times separately (e.g., in DataLoader __getitem__, called batch_size times)
Decode executed once for the entire batch
Enables parallel demuxing in worker processes, centralized batch decoding in main process
No disk I/O for GOP data (in-memory packet passing)

Architecture: N Demux → 1 Batch Decode

Worker/Process 1: Video File 1 → GetGOP() → packets_1 (in memory)
Worker/Process 2: Video File 2 → GetGOP() → packets_2 (in memory)
Worker/Process 3: Video File 3 → GetGOP() → packets_3 (in memory)
                     ⋮                            ⋮
Worker/Process N: Video File N → GetGOP() → packets_N (in memory)
                                                      ↓
                          Collect all packets: [packets_1, packets_2, ..., packets_N]
                                                      ↓
                  Main Process: DecodeFromGOPListRGB() → Batch of N Frames (single decode call)

Core Concept

Multiple separate demuxing operations → Single batch decoding operation

Core APIs

GetGOP(): Extract packets (called N times, possibly in parallel)
DecodeFromGOPListRGB(): Batch decode from list of packets (called once for entire batch)

Code Walkthrough

Initialize decoders:

# Worker decoder (simulated): for packet extraction
nv_gop_dec1 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

# Main process decoder: for batch decoding
nv_gop_dec2 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

Phase 1 - Multiple demux operations (simulating parallel workers):

file_list = [
    "/data/videos/CAM_BACK_LEFT.mp4",
    "/data/videos/CAM_BACK.mp4",
    # ... more files
]

frames = [random.randint(0, 200) for _ in range(len(file_list))]

# Demux executed N times (e.g., in DataLoader __getitem__, called batch_size times)
packets_list = []

for i in range(len(file_list)):
    # Each demux operation extracts packets for one video
    numpy_data, first_frame_ids, gop_lens = nv_gop_dec1.GetGOP(
        file_list[i:i+1],
        frames[i:i+1]
    )
    packets_list.append(numpy_data)
    print(f"Demux {i+1}: Extracted {numpy_data.size} bytes")

Phase 2 - Single batch decode (in main process):

# Decode executed once for all N demux results
decoded_frames = nv_gop_dec2.DecodeFromGOPListRGB(
    packets_list,  # List of N packet data from multiple demux operations
    file_list,     # Original file paths
    frames,        # Target frame IDs
    as_bgr=True
)

print(f"Batch decode: {len(decoded_frames)} frames decoded in one call")

DataLoader Integration Pattern

In a real PyTorch DataLoader:

# In worker process (worker_fn)
def worker_fn(video_path, frame_id):
    packets, first_ids, gop_lens = decoder.GetGOP([video_path], [frame_id])
    return packets

# In main process collate_fn
def collate_fn(batch):
    packets_list = [item['packets'] for item in batch]
    file_paths = [item['file_path'] for item in batch]
    frame_ids = [item['frame_id'] for item in batch]
    
    # Batch decode in main process
    frames = decoder.DecodeFromGOPListRGB(packets_list, file_paths, frame_ids, True)
    return frames

Key Benefits

Parallel demuxing: Each worker demuxes independently in parallel
Single batch decode: GPU decoder called only once for entire batch (efficient GPU utilization)
No disk I/O: Packets passed in memory, no temporary file storage
Resource separation: CPU-heavy demuxing in workers, GPU decoding in main process

Memory Management

Keep packet data lifetime short (decode and release)
Monitor memory usage in worker processes
Balance worker count with available memory

Running the Sample

cd packages/on_demand_video_decoder/samples
python SampleDecodeFromGopList.py