# Sample Code Documentation

This document provides comprehensive guidance on using the sample code in `packages/on_demand_video_decoder/samples/`. The samples demonstrate various decoding modes and advanced features of the `accvlab.on_demand_video_decoder` package.

## 1. Overview

The On-Demand Video Decoder package provides multiple decoding modes optimized for different use cases. This section helps you quickly locate the sample code that matches your requirements.

### 1.1 Sample Code Quick Reference

> **ℹ️ Note**: The sample files listed in the table below are all located in the
> `packages/on_demand_video_decoder/samples/` directory inside the ACCV-Lab repository.

| Sample File | Use Case | Key APIs |
|------------|----------|----------|
| [SampleRandomAccess.py](../samples/SampleRandomAccess.py) | Random frame sampling for training | {py:func}`~accvlab.on_demand_video_decoder.CreateGopDecoder`, {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeN12ToRGB` |
| [SampleRandomAccessWithFastInit.py](../samples/SampleRandomAccessWithFastInit.py) | Multi-clip batch processing with optimization | {py:func}`~accvlab.on_demand_video_decoder.GetFastInitInfo` |
| [SampleStreamAccess.py](../samples/SampleStreamAccess.py) | Sequential frame decoding | {py:func}`~accvlab.on_demand_video_decoder.CreateSampleReader` |
| [SampleSeparationAccess.py](../samples/SampleSeparationAccess.py) | Demuxer/decoder separation with GOP caching | {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOP`, {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPRGB`, {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.isCacheHit` |
| [SampleSeparationAccessGOPListAPI.py](../samples/SampleSeparationAccessGOPListAPI.py) | Per-video GOP management with caching | {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOPList`, {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPListRGB`, {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.isCacheHit` |
| [SampleDecodeFromGopFiles.py](../samples/SampleDecodeFromGopFiles.py) | GOP data persistence to disk | {py:func}`~accvlab.on_demand_video_decoder.SavePacketsToFile`, {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGops` |
| [SampleDecodeFromGopFilesToListAPI.py](../samples/SampleDecodeFromGopFilesToListAPI.py) | Selective GOP loading | {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGopsToList`, {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPListRGB` |
| [SampleDecodeFromGopList.py](../samples/SampleDecodeFromGopList.py) | Batch decode from multiple demux results (N demux → 1 decode) | {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPListRGB` |
| [SampleStreamAsyncAccess.py](../samples/SampleStreamAsyncAccess.py) | Async stream decoding with prefetching | {py:func}`~accvlab.on_demand_video_decoder.CreateSampleReader`, {py:meth}`~accvlab.on_demand_video_decoder.PyNvSampleReader.DecodeN12ToRGBAsync`, {py:meth}`~accvlab.on_demand_video_decoder.PyNvSampleReader.DecodeN12ToRGBAsyncGetBuffer` |

For details on the **Key APIs**, please refer to the API documentation of the corresponding functions and classes.
### 1.2 Choosing the Right Sample Use this decision tree to select the appropriate sample for your use case: ``` Decoding Mode Selection: If you need random frame access: If the input video resolution, color information, and other parameters remain unchanged: → Use SampleRandomAccessWithFastInit Otherwise: → Use SampleRandomAccess If you need sequential frame decoding: If you need async decoding with prefetching for lower latency: → Use SampleStreamAsyncAccess Otherwise: → Use SampleStreamAccess If you need to separate demuxing and decoding: If per-video GOP management is required (i.e., use of separate per-video GOP data): → Use SampleSeparationAccessGOPListAPI Otherwise: → Use SampleSeparationAccess If you need to save GOP data to disk: → Use SampleDecodeFromGopFiles If you need to batch decode from multiple separate demux operations: (e.g., DataLoader workers demux in parallel, main process batch decode) → Use SampleDecodeFromGopList ``` ### 1.3 Core Concepts Before diving into the samples, understanding these concepts will be helpful: - **GOP (Group of Pictures)**: A sequence of video frames starting with a keyframe (I-frame). GOP structure is essential for video compression and random access. - **Decoding Modes**: `accvlab.on_demand_video_decoder` supports four primary modes: - **Random Access**: Direct access to any frame without sequential decoding - **Stream Access**: Optimized for sequential frame processing with caching - **Separation Access**: Separate demuxing and decoding stages - **Demuxer-Free**: Decode directly from pre-extracted GOP data - **FastInit**: An optimization technique that caches stream metadata to accelerate decoder initialization for multiple clips with similar properties. - **GOP Caching**: A Python-side caching mechanism that stores extracted GOP data in memory. When the same video file is requested with a `frame_id` that falls within an already cached GOP range, the cached data is returned directly without re-demuxing from the video file. ## 2. Quick Start This section walks you through running your first sample in 5 minutes. ### 2.1 Running Your First Sample The simplest example is [SampleRandomAccess.py](../samples/SampleRandomAccess.py). Here's how to run it: **Step 1: Prepare video files** Edit the file paths in the sample code (also see the [Dataset Preparation](dataset_preparation.md) section): ```python file_path_list = [ "/path/to/your/video1.mp4", "/path/to/your/video2.mp4", # Add more video paths as needed ] ``` **Step 2: Run the sample** ```bash cd packages/on_demand_video_decoder/samples python SampleRandomAccess.py ``` **Step 3: Verify the output** Expected output: ```text NVIDIA accvlab.on_demand_video_decoder - Random Access Video Decoding Sample ================================================================ Initializing NVIDIA GPU video decoder... Decoder initialized successfully on GPU 0 with support for 6 concurrent files Processing 6 video files from multi-camera setup --- Iteration 1/5 --- Target frame indices: [45, 23, 78, 12, 56, 89] Initiating GPU decoding... Successfully decoded 6 frames Converting frames to PyTorch tensors... Tensor shape: torch.Size([1, 900, 1600, 3]) Tensor dtype: torch.uint8 ``` ### 2.2 Understanding the Basic Code Structure All samples follow a similar structure: ```python import accvlab.on_demand_video_decoder as nvc import torch # 1. Initialize decoder decoder = nvc.CreateGopDecoder( maxfiles=6, # Maximum concurrent files iGpu=0 # GPU device ID ) # 2. 
Specify video files and frame IDs file_path_list = ["/path/to/video1.mp4", "/path/to/video2.mp4"] frame_id_list = [10, 25] # Frame ID for each video # 3. Decode frames decoded_frames = decoder.DecodeN12ToRGB( file_path_list, frame_id_list, as_bgr=True # Output BGR format ) # 4. Convert to PyTorch tensors (optional) tensors = [torch.as_tensor(frame) for frame in decoded_frames] ``` ## 3. Decoding Modes This section provides detailed documentation for each decoding mode with corresponding sample codes. ### 3.1 Random Access Decoding Random Access mode allows direct access to any frame in a video without sequential decoding. The decoder automatically finds the GOP containing the target frame and decodes from the nearest keyframe. #### 3.1.1 Use Cases - Training with random frame sampling - Processing single video clips - Random switching between different videos - Non-sequential frame access patterns #### 3.1.2 Sample: Basic Random Access **File:** `packages/on_demand_video_decoder/samples/SampleRandomAccess.py` **Core APIs** - {py:func}`~accvlab.on_demand_video_decoder.CreateGopDecoder`: Initialize the GOP decoder - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeN12ToRGB`: Decode frames to RGB/BGR format **Code Walkthrough** Initialize the decoder: ```python import accvlab.on_demand_video_decoder as nvc nv_gop_dec = nvc.CreateGopDecoder( maxfiles=6, # Maximum number of concurrent files iGpu=0 # Target GPU device ID ) ``` Prepare video files and frame indices: ```python # Multi-camera setup from nuScenes dataset (example for sequence named `n008-2018-08-30-15-16-55-0400`) file_path_list = [ "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_BACK_LEFT.mp4", "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_BACK.mp4", "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_BACK_RIGHT.mp4", "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_FRONT_LEFT.mp4", "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_FRONT.mp4", "/data/nuscenes/video_samples/n008-2018-08-30-15-16-55-0400/CAM_FRONT_RIGHT.mp4", ] # Random frame indices (one per video) frame_id_list = [random.randint(0, 100) for _ in range(len(file_path_list))] ``` Decode frames: ```python decoded_frames = nv_gop_dec.DecodeN12ToRGB( file_path_list, # List of video file paths frame_id_list, # List of target frame indices True # Output in BGR format (OpenCV compatible) ) ``` Convert to PyTorch tensors: ```python import torch tensor_list = [torch.unsqueeze(torch.as_tensor(frame), 0) for frame in decoded_frames] ``` **Performance Characteristics** - Memory usage: Scales with concurrent file count and video resolution - GPU utilization: 70-90% depending on video codec complexity - Throughput: Approximately 500-1500 FPS on modern GPUs (e.g., A100) **Running the Sample** ```bash cd packages/on_demand_video_decoder/samples python SampleRandomAccess.py ``` Note: Modify the `file_path_list` in the code to point to your video files. #### 3.1.3 Sample: Random Access with FastInit **File:** `packages/on_demand_video_decoder/samples/SampleRandomAccessWithFastInit.py` **When to Use** FastInit optimization is beneficial when: - Processing multiple video clips from the same dataset - All clips have similar properties (resolution, codec, GOP size) - Initialization latency is a bottleneck - Batch processing scenarios **Performance Improvement** FastInit can reduce decoder initialization time by 40-70% for subsequent clips after the first one. 
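If you want to verify this on your own data, a minimal timing comparison along the following lines can be used. This is only a sketch: the clip directory layout (`clip_dirs`, one directory of up to six camera videos per clip) and the warm-up step are assumptions, and the APIs it uses ({py:func}`~accvlab.on_demand_video_decoder.GetFastInitInfo` and the `fastStreamInfos` parameter) are walked through below.

```python
import os
import time

import accvlab.on_demand_video_decoder as nvc

# Assumed layout: one directory of camera videos per clip (adjust to your data)
clip_dirs = ["/data/clips/clip_000", "/data/clips/clip_001", "/data/clips/clip_002"]

nv_gop_dec = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)

def first_frame_decode_time(clip_dir, fast_infos=None):
    """Decode frame 0 of every video in a clip and return the elapsed time."""
    files = sorted(os.path.join(clip_dir, f) for f in os.listdir(clip_dir))
    start = time.perf_counter()
    if fast_infos is None:
        nv_gop_dec.DecodeN12ToRGB(files, [0] * len(files), True)
    else:
        nv_gop_dec.DecodeN12ToRGB(files, [0] * len(files), as_bgr=True,
                                  fastStreamInfos=fast_infos)
    return time.perf_counter() - start

# Warm up once so first-time hardware initialization does not skew either measurement
first_frame_decode_time(clip_dirs[0])

# Baseline: decoder initialization without cached stream metadata
baseline = sum(first_frame_decode_time(d) for d in clip_dirs)

# FastInit: extract stream metadata once and reuse it for every clip
sample_files = sorted(os.path.join(clip_dirs[0], f) for f in os.listdir(clip_dirs[0]))
fast_stream_infos = nvc.GetFastInitInfo(sample_files)
fast = sum(first_frame_decode_time(d, fast_stream_infos) for d in clip_dirs)

print(f"Without FastInit: {baseline:.3f}s  With FastInit: {fast:.3f}s")
```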
**Core APIs**

- {py:func}`~accvlab.on_demand_video_decoder.GetFastInitInfo`: Extract stream metadata for fast initialization
- {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeN12ToRGB` with `fastStreamInfos` parameter

**Code Walkthrough**

Initialize decoder (one-time setup):

```python
nv_gop_dec = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)
```

Get fast initialization info from sample files:

```python
# Extract metadata from the first clip
# (path_bases[0] is the directory holding that clip's camera videos)
sample_files = [os.path.join(path_bases[0], f) for f in os.listdir(path_bases[0])]
fast_stream_infos = nvc.GetFastInitInfo(sample_files)
```

> **ℹ️ Note**: {py:func}`~accvlab.on_demand_video_decoder.GetFastInitInfo` only needs to be called once for
> clips with similar properties.

Warmup (skip first-time hardware initialization overhead):

```python
decoded_frames = nv_gop_dec.DecodeN12ToRGB(
    sample_files,
    [0] * len(sample_files),
    as_bgr=True,
    fastStreamInfos=fast_stream_infos
)
```

Process multiple clips with FastInit:

```python
for clip_path in clip_paths:
    file_path_list = [os.path.join(clip_path, f) for f in os.listdir(clip_path)]
    frame_id_list = [random.randint(0, 100) for _ in range(len(file_path_list))]

    # Use fastStreamInfos for optimized initialization
    decoded_frames = nv_gop_dec.DecodeN12ToRGB(
        file_path_list,
        frame_id_list,
        as_bgr=True,
        fastStreamInfos=fast_stream_infos  # Reuse cached stream info
    )
```

**Running the Sample**

```bash
cd packages/on_demand_video_decoder/samples
python SampleRandomAccessWithFastInit.py
```

### 3.2 Stream Access Decoding

Stream Access mode is optimized for sequential frame processing with intelligent caching. It is particularly useful for temporal models and sequential video analysis.

#### 3.2.1 Use Cases

- Sequential frame decoding from videos
- Temporal models (e.g., StreamPETR, BEVFormer)
- Time-series video analysis
- Scenarios where frames are accessed in order

#### 3.2.2 Sample: Stream Access

**File:** `packages/on_demand_video_decoder/samples/SampleStreamAccess.py`

**Core APIs**

- {py:func}`~accvlab.on_demand_video_decoder.CreateSampleReader`: Initialize the sample reader (different from {py:func}`~accvlab.on_demand_video_decoder.CreateGopDecoder`)
- {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeN12ToRGB`: Decode frames with caching optimization

**Key Difference from Random Access**

Stream Access uses {py:func}`~accvlab.on_demand_video_decoder.CreateSampleReader` instead of {py:func}`~accvlab.on_demand_video_decoder.CreateGopDecoder`. The key advantage is the use of caching-based optimizations. The reader can also iterate over multiple sets of video files, with each set accessed sequentially (the number of sets is controlled by the `num_of_set` parameter).

**Code Walkthrough**

Initialize the sample reader:

```python
nv_gop_dec = nvc.CreateSampleReader(
    num_of_set=1,   # Cache for this many video sets
    num_of_file=6,  # Maximum number of files per set
    iGpu=0
)
```

**Understanding num_of_set**

The `num_of_set` parameter controls caching behavior:

- Set to 1 for simple sequential access
- Set to `batch_size` for StreamPETR-like access patterns (iterating over the samples inside a batch, accessing the same video files in every `batch_size`-th call to the decoder)

Example: If `batch_size==4`, set `num_of_set=4` to cache 4 different video clips.

Process frames sequentially:

```python
file_path_list = [
    "/data/videos/scene_CAM_BACK_LEFT.mp4",
    "/data/videos/scene_CAM_BACK.mp4",
    # ... more files
]

# Start from frame 0
frame_id_list = [0] * len(file_path_list)

for iteration in range(num_iterations):
    # Increment frame indices (sequential access)
    frame_id_list = [fid + 7 for fid in frame_id_list]

    decoded_frames = nv_gop_dec.DecodeN12ToRGB(
        file_path_list,
        frame_id_list,
        True
    )
```

**Caching Behavior**

Stream Access mode caches:

- Demuxer state
- Decoder state
- Recently accessed GOPs

This reduces overhead for sequential access patterns compared to Random Access mode.

**Running the Sample**

```bash
cd packages/on_demand_video_decoder/samples
python SampleStreamAccess.py
```

#### 3.2.3 Sample: Async Stream Access

**File:** `packages/on_demand_video_decoder/samples/SampleStreamAsyncAccess.py`

**When to Use**

Async Stream Access is beneficial when:

- Lower latency is required for streaming applications
- Prefetching the next frame while the current one is being processed reduces latency
- Labeling-task models need high-performance inference
- GPU utilization needs to be maximized through overlapped operations

**Key Advantages Over Basic Stream Access**

| Feature | Stream Access | Async Stream Access |
|---------|---------------|---------------------|
| Decode mode | Synchronous | Asynchronous with prefetching |
| Latency | Standard | Lower (prefetched frames ready) |
| GPU utilization | Standard | Better (decode/process overlap) |

**Core APIs**

- {py:func}`~accvlab.on_demand_video_decoder.CreateSampleReader`: Initialize the sample reader
- {py:meth}`~accvlab.on_demand_video_decoder.PyNvSampleReader.DecodeN12ToRGBAsync`: Start asynchronous decoding
- {py:meth}`~accvlab.on_demand_video_decoder.PyNvSampleReader.DecodeN12ToRGBAsyncGetBuffer`: Retrieve decoded frames from buffer

**Code Walkthrough**

Initialize the sample reader:

```python
import accvlab.on_demand_video_decoder as nvc

nv_stream_dec = nvc.CreateSampleReader(
    num_of_set=1,   # Cache for this many video sets
    num_of_file=6,  # Maximum number of files per set
    iGpu=0          # Target GPU device ID
)
```

**Async Decoding Pattern**

The async pattern consists of two main operations:

1. **`DecodeN12ToRGBAsync`**: Start asynchronous decoding (non-blocking)
2. **`DecodeN12ToRGBAsyncGetBuffer`**: Get decoded frames (waits if not ready)

First iteration - start async decode and get result:

```python
# Start async decode
nv_stream_dec.DecodeN12ToRGBAsync(
    file_path_list,
    frame_id_list,
    False,  # Output in RGB format (False=RGB, True=BGR)
)

# Get the result (will wait for async decode to complete)
decoded_frames = nv_stream_dec.DecodeN12ToRGBAsyncGetBuffer(
    file_path_list,
    frame_id_list,
    False,  # Output in RGB format
)
```

Subsequent iterations - get prefetched result:

```python
# Get prefetched result from buffer (already decoded in background)
decoded_frames = nv_stream_dec.DecodeN12ToRGBAsyncGetBuffer(
    file_path_list,
    frame_id_list,
    False,  # Output in RGB format
)
```

**Prefetching Pattern**

The key optimization is prefetching the next frame while processing the current one:

```python
# Process current frame
tensor_list = [torch.as_tensor(frame, device='cuda') for frame in decoded_frames]
rgb_batch = torch.stack(tensor_list, dim=0)

# Prefetch next frame (non-blocking, happens in background)
if idx < len(frames_to_decode) - 1:
    next_frame = frames_to_decode[idx + 1]
    next_frame_id_list = [next_frame] * len(file_path_list)
    nv_stream_dec.DecodeN12ToRGBAsync(
        file_path_list,
        next_frame_id_list,
        False,
    )

# Continue processing current frame...
# Next iteration will get prefetched frame immediately ``` **Important: Zero-Copy Frame Management** > **⚠️ Warning**: The decoded frames returned by `DecodeN12ToRGBAsyncGetBuffer` are zero-copy > references to internal buffers. You **must** deep copy the frames before calling > `DecodeN12ToRGBAsync` again, otherwise the data will be overwritten. ```python # CORRECT: Deep copy frames before next async call tensor_list = [torch.as_tensor(frame, device='cuda').clone() for frame in decoded_frames] # or rgb_batch = torch.stack([torch.as_tensor(frame, device='cuda') for frame in decoded_frames], dim=0) # Now safe to call DecodeN12ToRGBAsync for next frame nv_stream_dec.DecodeN12ToRGBAsync(...) ``` **Complete Async Workflow** ``` Iteration 1: DecodeN12ToRGBAsync(frame_0) → Start decode DecodeN12ToRGBAsyncGetBuffer() → Wait & get frame_0 Process frame_0 DecodeN12ToRGBAsync(frame_1) → Prefetch frame_1 Iteration 2: DecodeN12ToRGBAsyncGetBuffer() → Get prefetched frame_1 (fast!) Process frame_1 DecodeN12ToRGBAsync(frame_2) → Prefetch frame_2 Iteration N: DecodeN12ToRGBAsyncGetBuffer() → Get prefetched frame_N Process frame_N (No prefetch for last frame) ``` **Running the Sample** ```bash cd packages/on_demand_video_decoder/samples python SampleStreamAsyncAccess.py ``` ### 3.3 Separation Access Decoding Separation Access mode decouples demuxing and decoding into two separate stages. This provides fine-grained control over the video processing pipeline and enables advanced optimization strategies. #### 3.3.1 Use Cases - Need separate control over demuxing and decoding - One-time demuxing, multiple decoding operations - Inspection or processing of intermediate packet data - Custom processing pipelines #### 3.3.2 Two-Stage Architecture ``` Stage 1 (Demuxing): Video File → GetGOP() → Packet Data (GOP) ├─ packets ├─ first_frame_ids └─ gop_lens Stage 2 (Decoding): Packet Data → DecodeFromGOPRGB() → Decoded Frames ``` #### 3.3.3 Sample: Basic Separation Access **File:** `packages/on_demand_video_decoder/samples/SampleSeparationAccess.py` **Core APIs** - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOP`: Extract packet data (demuxing only) - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPRGB`: Decode from packet data (decoding only) **Code Walkthrough** Initialize two separate decoders: ```python # Stage 1 decoder: for packet extraction nv_gop_dec1 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0) # Stage 2 decoder: for packet decoding nv_gop_dec2 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0) ``` > **ℹ️ Note**: Using separate decoder instances allows independent configuration and resource management. Stage 1 - Extract packet data: ```python file_path_list = [ "/data/videos/scene_CAM_BACK_LEFT.mp4", "/data/videos/scene_CAM_BACK.mp4", # ... 
more files ] # Extract GOP data containing frame 77 for all videos packets, first_frame_ids, gop_lens = nv_gop_dec1.GetGOP( file_path_list, [77] * len(file_path_list) ) ``` **Understanding the return values:** - `packets`: Compressed packet data (numpy array) - `first_frame_ids`: First frame ID in each extracted GOP - `gop_lens`: Number of frames in each GOP Stage 2 - Decode from packet data: ```python # Generate frame IDs within the GOP range frame_id_list = [ random.randint(first_frame_ids[i], first_frame_ids[i] + gop_lens[i] - 1) for i in range(len(file_path_list)) ] # Decode frames directly from packet data decoded_frames = nv_gop_dec2.DecodeFromGOPRGB( packets, # Packet data from Stage 1 file_path_list, # Original file paths (for reference) frame_id_list, # Target frame indices True # BGR output ) ``` **Validation** Always validate that frame IDs are within GOP range: ```python if frame_id < first_frame_ids[i] or frame_id >= first_frame_ids[i] + gop_lens[i]: print(f"Frame {frame_id} is out of range for GOP starting at {first_frame_ids[i]}") ``` **Advantages of Separation** 1. Demux once, decode multiple times with different frame selections 2. Ability to inspect or process packet data 3. Separate optimization of demuxing and decoding stages 4. Foundation for more advanced processing pipelines **Running the Sample** ```bash cd packages/on_demand_video_decoder/samples python SampleSeparationAccess.py ``` #### 3.3.4 Sample: Separation Access with GetGOPList API **File:** `packages/on_demand_video_decoder/samples/SampleSeparationAccessGOPListAPI.py` **When to Use** {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOPList` is preferred over {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOP` when: - Processing large video collections - Per-video cache management is needed - Selective video loading is required - Distributed storage and processing **Core Difference: {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOP` vs** **{py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOPList`** | Feature | {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOP` | {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOPList` | |---------|--------|------------| | Return type | Single merged bundle | List of per-video bundles | | Data structure | `(packets, ids, lens)` | `[(packets1, ids1, lens1), (packets2, ids2, lens2), ...]` | | Memory management | Load all or nothing | Load selectively | | Decoding API | DecodeFromGOPRGB | DecodeFromGOPListRGB | | Best for | Batch processing all videos | Per-video management | **Core APIs** - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOPList`: Extract packet data per video (not merged) - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPListRGB`: Decode from list of packet data **Code Walkthrough** Stage 1 - Extract per-video GOP data: ```python file_path_list = [ "/data/videos/CAM_BACK_LEFT.mp4", "/data/videos/CAM_BACK.mp4", "/data/videos/CAM_BACK_RIGHT.mp4", "/data/videos/CAM_FRONT_LEFT.mp4", "/data/videos/CAM_FRONT.mp4", "/data/videos/CAM_FRONT_RIGHT.mp4", ] # Extract GOP data, returns list of tuples gop_list = nv_gop_dec1.GetGOPList( file_path_list, [77] * len(file_path_list) ) # gop_list structure: # [ # (packets_video1, first_frame_ids_video1, gop_lens_video1), # (packets_video2, first_frame_ids_video2, gop_lens_video2), # ... 
# ] ``` Per-video GOP data inspection: ```python for i, (gop_data, first_frame_ids, gop_lens) in enumerate(gop_list): print(f"Video {i}:") print(f" GOP data size: {len(gop_data)} bytes") print(f" First frame ID: {first_frame_ids[0]}") print(f" GOP length: {gop_lens[0]}") ``` Simulating per-video caching: ```python # Cache GOP data per video gop_cache = {} for i, (gop_data, first_frame_ids, gop_lens) in enumerate(gop_list): cache_key = f"video_{i}_frame_77" gop_cache[cache_key] = { 'gop_data': gop_data, 'first_frame_ids': first_frame_ids, 'gop_lens': gop_lens, 'filepath': file_path_list[i] } ``` Stage 2 - Selective decoding: ```python # Select only specific videos to decode (e.g., front cameras only) selected_indices = [3, 4, 5] # Front-left, front, front-right selected_gop_data_list = [] selected_filepaths = [] selected_frame_ids = [] for idx in selected_indices: cache_key = f"video_{idx}_frame_77" cached_item = gop_cache[cache_key] # Generate random frame within GOP range first_frame_id = cached_item['first_frame_ids'][0] gop_len = cached_item['gop_lens'][0] random_frame = random.randint(first_frame_id, first_frame_id + gop_len - 1) selected_gop_data_list.append(cached_item['gop_data']) selected_filepaths.append(cached_item['filepath']) selected_frame_ids.append(random_frame) # Decode only selected videos decoded_frames = nv_gop_dec2.DecodeFromGOPListRGB( selected_gop_data_list, # List of GOP data for selected videos selected_filepaths, # Corresponding file paths selected_frame_ids, # Frame IDs to decode True # BGR output ) ``` **Key Advantages** 1. Load only required videos from cache (memory efficient) 2. Per-video cache management (independent expiration, priority) 3. Better suited for distributed systems 4. Reduced inter-video dependencies **Running the Sample** ```bash cd packages/on_demand_video_decoder/samples python SampleSeparationAccessGOPListAPI.py ``` #### 3.3.5 GOP Caching Feature The GOP caching feature automatically stores extracted GOP data in Python memory, eliminating the need for manual cache management by the user. When enabled, subsequent calls to {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOP` or {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOPList` with the same video file and a `frame_id` within the cached GOP range will return cached data without re-demuxing. **Why Use GOP Caching?** In training scenarios, especially with video datasets: - The same video file may be accessed multiple times with different frame indices - Multiple frame indices often fall within the same GOP (Group of Pictures) - Re-demuxing for each access wastes I/O and CPU resources Without caching, users would need to manually track GOP ranges and manage cache dictionaries. With the `useGOPCache` parameter, this is handled automatically. 
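For illustration only, the manual bookkeeping that `useGOPCache=True` replaces would look roughly like the following sketch. The helper name and the one-GOP-per-file policy mirror the built-in cache described below, but none of this code is part of the package:

```python
# Hypothetical manual cache: one GOP per file path, mimicking what useGOPCache automates.
# `decoder` is assumed to be an instance returned by CreateGopDecoder.
manual_cache = {}  # filepath -> (packets, first_frame_id, gop_len)

def get_gop_with_manual_cache(decoder, filepath, frame_id):
    entry = manual_cache.get(filepath)
    if entry is not None:
        packets, first_id, gop_len = entry
        if first_id <= frame_id < first_id + gop_len:
            # Cache hit: the requested frame lies inside the cached GOP, no re-demuxing
            return packets, [first_id], [gop_len]
    # Cache miss: demux a new GOP and remember it for this file
    packets, first_ids, gop_lens = decoder.GetGOP([filepath], [frame_id])
    manual_cache[filepath] = (packets, first_ids[0], gop_lens[0])
    return packets, first_ids, gop_lens
```

With `useGOPCache=True`, the equivalent bookkeeping (including the hit condition shown above) is performed inside the decoder, as described next.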
**Enabling GOP Caching** Set `useGOPCache=True` when calling {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOP` or {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOPList`: ```python import accvlab.on_demand_video_decoder as nvc decoder = nvc.CreateGopDecoder(maxfiles=6, iGpu=0) # First call - fetches GOP data from video files packets, first_ids, gop_lens = decoder.GetGOP( file_path_list, [77] * len(file_path_list), useGOPCache=True ) # Second call with frame_id=80 (within the same GOP range) - returns from cache packets, first_ids, gop_lens = decoder.GetGOP( file_path_list, [80] * len(file_path_list), useGOPCache=True ) ``` **Cache Hit Condition** A cache hit occurs when: - The requested `filepath` matches a cached entry - The requested `frame_id` satisfies: `first_frame_id <= frame_id < first_frame_id + gop_len` If the `frame_id` is outside the cached GOP range, a new GOP is fetched and the cache is updated. **Checking Cache Hit Status** Use the {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.isCacheHit` method to check whether the last {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOP` or {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOPList` call hit the cache: ```python # Call GetGOP with caching packets, first_ids, gop_lens = decoder.GetGOP(file_path_list, frame_ids, useGOPCache=True) # Check cache hit status for each video cache_hits = decoder.isCacheHit() print(cache_hits) # [True, False, True, True, False] - per-video cache hit status ``` The return value is a list of booleans, one for each video in the request, indicating whether the cached data was used (`True`) or new data was fetched (`False`). **Cache Management Methods** The decoder provides methods to manage the cache: | Method | Description | |--------|-------------| | {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.get_cache_info` | Returns a dictionary with cache statistics | | {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.clear_cache` | Clears all cached GOP data | Example: ```python # Get cache information cache_info = decoder.get_cache_info() print(f"Cached files: {cache_info['cached_files_count']}") print(f"File paths: {cache_info['cached_files']}") # Clear all cache when done decoder.clear_cache() ``` **GOP Caching with GetGOPList** The caching feature works identically with {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOPList`: ```python # First call - all videos are fetched gop_list = decoder.GetGOPList(file_path_list, [77, 77, 77], useGOPCache=True) print(decoder.isCacheHit()) # [False, False, False] # Second call with some frame_ids in range, some out of range gop_list = decoder.GetGOPList(file_path_list, [80, 80, 150], useGOPCache=True) print(decoder.isCacheHit()) # [True, True, False] - partial cache hit ``` **Shared Cache Between GetGOP and GetGOPList** The cache is shared between {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOP` and {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.GetGOPList` calls on the same decoder instance: ```python # Cache populated via GetGOP packets, _, _ = decoder.GetGOP(["/path/to/video.mp4"], [50], useGOPCache=True) # Cache hit via GetGOPList (same file, frame_id in range) gop_list = decoder.GetGOPList(["/path/to/video.mp4"], [55], useGOPCache=True) print(decoder.isCacheHit()) # [True] ``` > **⚠️ Note**: The cache is stored in Python memory. Each video file caches only one GOP (the most > recently accessed). 
For long-running processes with many different videos, use {py:meth}`~accvlab.on_demand_video_decoder.CachedGopDecoder.clear_cache` to > release memory when needed. **When to Use GOP Caching** ✓ Training loops with random frame sampling from the same video ✓ Multi-camera setups where cameras are often accessed with similar frame indices ✓ Scenarios where the same GOP is likely to be accessed multiple times ✓ Reducing I/O overhead in data loading pipelines ✗ One-time video processing (no repeated access) ✗ Memory-constrained environments with large video collections ✗ Scenarios where each frame access targets a different GOP ### 3.4 Demuxer-Free Decoding Demuxer-Free mode allows decoding directly from pre-extracted GOP data, either stored on disk or in memory. This approach is ideal for scenarios requiring repeated access to the same video segments. #### 3.4.1 Use Cases - Pre-processing video datasets for training - Repeated access to same video segments - Disk storage for GOP data caching - Eliminating demuxing overhead in production - PyTorch DataLoader integration with worker processes #### 3.4.2 Sample: GOP File Storage and Decoding **File:** `packages/on_demand_video_decoder/samples/SampleDecodeFromGopFiles.py` **Two-Phase Workflow** ``` Phase 1: GOP Data Preparation Video Files → GetGOP() → SavePacketsToFile() → .bin files on disk Phase 2: Decoding from Files .bin files → LoadGops() → DecodeFromGOPRGB() → Decoded Frames ``` **Core APIs** - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOP`: Extract GOP packet data - {py:func}`~accvlab.on_demand_video_decoder.SavePacketsToFile`: Save packets to binary file - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGops`: Load packets from binary files (merged) - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPRGB`: Decode from loaded packets **Code Walkthrough** Initialize decoders: ```python # Decoder for packet extraction nv_gop_dec1 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0) # Decoder for GOP file decoding nv_gop_dec2 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0) ``` Phase 1 - Extract and save GOP data: ```python file_list = [ "/data/videos/CAM_BACK_LEFT.mp4", "/data/videos/CAM_BACK.mp4", # ... more files ] frames = [random.randint(0, 200) for _ in range(len(file_list))] packet_files = [] for i in range(len(file_list)): # Extract packet data for single file numpy_data, first_frame_ids, gop_lens = nv_gop_dec1.GetGOP( file_list[i:i+1], frames[i:i+1] ) # Save to binary file packet_file = f"./gop_packets_{i:02d}.bin" nvc.SavePacketsToFile(numpy_data, packet_file) packet_files.append(packet_file) print(f"Saved GOP data: {os.path.getsize(packet_file)} bytes") ``` Phase 2 - Load and decode from GOP files: ```python # Load stored GOP data merged_numpy_data = nv_gop_dec2.LoadGops(packet_files) print(f"Loaded GOP data: {merged_numpy_data.size} bytes") # Decode frames from loaded data decoded_frames = nv_gop_dec2.DecodeFromGOPRGB( merged_numpy_data, # Merged packet data from LoadGops file_list, # Original video file paths frames, # Target frame indices as_bgr=True ) ``` Cleanup temporary files: ```python for packet_file in packet_files: if os.path.exists(packet_file): os.remove(packet_file) ``` **File Format** GOP files are binary files containing raw packet data. 
The format is: - Binary format (no header) - Direct memory dump of packet data - File extension: `.bin` (recommended) **Storage Considerations** - GOP file size: Typically 5-15% of original video size - Storage savings: ~85-95% compared to extracted frames - I/O performance: SSD recommended for best performance **When to Use** Use GOP file storage when: - Same video segments accessed repeatedly - Training multiple epochs on the same dataset - Storage is cheaper than compute - Want to eliminate demuxing overhead **Running the Sample** ```bash cd packages/on_demand_video_decoder/samples python SampleDecodeFromGopFiles.py ``` #### 3.4.3 Sample: GOP File List API **File:** `packages/on_demand_video_decoder/samples/SampleDecodeFromGopFilesToListAPI.py` **When to Use** {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGopsToList` is preferred over {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGops` when: - Large video collections (>10 videos) - Need selective loading of specific videos - Per-video cache management - Distributed caching systems **Core Difference: {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGops` vs** **{py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGopsToList`** | Feature | {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGops` | {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGopsToList` | |---------|----------|----------------| | Return type | Single merged numpy array | List of numpy arrays (one per video) | | Loading | All or nothing | Selective loading possible | | Memory usage | Load all GOP data at once | Load only needed videos | | Decoding API | DecodeFromGOPRGB | DecodeFromGOPListRGB | | Best for | Small video sets | Large video collections | **Core APIs** - {py:func}`~accvlab.on_demand_video_decoder.SavePacketsToFile`: Save per-video GOP data - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.LoadGopsToList`: Load GOP files as list (not merged) - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPListRGB`: Decode from list of GOP data **Code Walkthrough** Phase 1 - Save per-video GOP files: ```python file_list = [ "/data/videos/CAM_BACK_LEFT.mp4", "/data/videos/CAM_BACK.mp4", "/data/videos/CAM_BACK_RIGHT.mp4", "/data/videos/CAM_FRONT_LEFT.mp4", "/data/videos/CAM_FRONT.mp4", "/data/videos/CAM_FRONT_RIGHT.mp4", ] camera_names = ["CAM_BACK_LEFT", "CAM_BACK", "CAM_BACK_RIGHT", "CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT"] packet_files = [] frames = [random.randint(0, 200) for _ in range(len(file_list))] for i in range(len(file_list)): # Extract GOP data for single video numpy_data, first_frame_ids, gop_lens = nv_gop_dec1.GetGOP( file_list[i:i+1], frames[i:i+1] ) # Create unique filename per video packet_file = f"./gop_{camera_names[i]}.bin" nvc.SavePacketsToFile(numpy_data, packet_file) packet_files.append(packet_file) ``` Phase 2 - Load all GOP files as list: ```python # Load GOP files as separate bundles (not merged) gop_data_list = nv_gop_dec2.LoadGopsToList(packet_files) # gop_data_list is a list of numpy arrays, one per video print(f"Loaded {len(gop_data_list)} GOP bundles") for i, gop_data in enumerate(gop_data_list): print(f" Bundle {i} ({camera_names[i]}): {len(gop_data)} bytes") ``` Decode from GOP list: ```python # Decode all videos decoded_frames = nv_gop_dec2.DecodeFromGOPListRGB( gop_data_list, # List of GOP data file_list, # List of file paths frames, # List of frame IDs as_bgr=True ) ``` Phase 3 - Selective loading demonstration: ```python 
# Select only front cameras (indices 3, 4, 5) selected_indices = [3, 4, 5] selected_files = [packet_files[i] for i in selected_indices] selected_video_paths = [file_list[i] for i in selected_indices] selected_frames = [frames[i] for i in selected_indices] # Load only selected GOP files selected_gop_list = nv_gop_dec2.LoadGopsToList(selected_files) # Decode only selected videos decoded_frames = nv_gop_dec2.DecodeFromGOPListRGB( selected_gop_list, selected_video_paths, selected_frames, as_bgr=True ) print(f"Loaded and decoded only {len(selected_indices)} out of {len(packet_files)} videos") ``` **Key Advantages** 1. Memory efficiency: Load only needed videos 2. Flexible loading: Different subsets for different batches 3. Distributed caching: Store videos on different machines 4. Per-video cache management: Independent expiration policies **Running the Sample** ```bash cd packages/on_demand_video_decoder/samples python SampleDecodeFromGopFilesToListAPI.py ``` #### 3.4.4 Sample: Batch Decode from Multiple Demux Results **File:** `packages/on_demand_video_decoder/samples/SampleDecodeFromGopList.py` **When to Use** This sample demonstrates the pattern of multiple demuxing operations followed by a single batch decode: - Demux executed N times separately (e.g., in DataLoader `__getitem__`, called batch_size times) - Decode executed once for the entire batch - Enables parallel demuxing in worker processes, centralized batch decoding in main process - No disk I/O for GOP data (in-memory packet passing) **Architecture: N Demux → 1 Batch Decode** ``` Worker/Process 1: Video File 1 → GetGOP() → packets_1 (in memory) Worker/Process 2: Video File 2 → GetGOP() → packets_2 (in memory) Worker/Process 3: Video File 3 → GetGOP() → packets_3 (in memory) ⋮ ⋮ Worker/Process N: Video File N → GetGOP() → packets_N (in memory) ↓ Collect all packets: [packets_1, packets_2, ..., packets_N] ↓ Main Process: DecodeFromGOPListRGB() → Batch of N Frames (single decode call) ``` **Core Concept** Multiple separate demuxing operations → Single batch decoding operation **Core APIs** - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.GetGOP`: Extract packets (called N times, possibly in parallel) - {py:meth}`~accvlab.on_demand_video_decoder.PyNvGopDecoder.DecodeFromGOPListRGB`: Batch decode from list of packets (called once for entire batch) **Code Walkthrough** Initialize decoders: ```python # Worker decoder (simulated): for packet extraction nv_gop_dec1 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0) # Main process decoder: for batch decoding nv_gop_dec2 = nvc.CreateGopDecoder(maxfiles=6, iGpu=0) ``` Phase 1 - Multiple demux operations (simulating parallel workers): ```python file_list = [ "/data/videos/CAM_BACK_LEFT.mp4", "/data/videos/CAM_BACK.mp4", # ... 
more files
]

frames = [random.randint(0, 200) for _ in range(len(file_list))]

# Demux executed N times (e.g., in DataLoader __getitem__, called batch_size times)
packets_list = []
for i in range(len(file_list)):
    # Each demux operation extracts packets for one video
    numpy_data, first_frame_ids, gop_lens = nv_gop_dec1.GetGOP(
        file_list[i:i+1],
        frames[i:i+1]
    )
    packets_list.append(numpy_data)
    print(f"Demux {i+1}: Extracted {numpy_data.size} bytes")
```

Phase 2 - Single batch decode (in main process):

```python
# Decode executed once for all N demux results
decoded_frames = nv_gop_dec2.DecodeFromGOPListRGB(
    packets_list,   # List of N packet data from multiple demux operations
    file_list,      # Original file paths
    frames,         # Target frame IDs
    as_bgr=True
)
print(f"Batch decode: {len(decoded_frames)} frames decoded in one call")
```

**DataLoader Integration Pattern**

In a real PyTorch DataLoader:

```python
# In worker process (worker_fn)
def worker_fn(video_path, frame_id):
    packets, first_ids, gop_lens = decoder.GetGOP([video_path], [frame_id])
    return {'packets': packets, 'file_path': video_path, 'frame_id': frame_id}

# collate_fn: gather the demuxed packets and batch decode them
def collate_fn(batch):
    packets_list = [item['packets'] for item in batch]
    file_paths = [item['file_path'] for item in batch]
    frame_ids = [item['frame_id'] for item in batch]
    # Single batch decode for the whole batch
    frames = decoder.DecodeFromGOPListRGB(packets_list, file_paths, frame_ids, True)
    return frames
```

**Key Benefits**

1. **Parallel demuxing**: Each worker demuxes independently in parallel
2. **Single batch decode**: GPU decoder called only once for entire batch (efficient GPU utilization)
3. **No disk I/O**: Packets passed in memory, no temporary file storage
4. **Resource separation**: CPU-heavy demuxing in workers, GPU decoding in main process

**Memory Management**

- Keep packet data lifetime short (decode and release)
- Monitor memory usage in worker processes
- Balance worker count with available memory

**Running the Sample**

```bash
cd packages/on_demand_video_decoder/samples
python SampleDecodeFromGopList.py
```
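As a slightly fuller, end-to-end illustration of this pattern, the sketch below wires the N-demux/1-decode idea into a hypothetical PyTorch `Dataset`. None of the class or variable names come from the samples, and each worker is assumed to lazily create its own demuxing decoder. Note that a `DataLoader` with `num_workers > 0` applies `collate_fn` inside the worker processes, so here the loader only gathers packets and the single batch decode is issued from the training loop in the main process.

```python
import random

import accvlab.on_demand_video_decoder as nvc
import torch
from torch.utils.data import DataLoader, Dataset


class GopPacketDataset(Dataset):
    """Hypothetical dataset: each item is one demuxed GOP (CPU-side work in the worker)."""

    def __init__(self, video_paths, max_frame_id=200):
        self.video_paths = video_paths
        self.max_frame_id = max_frame_id
        self.demux_decoder = None  # created lazily, once per worker process

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, idx):
        if self.demux_decoder is None:
            # Same configuration as the sample's demuxing decoder
            self.demux_decoder = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)
        path = self.video_paths[idx]
        frame_id = random.randint(0, self.max_frame_id)
        packets, _, _ = self.demux_decoder.GetGOP([path], [frame_id])
        return {"packets": packets, "file_path": path, "frame_id": frame_id}


def gather_packets(batch):
    """collate_fn that only groups the per-item results; no decoding here."""
    return (
        [item["packets"] for item in batch],
        [item["file_path"] for item in batch],
        [item["frame_id"] for item in batch],
    )


video_paths = ["/data/videos/CAM_FRONT.mp4", "/data/videos/CAM_BACK.mp4"]  # example paths
loader = DataLoader(GopPacketDataset(video_paths), batch_size=2,
                    num_workers=2, collate_fn=gather_packets)

# Single GPU batch decode per iteration, in the main process
batch_decoder = nvc.CreateGopDecoder(maxfiles=6, iGpu=0)
for packets_list, file_paths, frame_ids in loader:
    frames = batch_decoder.DecodeFromGOPListRGB(packets_list, file_paths, frame_ids, True)
    batch = torch.stack([torch.as_tensor(f) for f in frames], dim=0)
    # ... feed `batch` to the model
```

Keeping the GPU decoder in the main process while the workers only produce packets preserves the resource separation described above: demuxing scales with the number of workers, while a single decoder instance serves every batch.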