Evaluation

The on-demand video decoder was used for training a StreamPETR model on the NuScenes mini dataset and compared to the performance to both the original StreamPETR implementation (with image-based training), and in one case to OpenCV-based video training. The results are shown below.

Setup

Experiment Setup

For the video training, the demuxer-free approach is used (see DataLoader Demuxer-Free Example for details on this approach). Here, the GOP packets are extracted and stored prior to the training.

In the video training, the frames are decoded in the training process, and consequently, pre-processing is performed in the training process on the GPU. Note that this is not a viable optimization for the image-based training, as it adds significant overhead when passing the full-resolution images to the training process.

The training is performed for the NuScenes mini dataset, with the following configuration:

  • Video

    • GOP size of 30

    • No B-frames

    • Including both samples and sweeps (resulting in ~12 frames per second)

    • 1600x900 resolution (same as images)

  • Batch size of 16 per GPU

Note

We are planning to add a demo for the On-Demand Video Decoder package in the future, including the implementation of the experiments performed in this evaluation.

Hardware Setup A

System Configuration

GPU

CPU

8x NVIDIA RTX 6000D

2x AMD EPYC 7742 64-core Processors

Hardware Setup B

System Configuration

GPU

CPU

8x NVIDIA H20

2x Intel Xeon Platinum 8468V 48-core Processors

Results & Discussion

Results

Results for both hardware systems are shown in the following tables.

Runtime Comparison for Hardware Setup A

Configuration

Image [ms]

Video: OpenCV [ms]

Video: Ours [ms]

Speedup (vs. Image)

1 GPU

725

1674

751

× 0.97

8 GPU

1025

2663

908

× 1.13

Runtime Comparison for Hardware Setup B

Configuration

Image [ms]

Video [ms]

Speedup

1 GPU

878

862

× 1.02

8 GPU

1310

1070

× 1.22

Discussion

On both systems, the performance of the video-based training is comparable to the image-based training for the 1 GPU configuration. The video training outperforms the image training for the 8 GPU configuration, with the speedup depending on the system. However, please note that the main goal is to reduce the storage requirements while maintaining good performance.