Dataset Preparation
This section describes how to set up the NuScenes Mini dataset for testing and profiling the video decoder, as well as for potential training purposes.
Dataset Overview
We use NuScenes Mini as the test dataset. NuScenes is a large-scale autonomous driving dataset that provides multi-modal sensor data including camera images, LiDAR, and radar. The mini version contains a subset of the full dataset, making it ideal for testing and development.
Dataset Source: NuScenes Official Website
Mini Dataset: Kaggle NuScenes Mini
Download Dataset
Download the dataset, for example with the kagglehub Python package:
import kagglehub
# Download latest version
path = kagglehub.dataset_download("aadimator/nuscenes-mini")
print("Path to dataset files:", path)
# Default download path: ~/.cache/kagglehub/datasets/aadimator/nuscenes-mini/
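The conversion scripts later in this section take the dataset root via --nuscenes_root. The following is a minimal sketch for locating that root inside the download path. It assumes the archive keeps the standard NuScenes layout with samples and sweeps directories, possibly nested one level deeper; find_nuscenes_root is a hypothetical helper, not part of kagglehub or accvlab.

```python
from pathlib import Path

# Directories expected in a standard NuScenes layout (assumption: the
# downloaded archive keeps this layout, possibly inside a sub-directory).
EXPECTED_DIRS = ("samples", "sweeps")


def find_nuscenes_root(download_path):
    """Return the first directory at or below download_path that contains
    all EXPECTED_DIRS, or None if no such directory exists."""
    base = Path(download_path)
    candidates = [base] + sorted(p for p in base.rglob("*") if p.is_dir())
    for candidate in candidates:
        if all((candidate / name).is_dir() for name in EXPECTED_DIRS):
            return candidate
    return None
```

Calling find_nuscenes_root(path) with the path printed above should give a directory suitable for --nuscenes_root, provided the layout assumption holds.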
Convert Images to Video
The NuScenes dataset contains individual JPEG images that need to be converted to a video format for use in the video decoder.
Note
The converter scripts are designed to be run in an environment where a fully-functional
FFmpeg binary is available. This is not the case in our default docker image, where only a minimal
version of FFmpeg is set up. Please use an environment with a fully-functional FFmpeg binary to run
the converter scripts.
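One way to check an environment before running the converter scripts is sketched below. Which FFmpeg features the scripts actually require is not stated here, so the encoder name libx264 is only a placeholder assumption, and has_encoder and ffmpeg_supports are hypothetical helpers.

```python
import shutil
import subprocess


def has_encoder(encoder_listing, name):
    """Return True if `name` is listed as an encoder in `ffmpeg -encoders` output."""
    for line in encoder_listing.splitlines():
        fields = line.split()
        # Encoder rows look like: " V....D libx264   libx264 H.264 / AVC ..."
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_supports(name="libx264"):
    """Check that an ffmpeg binary is on PATH and that it lists the given encoder.

    The default encoder name is an assumption made for illustration only.
    """
    binary = shutil.which("ffmpeg")
    if binary is None:
        return False
    result = subprocess.run(
        [binary, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    return has_encoder(result.stdout, name)
```

If ffmpeg_supports() returns False, the binary is either missing or built without that encoder, which may indicate a minimal build like the one in the default docker image.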
Note
The examples below assume that the working directory is the root directory of the accvlab
package. For other working directories, you need to adjust the paths accordingly. The scripts are located at
packages/on_demand_video_decoder/scripts/ in the accvlab package.
We provide scripts that will combine and convert the images from both the samples & sweeps to the video format, and modify the sample data metadata to point to the correct video filename & frame ID corresponding to the image originally used in the sample.
This conversion can be performed in two steps:
Step 1: Convert sample & sweep images to video format
python packages/on_demand_video_decoder/scripts/generate_nuscenes_video_with_sweeps.py \
--nuscenes_root /path/to/your/nuscenes/dataset/ \
--fps 12 \
--gop_size 30 \
--interpolation_num_frames 0 \
--video_sub_dir path_to_subdirectory_for_generated_videos
Note that:
The parameter --fps sets the FPS information in the metadata for the generated videos (and has no other effect).
The parameter --gop_size sets the GOP size for the generated videos.
The parameter --interpolation_num_frames sets the number of additional frames to add between existing frames. A simple linear interpolation is used in this case. The default value is 0, which means no interpolation is performed.
The version of the NuScenes dataset does not need to be specified. The script does not access the metadata. Instead, it automatically processes all available samples and sweeps, using information contained in the file paths (including filenames & timestamps) as a basis for grouping the images into videos. This means that if the full NuScenes dataset is present, the script will process all available samples and sweeps. For development, it is recommended to use the script in a dataset containing only data from the mini version.
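To illustrate what linear interpolation between existing frames means, here is a minimal sketch. It assumes the inserted frames are evenly spaced blends of the two neighbouring frames; the script's actual implementation may differ, and both function names are hypothetical. Frames are represented as flat lists of pixel values to keep the example dependency-free.

```python
def interpolation_weights(num_extra_frames):
    """Blend weights t in (0, 1) for frames inserted between two originals."""
    n = num_extra_frames
    return [k / (n + 1) for k in range(1, n + 1)]


def interpolate(frame_a, frame_b, num_extra_frames):
    """Linearly blend two frames given as flat sequences of pixel values.

    Assumption: evenly spaced blends, i.e. (1 - t) * a + t * b per pixel.
    """
    return [
        [(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)]
        for t in interpolation_weights(num_extra_frames)
    ]
```

With num_extra_frames set to 0 the weight list is empty, so no frames are added, which matches the default behaviour described above.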
You can call the script with -h to see all available options.
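The path-based grouping described above can be sketched as follows. This is not the script's code; it assumes the standard NuScenes image naming <descriptor>__<camera>__<timestamp>.jpg, and parse_image_name and group_images are hypothetical helpers.

```python
from collections import defaultdict
from pathlib import Path


def parse_image_name(path):
    """Split '<descriptor>__<camera>__<timestamp>.jpg' into its three parts."""
    descriptor, camera, timestamp = Path(path).stem.split("__")
    return descriptor, camera, int(timestamp)


def group_images(paths):
    """Group image paths by (descriptor, camera), ordered by timestamp."""
    groups = defaultdict(list)
    for path in paths:
        descriptor, camera, timestamp = parse_image_name(path)
        groups[(descriptor, camera)].append((timestamp, str(path)))
    # Sorting the (timestamp, path) pairs interleaves samples and sweeps in time order.
    return {key: [p for _, p in sorted(items)] for key, items in groups.items()}
```

Because the grouping key comes only from the filename, images from the samples and sweeps directories end up in the same group when they share a descriptor and camera.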
Note
The example scripts provided for the On-Demand Video Decoder package (see Sample Code Documentation and PyTorch Integration Examples) assume that:
The path to the NuScenes dataset is /data/nuscenes
The path to the output directory is /data/nuscenes/video_samples (set --video_sub_dir video_samples when running this script to place the generated videos there)
Output layout for generated videos:
Per-sequence folders, e.g.: <path_to_nuscenes_dataset>/video_samples/n008-2018-08-30-15-16-55-0400/CAM_FRONT.mp4
Inside each sequence folder, files are named by camera only (CAM_FRONT.mp4, CAM_BACK.mp4, etc.).
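A quick sanity check of the generated layout might look like the sketch below. It assumes that every sequence folder should contain one video for each of the six NuScenes cameras, which is not stated explicitly above; missing_videos is a hypothetical helper.

```python
from pathlib import Path

# The six camera channels of NuScenes. Assumption: one video per camera
# is expected in every sequence folder.
CAMERAS = (
    "CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
    "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
)


def missing_videos(video_dir):
    """Map each sequence folder name to the list of camera videos it lacks."""
    missing = {}
    for sequence_dir in sorted(Path(video_dir).iterdir()):
        if not sequence_dir.is_dir():
            continue  # skip plain files in the output directory
        absent = [c for c in CAMERAS if not (sequence_dir / f"{c}.mp4").is_file()]
        if absent:
            missing[sequence_dir.name] = absent
    return missing
```

An empty result for the video output directory (for example /data/nuscenes/video_samples) means every sequence folder has all six camera files under this assumption.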
Note
The data is arranged into sequences based on the descriptor in the filename, e.g.
n008-2018-08-30-15-16-55-0400 in n008-2018-08-30-15-16-55-0400__CAM_FRONT.jpg. However, in some cases,
multiple sequences use the same descriptor. The data is then split into the individual sequences
based on large gaps in the timestamps, and each sequence folder is named with a __partN suffix,
e.g. n015-2018-11-21-19-38-26+0800__part0, n015-2018-11-21-19-38-26+0800__part1, etc.
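The gap-based splitting can be illustrated with the sketch below. The threshold that counts as a large gap is not documented here, so max_gap_us is a hypothetical parameter, and both functions are illustrative rather than the script's own code. Timestamps are assumed to be in microseconds, as in NuScenes filenames.

```python
def split_by_gap(timestamps_us, max_gap_us):
    """Split timestamps into runs wherever neighbours differ by more than max_gap_us."""
    parts = []
    for ts in sorted(timestamps_us):
        if parts and ts - parts[-1][-1] <= max_gap_us:
            parts[-1].append(ts)  # continues the current sequence
        else:
            parts.append([ts])  # first timestamp, or a large gap: start a new sequence
    return parts


def folder_names(descriptor, num_parts):
    """Name sequence folders; the __partN suffix is added only when a split occurred."""
    if num_parts <= 1:
        return [descriptor]
    return [f"{descriptor}__part{i}" for i in range(num_parts)]
```

For example, timestamps that form two clusters separated by more than max_gap_us yield two parts, which would be stored in folders ending in __part0 and __part1.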
Apart from the generated videos, the script will also create a json file mapping the original image
paths & filenames to the generated video paths (relative to the video file output directory,
default is video_samples) and frame IDs. This mapping is used in the next step to adjust the metadata for
the camera samples.
Step 2: Adjust camera sample metadata
This step updates the sample metadata to point to the correct frame IDs in the correct generated video for each camera sample.
python packages/on_demand_video_decoder/scripts/add_nuscenes_video_meta_from_json.py \
--nuscenes_root /path/to/your/nuscenes/dataset/ \
--nuscenes_version v1.0-mini \
--video_sub_dir path_to_subdirectory_for_generated_videos
You can call the script with -h to see all available options.
This script will create an updated sample_data_video.json file in the same directory as the original
sample_data.json file. To use the updated metadata, you can rename it to sample_data.json and replace the
original file. This will make the video-related metadata available in the sample data for cameras. The new
fields added to the metadata for camera sample data are:
video_filename: The filename of the video containing the image used in the sample (relative to the dataset root directory)
video_frame: The index of the frame in that video that corresponds to the image used in the sample
Note that the original filename field is not modified and still points to the original image file.
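As an illustration of how the new fields could be consumed, the sketch below builds a lookup from the original image filename to the corresponding video and frame index. It assumes the file is a JSON list of record dictionaries, as NuScenes table files are, and that only camera records carry the new fields; load_records and video_frame_lookup are hypothetical helpers.

```python
import json
from pathlib import Path


def load_records(json_path):
    """Load a NuScenes table file (assumed to be a JSON list of record dictionaries)."""
    with Path(json_path).open("r", encoding="utf-8") as handle:
        return json.load(handle)


def video_frame_lookup(records):
    """Map each original image filename to (video_filename, video_frame).

    Records without the video fields (e.g. LiDAR or radar data) are skipped.
    """
    return {
        rec["filename"]: (rec["video_filename"], rec["video_frame"])
        for rec in records
        if "video_filename" in rec and "video_frame" in rec
    }
```

A typical call would be video_frame_lookup(load_records(path)), where path points to the generated sample_data_video.json next to the original sample_data.json.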