Physical-AI-AV (PAI) Dataset#
The NCore PAI tool converts data from the NVIDIA PhysicalAI-Autonomous-Vehicles HuggingFace dataset into NCore V4 format.
Conventions#
The PAI dataset is collected on the NVIDIA Hyperion 8 / Hyperion 8.1 sensor platform and provides timestamped sequence data from seven cameras and one top lidar, plus egomotion and cuboid obstacle labels, along with per-sensor calibrations. The data is organized as clips of roughly 20 seconds each.
Camera Sensors#
Front Wide Camera 120° FOV (camera_front_wide_120fov)
Front Tele Camera 30° FOV (camera_front_tele_30fov)
Cross Left Camera 120° FOV (camera_cross_left_120fov)
Cross Right Camera 120° FOV (camera_cross_right_120fov)
Rear Left Camera 70° FOV (camera_rear_left_70fov)
Rear Right Camera 70° FOV (camera_rear_right_70fov)
Rear Tele Camera 30° FOV (camera_rear_tele_30fov)
Camera video is stored as MP4 and extracted into individual JPEG frames during conversion. Per-frame timestamps and optional blur-box metadata are provided.
Camera intrinsics are compatible with the
FThetaCameraModelParameters,
OpenCVFisheyeCameraModelParameters, or
OpenCVPinholeCameraModelParameters models, depending on the
sensor. Each camera model also carries a shutter_delay_us field used
to compute per-frame rolling shutter start timestamps. Windshield
model parameters (BivariateWindshieldModelParameters) are
included when available for a given camera sensor.
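As a concrete illustration of the timestamp computation, the sketch below offsets each recorded frame timestamp by the camera's shutter_delay_us. The helper name and the assumption that the delay is simply added to the recorded timestamp are illustrative; the converter's actual formula may differ.

```python
def rolling_shutter_start_us(frame_timestamp_us: int, shutter_delay_us: int) -> int:
    # Hypothetical helper: offset the recorded frame timestamp by the
    # camera's shutter_delay_us to obtain the rolling-shutter start time.
    return frame_timestamp_us + shutter_delay_us


# Example: three frames of a ~30 fps camera with a 150 us shutter delay.
frame_timestamps_us = [1_000_000, 1_033_333, 1_066_667]
starts_us = [rolling_shutter_start_us(t, 150) for t in frame_timestamps_us]
```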
LiDAR Sensors#
Top Lidar 360° FOV (lidar_top_360fov)
Lidar scans are converted from DRACO-compressed point clouds [1]. Each
lidar spin includes spin_start_timestamp, spin_end_timestamp, and
per-point attributes: XYZ position, normalized intensity, per-point timestamp,
and lidar model element indices. Points within the ego vehicle bounding box are
filtered out during conversion.
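The ego-box filtering step can be sketched as a simple containment test in the rig frame. The class and function names, the axis-aligned box shape, and the sample bounds are illustrative assumptions, not the converter's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class EgoBox:
    # Axis-aligned ego vehicle bounds in the rig frame (meters); values illustrative.
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    z_min: float
    z_max: float

    def contains(self, x: float, y: float, z: float) -> bool:
        return (self.x_min <= x <= self.x_max
                and self.y_min <= y <= self.y_max
                and self.z_min <= z <= self.z_max)


def filter_ego_points(points, box: EgoBox):
    # Keep only lidar returns that fall outside the ego vehicle box.
    return [p for p in points if not box.contains(*p[:3])]


box = EgoBox(-2.5, 2.5, -1.1, 1.1, -0.5, 2.0)
points = [(0.0, 0.0, 1.0, 0.8), (10.0, 3.0, 0.2, 0.5)]  # (x, y, z, intensity)
kept = filter_ego_points(points, box)
```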
Lidar intrinsics are implemented by the
RowOffsetStructuredSpinningLidarModelParameters sensor
model.
Egomotion and Labels#
Rig-to-world trajectories are converted from timestamped pose samples (encoded as translation and orientation pairs). The first pose in the selected time window defines the local world reference frame; all subsequent poses are expressed relative to it.
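Re-expressing poses relative to the first pose can be sketched with translation-plus-quaternion rigid transforms, as follows. The function names and the (w, x, y, z) quaternion convention are assumptions for illustration.

```python
def q_mul(a, b):
    # Hamilton product of quaternions in (w, x, y, z) order.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)


def q_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_rotate(q, v):
    # Rotate vector v by unit quaternion q: q * (0, v) * q^-1.
    w, x, y, z = q_mul(q_mul(q, (0.0, *v)), q_conj(q))
    return (x, y, z)


def relative_pose(first, pose):
    # Express a rig-to-world pose in the local world frame defined by
    # the first pose of the selected time window.
    (t0, q0), (t, q) = first, pose
    q0_inv = q_conj(q0)  # inverse of a unit quaternion
    dt = tuple(a - b for a, b in zip(t, t0))
    return q_rotate(q0_inv, dt), q_mul(q0_inv, q)
```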
Cuboid obstacle labels are stored in a per-clip obstacle.parquet with
bounding box extents, orientation quaternions, track IDs, class IDs, and label
source annotations.
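A per-row record of the described obstacle fields might look like the sketch below. The field names are illustrative and may not match the actual parquet column names.

```python
from dataclasses import dataclass


@dataclass
class CuboidLabel:
    # Illustrative record type; actual parquet column names may differ.
    track_id: int
    class_id: int
    extent_m: tuple       # bounding box extents (length, width, height)
    orientation: tuple    # quaternion (w, x, y, z)
    label_source: str

    def volume_m3(self) -> float:
        length, width, height = self.extent_m
        return length * width * height


car = CuboidLabel(7, 1, (4.5, 2.0, 1.6), (1.0, 0.0, 0.0, 0.0), "auto")
```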
Data Access#
The converter supports two processing modes:
- Local mode (pai-v4): Clips are first downloaded to disk using the pai-clip-dl tool, then converted from local storage.
- Streaming mode (pai-stream-v4): Clips are streamed directly from HuggingFace; no prior download is required. Calibration parquet files are downloaded per dataset chunk and filtered to the target clip. Video files are temporarily written to disk and cleaned up after each clip.
When available, the streaming provider automatically uses pre-processed
.offline variants of features (e.g. calibration, egomotion, obstacle) in place of the online features.
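The variant-selection rule can be sketched as follows. The "<feature>.offline" naming scheme used here is an assumption for illustration; only the prefer-offline behavior is described above.

```python
def pick_feature_variant(feature: str, available: set) -> str:
    # Prefer a pre-processed offline variant when the provider exposes one;
    # otherwise fall back to the online feature.
    offline = f"{feature}.offline"
    return offline if offline in available else feature


available = {"calibration", "egomotion", "egomotion.offline", "obstacle.offline"}
```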
Prerequisites#
- A HuggingFace account with the PAI dataset license accepted
- A HuggingFace API token (via the HF_TOKEN environment variable or by passing --hf-token)
- For local mode: clips downloaded with the pai-clip-dl tool
Downloading Clips (Local Mode)#
The pai-clip-dl tool manages downloads from the HuggingFace dataset:
# Download one or more clips to a local directory
bazel run \
//tools/data_converter/pai/pai_remote:pai-clip-dl \
-- \
download <clip-id> [<clip-id> ...] \
--output-dir /path/to/data
# Show clip metadata and sensor presence
bazel run \
//tools/data_converter/pai/pai_remote:pai-clip-dl \
-- \
info <clip-id>
# List all available features in the dataset
bazel run \
//tools/data_converter/pai/pai_remote:pai-clip-dl \
-- \
list-features
The download command accepts --features (repeatable) to selectively
download specific feature types rather than the full clip. Omitting
--features downloads all available features.
The downloaded clip directory has this layout:
{output_dir}/{clip_id}/
├── calibration/
│ ├── camera_intrinsics.parquet
│ ├── sensor_extrinsics.parquet
│ ├── vehicle_dimensions.parquet
│ └── lidar_intrinsics.parquet (optional)
├── labels/
│ ├── {clip_id}.egomotion.parquet
│ └── {clip_id}.obstacle.parquet (optional)
├── camera/
│ ├── {clip_id}.{camera_id}.mp4
│ ├── {clip_id}.{camera_id}.timestamps.parquet
│ └── {clip_id}.{camera_id}.blurred_boxes.parquet
├── lidar/
│ └── {clip_id}.lidar_top_360fov.parquet
└── metadata/
├── sensor_presence.parquet
├── data_collection.parquet
└── provenance.json (download source, optional)
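The layout above can be expressed as a small path-building helper; the function name and returned dictionary shape are illustrative, with the file names taken from the layout.

```python
from pathlib import Path


def clip_paths(root_dir: str, clip_id: str,
               camera_id: str = "camera_front_wide_120fov") -> dict:
    # Map a few feature names to their expected files in a downloaded
    # clip directory, following the layout documented above.
    clip = Path(root_dir) / clip_id
    return {
        "camera_intrinsics": clip / "calibration" / "camera_intrinsics.parquet",
        "egomotion": clip / "labels" / f"{clip_id}.egomotion.parquet",
        "camera_video": clip / "camera" / f"{clip_id}.{camera_id}.mp4",
        "lidar": clip / "lidar" / f"{clip_id}.lidar_top_360fov.parquet",
        "sensor_presence": clip / "metadata" / "sensor_presence.parquet",
    }


paths = clip_paths("/data/pai", "clip_0001")
```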
Conversion#
The converter uses NCore V4’s component-based architecture. Data is written to
NCore format via SequenceComponentGroupsWriter with
specialized component writers for poses, intrinsics, cameras, lidar, masks, and
cuboid labels.
Usage#
Local mode — convert clips previously downloaded with pai-clip-dl:
bazel run //tools/data_converter/pai:convert -- \
--root-dir <PATH_TO_CLIPS> \
--output-dir <PATH_TO_OUTPUT> \
pai-v4
Streaming mode — convert clips directly from HuggingFace without downloading:
bazel run //tools/data_converter/pai:convert -- \
--output-dir <PATH_TO_OUTPUT> \
pai-stream-v4 \
--clip-id <clip-id> \
--hf-token <your-hf-token>
The output for each clip is written to:
<output-dir>/pai_<clip-id>/pai_<clip-id>.ncore4.zarr.itar
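Following that pattern, the per-clip output path can be built as below; the helper name is illustrative.

```python
def sequence_output_path(output_dir: str, clip_id: str) -> str:
    # Build the per-clip output path: <output-dir>/pai_<clip-id>/pai_<clip-id>.ncore4.zarr.itar
    name = f"pai_{clip_id}"
    return f"{output_dir}/{name}/{name}.ncore4.zarr.itar"
```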
Base arguments (required):

| Argument | Description |
|---|---|
| --root-dir | Directory containing clip subdirectories. Required for file-based converters (e.g. pai-v4) |
| --output-dir | Path where converted NCore V4 sequences will be written |
Base arguments (optional):

| Argument | Description |
|---|---|
|  | Disable exporting all camera sensors |
|  | Export only the specified camera (repeatable; defaults to all cameras) |
|  | Disable exporting all lidar sensors |
|  | Enable debug-level logging |
Shared subcommand arguments (pai-v4 and pai-stream-v4):

| Argument | Default | Description |
|---|---|---|
| --clip-id | all clips | Specific clip ID(s) to convert (repeatable). Required for streaming mode; filters discovered directories in local mode |
|  |  | Skip this many seconds from the start of each clip before converting |
|  |  | Limit the converted duration of each clip to this many seconds |
|  |  | Output store format |
|  |  | Component group layout |
|  | enabled | Whether to write a JSON metadata file alongside each converted sequence |
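The skip/duration semantics above can be sketched as a timestamp filter. The function name and the exact window boundaries (inclusive start, exclusive end) are assumptions for illustration.

```python
def select_time_window(timestamps_us, skip_s=0.0, duration_s=None):
    # Apply the skip/duration window to a clip's per-frame timestamps
    # (microseconds). Semantics mirror the argument descriptions above.
    if not timestamps_us:
        return []
    start = timestamps_us[0] + int(skip_s * 1e6)
    end = None if duration_s is None else start + int(duration_s * 1e6)
    return [t for t in timestamps_us
            if t >= start and (end is None or t < end)]


frames = [0, 500_000, 1_000_000, 1_500_000, 2_000_000]
```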
Additional arguments (pai-stream-v4 only):

| Argument | Default | Description |
|---|---|---|
| --hf-token |  | HuggingFace API token. Reads from the HF_TOKEN environment variable if not provided |
|  |  | HuggingFace dataset branch or tag to stream from |
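The token resolution order (explicit --hf-token first, then the HF_TOKEN environment variable) can be sketched as:

```python
import os


def resolve_hf_token(cli_token=None):
    # An explicit --hf-token value wins; otherwise fall back to HF_TOKEN.
    return cli_token if cli_token is not None else os.environ.get("HF_TOKEN")
```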
For the complete implementation, see tools/data_converter/pai/converter.py.
API Reference#
V4 Components (ncore.data.v4):
- SequenceComponentGroupsWriter - Main writer for V4 sequences
- PosesComponent - Static and dynamic pose storage
- IntrinsicsComponent - Camera and lidar intrinsics
- LidarSensorComponent - Lidar frame data
- CameraSensorComponent - Camera frame data
- CuboidsComponent - 3D cuboid track observations
- MasksComponent - Camera ego masks
Data Converter (ncore.data_converter):
- BaseDataConverter - Abstract base class for all converters
- BaseDataConverterConfig - Base configuration dataclass
- FileBasedDataConverter - Abstract base class for converters that read from a local root directory (--root-dir)
- FileBasedDataConverterConfig - Configuration base for file-based converters (validates --root-dir)
Sensor Models (ncore.data):
- FThetaCameraModelParameters - FTheta (equidistant) camera model
- OpenCVFisheyeCameraModelParameters - Kannala-Brandt fisheye camera model
- OpenCVPinholeCameraModelParameters - Radial/tangential pinhole camera model
- RowOffsetStructuredSpinningLidarModelParameters - Spinning lidar model
Footnotes