Physical-AI-AV (PAI) Dataset#

The NCore PAI tool converts data from the NVIDIA PhysicalAI-Autonomous-Vehicles HuggingFace dataset into NCore V4 format.

Conventions#

The PAI dataset is collected on the NVIDIA Hyperion 8 / Hyperion 8.1 sensor platform. It provides timestamped sequence data from 7 cameras and 1 top lidar, together with egomotion, cuboid obstacle labels, and per-sensor calibrations, organized as clips of roughly 20 seconds each.

Camera Sensors#

Front Wide Camera 120° FOV (camera_front_wide_120fov)

Front Tele Camera 30° FOV (camera_front_tele_30fov)

Cross Left Camera 120° FOV (camera_cross_left_120fov)

Cross Right Camera 120° FOV (camera_cross_right_120fov)

Rear Left Camera 70° FOV (camera_rear_left_70fov)

Rear Right Camera 70° FOV (camera_rear_right_70fov)

Rear Tele Camera 30° FOV (camera_rear_tele_30fov)

Camera video is stored as MP4 files and extracted into individual JPEG frames during conversion. Each camera also provides per-frame timestamps and, optionally, blur-box metadata.

Camera intrinsics are compatible with the FThetaCameraModelParameters, OpenCVFisheyeCameraModelParameters, or OpenCVPinholeCameraModelParameters models, depending on the sensor. Each camera model also carries a shutter_delay_us field, which is used to compute per-frame rolling shutter start timestamps. When windshield model parameters are available for a camera sensor, they are represented as BivariateWindshieldModelParameters.

LiDAR Sensors#

Top Lidar 360° FOV (lidar_top_360fov)

Lidar scans are converted from DRACO-compressed point clouds [1]. Each lidar spin includes spin_start_timestamp, spin_end_timestamp, and per-point attributes: XYZ position, normalized intensity, per-point timestamp, and lidar model element indices. Points within the ego vehicle bounding box are filtered out during conversion.
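The ego-vehicle filtering step can be pictured as a point-in-box test. The following is a minimal sketch, assuming an axis-aligned box expressed in the same frame as the points; the function name and the box extents are hypothetical, and how the converter actually derives the box is not described here.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def filter_ego_points(
    points: Iterable[Point], box_min: Point, box_max: Point
) -> List[Point]:
    """Return the points that lie outside the axis-aligned box [box_min, box_max].

    Assumption: the box and the points share one coordinate frame.
    """

    def inside(point: Point) -> bool:
        # A point is inside only if every coordinate is within the box bounds.
        return all(lo <= c <= hi for c, lo, hi in zip(point, box_min, box_max))

    return [p for p in points if not inside(p)]


# Hypothetical extents in meters; not taken from the dataset.
EGO_BOX_MIN: Point = (-1.5, -1.0, 0.0)
EGO_BOX_MAX: Point = (4.0, 1.0, 2.0)

points = [(0.5, 0.0, 1.0), (12.0, -3.0, 0.2), (-1.0, 0.4, 0.5)]
kept = filter_ego_points(points, EGO_BOX_MIN, EGO_BOX_MAX)
```

Here only the point at (12.0, -3.0, 0.2) survives, since the other two fall inside the hypothetical ego box.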

Lidar intrinsics are implemented by the RowOffsetStructuredSpinningLidarModelParameters sensor model.

Egomotion and Labels#

Rig-to-world trajectories are converted from timestamped pose samples (encoded as translation and orientation pairs). The first pose in the selected time window defines the local world reference frame; all subsequent poses are expressed relative to it.
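Expressing all poses relative to the first one amounts to left-multiplying each pose by the inverse of the first pose. The following is a minimal sketch in plain Python, assuming each pose is a (translation, unit quaternion) pair with the quaternion in (w, x, y, z) order; that ordering and all function names are illustrative assumptions, not the converter's actual API.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), unit norm (assumed order)
Pose = Tuple[Vec3, Quat]


def quat_conjugate(q: Quat) -> Quat:
    """Conjugate of a quaternion; equals the inverse for unit quaternions."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_multiply(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q via q * (0, v) * q^-1."""
    _, x, y, z = quat_multiply(quat_multiply(q, (0.0, *v)), quat_conjugate(q))
    return (x, y, z)


def rebase_to_first_pose(poses: Sequence[Pose]) -> List[Pose]:
    """Express every pose relative to poses[0], which becomes the identity."""
    t0, q0 = poses[0]
    q0_inv = quat_conjugate(q0)
    rebased: List[Pose] = []
    for t, q in poses:
        # Offset from the first pose, rotated into the first pose's frame.
        delta = (t[0] - t0[0], t[1] - t0[1], t[2] - t0[2])
        rebased.append((rotate(q0_inv, delta), quat_multiply(q0_inv, q)))
    return rebased
```

For example, a rig whose first pose has a 90° yaw (heading along world +y) and that then moves one meter along world +y ends up at approximately (1, 0, 0) in the rebased frame, with an identity orientation for both poses.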

Cuboid obstacle labels are stored in a per-clip obstacle.parquet with bounding box extents, orientation quaternions, track IDs, class IDs, and label source annotations.

Data Access#

The converter supports two processing modes:

Local mode (pai-v4)

Clips are first downloaded to disk using the pai-clip-dl tool, then converted from the local storage.

Streaming mode (pai-stream-v4)

Clips are streamed directly from HuggingFace, so no prior download is required. Calibration parquet files are downloaded per dataset chunk and filtered to the target clip. Video files are temporarily written to disk and cleaned up after each clip.

When available, the streaming provider automatically uses pre-processed .offline variants of features (e.g. calibration, egomotion, obstacle) in place of the online features.
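The preference for offline variants can be sketched as a simple lookup with a fallback. This is only an illustration: the function name is hypothetical, and it assumes an offline variant is identified by appending `.offline` to the feature name, which may not match how the dataset actually names them.

```python
from typing import Collection


def select_feature_variant(feature: str, available: Collection[str]) -> str:
    """Prefer '<feature>.offline' when it is available, else the online feature.

    Assumption: offline variants are named by appending '.offline'.
    """
    offline = f"{feature}.offline"
    return offline if offline in available else feature


available = {"calibration", "calibration.offline", "egomotion"}
chosen = [select_feature_variant(f, available) for f in ("calibration", "egomotion")]
```

With this set of available features, `calibration` resolves to its offline variant while `egomotion` falls back to the online feature.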

Prerequisites#

A HuggingFace account with the PAI dataset license accepted
A HuggingFace API token (via the HF_TOKEN environment variable or by passing --hf-token)
For local mode: clips downloaded with the pai-clip-dl tool

Downloading Clips (Local Mode)#

The pai-clip-dl tool manages downloads from the HuggingFace dataset:

# Download one or more clips to a local directory
bazel run \
  //tools/data_converter/pai/pai_remote:pai-clip-dl \
  -- \
  download <clip-id> [<clip-id> ...] \
  --output-dir /path/to/data

# Show clip metadata and sensor presence
bazel run \
  //tools/data_converter/pai/pai_remote:pai-clip-dl \
  -- \
  info <clip-id>

# List all available features in the dataset
bazel run \
  //tools/data_converter/pai/pai_remote:pai-clip-dl \
  -- \
  list-features

The download command accepts --features (repeatable) to selectively download specific feature types rather than the full clip. Omitting --features downloads all available features.

The downloaded clip directory has this layout:

{output_dir}/{clip_id}/
├── calibration/
│   ├── camera_intrinsics.parquet
│   ├── sensor_extrinsics.parquet
│   ├── vehicle_dimensions.parquet
│   └── lidar_intrinsics.parquet      (optional)
├── labels/
│   ├── {clip_id}.egomotion.parquet
│   └── {clip_id}.obstacle.parquet    (optional)
├── camera/
│   ├── {clip_id}.{camera_id}.mp4
│   ├── {clip_id}.{camera_id}.timestamps.parquet
│   └── {clip_id}.{camera_id}.blurred_boxes.parquet
├── lidar/
│   └── {clip_id}.lidar_top_360fov.parquet
└── metadata/
    ├── sensor_presence.parquet
    ├── data_collection.parquet
    └── provenance.json               (download source, optional)
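Before converting in local mode, it can be useful to check that a downloaded clip matches this layout. The helper below is a sketch built only from the tree above; it assumes every entry not marked optional is always present, and it treats the blur-box files as optional. Both function names are hypothetical and not part of pai-clip-dl.

```python
from pathlib import Path
from typing import Iterable, List


def expected_clip_files(
    output_dir: str, clip_id: str, camera_ids: Iterable[str]
) -> List[Path]:
    """List the non-optional files of a downloaded clip, per the layout above."""
    clip_dir = Path(output_dir) / clip_id
    files = [
        clip_dir / "calibration" / "camera_intrinsics.parquet",
        clip_dir / "calibration" / "sensor_extrinsics.parquet",
        clip_dir / "calibration" / "vehicle_dimensions.parquet",
        clip_dir / "labels" / f"{clip_id}.egomotion.parquet",
        clip_dir / "lidar" / f"{clip_id}.lidar_top_360fov.parquet",
        clip_dir / "metadata" / "sensor_presence.parquet",
        clip_dir / "metadata" / "data_collection.parquet",
    ]
    for camera_id in camera_ids:
        # Each camera contributes a video and a per-frame timestamp table.
        files.append(clip_dir / "camera" / f"{clip_id}.{camera_id}.mp4")
        files.append(clip_dir / "camera" / f"{clip_id}.{camera_id}.timestamps.parquet")
    return files


def missing_clip_files(
    output_dir: str, clip_id: str, camera_ids: Iterable[str]
) -> List[Path]:
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_clip_files(output_dir, clip_id, camera_ids) if not p.is_file()]
```

A clip downloaded with a restricted `--features` selection will legitimately report missing files, so the result is a hint rather than an error condition.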

Conversion#

The converter uses NCore V4’s component-based architecture. Data is written to NCore format via SequenceComponentGroupsWriter with specialized component writers for poses, intrinsics, cameras, lidar, masks, and cuboid labels.

Usage#

Local mode — convert clips previously downloaded with pai-clip-dl:

bazel run //tools/data_converter/pai:convert -- \
    --root-dir <PATH_TO_CLIPS> \
    --output-dir <PATH_TO_OUTPUT> \
    pai-v4

Streaming mode — convert clips directly from HuggingFace without downloading:

bazel run //tools/data_converter/pai:convert -- \
    --output-dir <PATH_TO_OUTPUT> \
    pai-stream-v4 \
    --clip-id <clip-id> \
    --hf-token <your-hf-token>

The output for each clip is written to:

<output-dir>/pai_<clip-id>/pai_<clip-id>.ncore4.zarr.itar
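For scripts that post-process converted clips, the output location can be derived from the pattern above. This sketch covers only the default `itar` store type; the helper name is hypothetical, and the naming used for `--store-type directory` is not covered here.

```python
from pathlib import Path


def sequence_output_path(output_dir: str, clip_id: str) -> Path:
    """Build <output-dir>/pai_<clip-id>/pai_<clip-id>.ncore4.zarr.itar."""
    name = f"pai_{clip_id}"
    return Path(output_dir) / name / f"{name}.ncore4.zarr.itar"


path = sequence_output_path("/data/ncore", "abc123")
```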

Base arguments (required):

| Argument | Description |
| --- | --- |
| `--root-dir PATH` | Directory containing clip subdirectories. Required for file-based converters (e.g. `pai-v4`); not needed for streaming converters (e.g. `pai-stream-v4`) |
| `--output-dir PATH` | Path where converted NCore V4 sequences will be written |

Base arguments (optional):

| Argument | Description |
| --- | --- |
| `--no-cameras` | Disable exporting all camera sensors |
| `--camera-id ID` | Export only the specified camera (repeatable; defaults to all cameras) |
| `--no-lidars` | Disable exporting all lidar sensors |
| `--verbose` | Enable debug-level logging |

Shared subcommand arguments (pai-v4 and pai-stream-v4):

| Argument | Default | Description |
| --- | --- | --- |
| `--clip-id ID` | all clips | Specific clip ID(s) to convert (repeatable). Required for streaming mode; filters discovered directories in local mode |
| `--seek-sec FLOAT` | `None` | Skip this many seconds from the start of each clip before converting |
| `--duration-sec FLOAT` | `None` | Limit the converted duration of each clip to this many seconds |
| `--store-type {itar,directory}` | `itar` | Output store format. `itar` produces an indexed tar archive; `directory` writes plain zarr directories |
| `--profile {default,separate-sensors,separate-all}` | `separate-sensors` | Component group layout. `default` groups all sensors together; `separate-sensors` gives each sensor its own group; `separate-all` splits every component type into its own group |
| `--sequence-meta` / `--no-sequence-meta` | enabled | Whether to write a JSON metadata file alongside each converted sequence |
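The interaction between `--seek-sec` and `--duration-sec` can be sketched as a clamped time window. The sketch assumes clip timestamps are in microseconds and that the duration is counted from the seeked start; both are assumptions, and the function name is hypothetical.

```python
from typing import Optional, Tuple

US_PER_SEC = 1_000_000


def select_time_window(
    clip_start_us: int,
    clip_end_us: int,
    seek_sec: Optional[float] = None,
    duration_sec: Optional[float] = None,
) -> Tuple[int, int]:
    """Return the (start, end) window in microseconds, clamped to the clip."""
    start_us = clip_start_us
    if seek_sec is not None:
        # Skip ahead, but never past the end of the clip.
        start_us = min(clip_end_us, clip_start_us + round(seek_sec * US_PER_SEC))
    end_us = clip_end_us
    if duration_sec is not None:
        # Assumption: the duration is measured from the seeked start.
        end_us = min(clip_end_us, start_us + round(duration_sec * US_PER_SEC))
    return start_us, end_us


window = select_time_window(0, 20 * US_PER_SEC, seek_sec=5.0, duration_sec=10.0)
```

For a 20-second clip starting at timestamp 0, seeking 5 seconds and limiting the duration to 10 seconds yields the window from 5,000,000 to 15,000,000 microseconds under these assumptions.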

Additional arguments (pai-stream-v4 only):

| Argument | Default | Description |
| --- | --- | --- |
| `--hf-token TEXT` | `$HF_TOKEN` | HuggingFace API token. Reads from the `HF_TOKEN` environment variable if not provided |
| `--revision TEXT` | `main` | HuggingFace dataset branch or tag to stream from |
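The token fallback described for `--hf-token` can be sketched as follows. The function name is hypothetical and this is not the converter's actual code; it only illustrates the documented precedence of the command-line value over the `HF_TOKEN` environment variable.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    cli_token: Optional[str], env: Optional[Mapping[str, str]] = None
) -> str:
    """Return the CLI token if given, else HF_TOKEN from the environment."""
    env = os.environ if env is None else env
    token = cli_token or env.get("HF_TOKEN")
    if not token:
        raise ValueError("No HuggingFace token: pass --hf-token or set HF_TOKEN")
    return token
```

Raising early when neither source provides a token gives a clearer failure than an authentication error in the middle of streaming a clip.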

For the complete implementation, see tools/data_converter/pai/converter.py.

API Reference#

V4 Components (ncore.data.v4):

SequenceComponentGroupsWriter - Main writer for V4 sequences
PosesComponent - Static and dynamic pose storage
IntrinsicsComponent - Camera and lidar intrinsics
LidarSensorComponent - Lidar frame data
CameraSensorComponent - Camera frame data
CuboidsComponent - 3D cuboid track observations
MasksComponent - Camera ego masks

Data Converter (ncore.data_converter):

BaseDataConverter - Abstract base class for all converters
BaseDataConverterConfig - Base configuration dataclass
FileBasedDataConverter - Abstract base class for converters that read from a local root directory (--root-dir)
FileBasedDataConverterConfig - Configuration base for file-based converters (validates --root-dir)

Sensor Models (ncore.data):

FThetaCameraModelParameters - FTheta (equidistant) camera model
OpenCVFisheyeCameraModelParameters - Kannala-Brandt fisheye camera model
OpenCVPinholeCameraModelParameters - Radial/tangential pinhole camera model
RowOffsetStructuredSpinningLidarModelParameters - Spinning lidar model

Footnotes