Argoverse 2 Dataset

The NCore Argoverse 2 tool converts data from the Argoverse 2 Sensor Dataset into NCore V4 format. The converter reads the Argoverse 2 on-disk Apache Feather files directly with pyarrow and deliberately avoids the heavy av2 devkit (which pulls in torch, kornia, numba, polars and PyAV). Quaternion handling uses scipy (already an ncore dependency), so no extra dependency is introduced.

Conventions

Argoverse 2 provides data from 9 cameras and 2 lidars; it has no radar. The converter handles all sensor modalities and 3D cuboid annotations.

Camera Sensors

ring_front_center – 2048x1550 (height x width; portrait)

ring_front_left – 1550x2048

ring_front_right – 1550x2048

ring_side_left – 1550x2048

ring_side_right – 1550x2048

ring_rear_left – 1550x2048

ring_rear_right – 1550x2048

stereo_front_left – 1550x2048

stereo_front_right – 1550x2048

The released imagery for all nine cameras is already undistorted – the official av2 devkit projects with the intrinsic matrix K only and does not load the distortion columns – so camera intrinsics are stored using IdealPinholeCameraModelParameters. Because the imagery is already undistorted, global shutter is assumed (ShutterType.GLOBAL). The k1, k2, k3 coefficients present in intrinsics.feather describe the original lens (for re-distorting into the raw frame) and are intentionally not applied to the released images; they are preserved per camera in the camera component generic_meta_data under av2_original_distortion so the original calibration is not lost.
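
As a minimal sketch of what projecting with K only means, the function below maps a camera-frame point to pixel coordinates with no distortion term. The function name and the intrinsic values are illustrative assumptions, not NCore or av2 API, and the axis convention (x right, y down, z forward) is the usual pinhole convention rather than something stated above.

```python
from typing import Optional, Tuple


def project_ideal_pinhole(
    point_cam: Tuple[float, float, float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Optional[Tuple[float, float]]:
    """Project a camera-frame point (x right, y down, z forward) using K only."""
    x, y, z = point_cam
    if z <= 0.0:
        return None  # at or behind the camera plane: no valid projection
    # No k1/k2/k3 term is applied, because the released images are already undistorted.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (u, v)


# Made-up intrinsics for illustration only (not read from a real intrinsics.feather).
pixel = project_ideal_pinhole((1.0, -0.5, 10.0), fx=1700.0, fy=1700.0, cx=775.0, cy=1024.0)
print(pixel)
```

Re-applying k1, k2, k3 would only be needed to map back into the raw, distorted frame, which is why the coefficients are kept as metadata instead of being part of the camera model.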

LiDAR Sensors

up_lidar – Velodyne VLP-32C, 32 beams, 10 Hz

down_lidar – Velodyne VLP-32C, 32 beams, 10 Hz

Argoverse 2 sweeps are egomotion-compensated to the sweep reference timestamp and provided in the egovehicle frame, with real per-point timestamps (offset_ns). The two stacked VLP-32C units are stored separately, each with its own static extrinsic. Points are split per unit by laser_number, mapped into the unit’s own sensor frame, and decompensated using the real per-point timestamps so that NCore stores raw per-point-time ray directions. Because the sensor extrinsic is static, this decompensation is independent of whether the source data applied ego-motion before or after the sensor transform.
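
The decompensation step can be sketched as a chain of rigid transforms. This is an illustration under stated assumptions, not the converter's code: the function and variable names are hypothetical, poses are 4x4 homogeneous matrices, and the ego pose at each point's own timestamp is assumed to have been interpolated from the ego trajectory beforehand.

```python
import numpy as np


def translation(x: float, y: float, z: float) -> np.ndarray:
    """Build a 4x4 rigid transform with identity rotation (enough for the toy example)."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def decompensate_point(
    p_ego_ref: np.ndarray,        # (3,) point in the egovehicle frame at the sweep reference time
    world_T_ego_ref: np.ndarray,  # (4, 4) ego pose at the sweep reference time
    world_T_ego_pt: np.ndarray,   # (4, 4) ego pose at the point's own capture time
    ego_T_sensor: np.ndarray,     # (4, 4) static lidar extrinsic of the unit
) -> np.ndarray:
    """Undo ego-motion compensation; return the point in the sensor frame at capture time."""
    p = np.append(p_ego_ref, 1.0)
    # ego(reference time) -> world -> ego(capture time) -> sensor
    sensor_T_ego_ref = (
        np.linalg.inv(ego_T_sensor) @ np.linalg.inv(world_T_ego_pt) @ world_T_ego_ref
    )
    return (sensor_T_ego_ref @ p)[:3]


# Toy example: the ego is 1 m further along x at capture time than at the reference
# time, and the sensor sits 1.5 m above the ego origin (values are made up).
p_sensor = decompensate_point(
    np.array([10.0, 0.0, 0.0]),
    world_T_ego_ref=translation(0.0, 0.0, 0.0),
    world_T_ego_pt=translation(1.0, 0.0, 0.0),
    ego_T_sensor=translation(0.0, 0.0, 1.5),
)
print(p_sensor)
```

In the toy example the static point is 10 m ahead of the ego at the reference time, so it is 9 m ahead at capture time and 1.5 m below the sensor origin.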

A structured VLP-32C model is stored per unit as lidar intrinsics, with per-point model_element (row, column). Argoverse 2 provides no native firing-column index, so the firing pattern is reconstructed from offset_ns (firing columns – one VLP-32C revolution at 10 Hz) and laser_number (the beam, mapped to an elevation-sorted row). The geometry is derived per log from the decompensated reference sweep: elevations, the laser-to-row map, column timing, per-column azimuths, and per-row azimuth offsets (the 32 beams of a firing column span several degrees of azimuth, so the per-row offset is fit empirically). The two stacked units fire in opposite phase, so they spin oppositely in their own frames (one cw, one ccw), which is detected from the data. The column grid is upsampled 4x so per-frame alignment is not column-quantized, and each sweep is re-aligned to the model by a per-frame affine column remap – a constant phase (the spin phase at a given offset_ns drifts ~1 deg between sweeps) plus a linear term (the spin rate drifts slightly within a sweep on some scenes). Steep downward beams that only return at near range (no far data) have their azimuth offset fit from near-range returns. Deriving from the decompensated cloud (not the ego-motion-smeared compensated one) plus these steps gives ~0.03 deg median far-range reconstruction across scenes (validated on 38 val logs / 76 units, all sub-0.08 deg median with no systematic azimuth or elevation bias), on par with native-column sensors. Pass --lidar-model-source none to store raw ray bundles only.
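
To make the column reconstruction more concrete, here is a rough sketch of mapping offset_ns onto the 4x upsampled column grid and applying a per-frame affine remap. Everything about it is an assumption for illustration: the function name, the column count passed in, the 100 ms revolution period implied by 10 Hz, and offset_ns starting at zero at the beginning of the revolution. The converter fits the phase and rate terms from the data; here they are plain parameters.

```python
def column_from_offset(
    offset_ns: int,
    n_columns: int,
    upsample: int = 4,
    period_ns: int = 100_000_000,  # one revolution at 10 Hz
    phase: float = 0.0,            # constant per-frame column shift (hypothetical parameter)
    rate: float = 1.0,             # linear per-frame term for spin-rate drift (hypothetical)
) -> int:
    """Map a per-point time offset to a column index on the upsampled grid."""
    n_fine = n_columns * upsample
    nominal = offset_ns / period_ns * n_fine  # fractional column from timing alone
    remapped = rate * nominal + phase         # per-frame affine column remap
    return int(round(remapped)) % n_fine      # wrap around the revolution


# Hypothetical grid of 1800 nominal columns, i.e. 7200 upsampled columns.
print(column_from_offset(50_000_000, n_columns=1800))
print(column_from_offset(50_000_000, n_columns=1800, phase=2.0))
```

The finer grid lets a small phase shift move a point by a fraction of a nominal column instead of snapping it to the nearest one.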

The laser_number to up/down unit split is not documented by Argoverse 2. The two units occupy the two laser-number halves (< 32 and >= 32); the unit label is recovered from extrinsic geometry by per-beam elevation flatness – a laser ring traces a constant-elevation cone only in its own sensor frame, so the wrong extrinsic tilts the cone and inflates the per-ring elevation spread. The decision is made once per log and is stable with a wide (~2-10x) margin.
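
The flatness test can be illustrated on synthetic data. The sketch below is not the converter's implementation: the function names, the made-up extrinsics, and the choice of the median per-ring standard deviation as the spread measure are all assumptions. It takes the points of one laser-number half, maps them into each candidate sensor frame, and picks the extrinsic under which the rings are flattest.

```python
import numpy as np


def ring_elevation_spread(points_ego, laser_number, ego_T_sensor):
    """Median over rings of the per-ring elevation std (radians) in a candidate frame."""
    sensor_T_ego = np.linalg.inv(ego_T_sensor)
    pts = points_ego @ sensor_T_ego[:3, :3].T + sensor_T_ego[:3, 3]
    elevation = np.arctan2(pts[:, 2], np.hypot(pts[:, 0], pts[:, 1]))
    spreads = [elevation[laser_number == ring].std() for ring in np.unique(laser_number)]
    return float(np.median(spreads))


def assign_unit(points_ego, laser_number, candidates):
    """Return the candidate whose extrinsic gives the flattest rings, plus all spreads."""
    spreads = {
        name: ring_elevation_spread(points_ego, laser_number, T)
        for name, T in candidates.items()
    }
    return min(spreads, key=spreads.get), spreads


def make_pose(rot_x_deg, tz):
    """4x4 pose: rotation about x by rot_x_deg, translation tz along z (toy extrinsic)."""
    a = np.deg2rad(rot_x_deg)
    T = np.eye(4)
    T[1:3, 1:3] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
    T[2, 3] = tz
    return T


# Made-up extrinsics: one upright unit, one inverted unit with a 3 degree tilt.
candidates = {"up_lidar": make_pose(0.0, 1.6), "down_lidar": make_pose(177.0, 1.4)}

# Synthetic rings at constant elevation in the up_lidar frame, then moved to the ego frame.
az = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
rings, ids = [], []
for ring, elev_deg in enumerate([-10.0, 0.0, 10.0]):
    el = np.deg2rad(elev_deg)
    xyz = np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.full_like(az, np.sin(el))],
        axis=1,
    )
    rings.append(20.0 * xyz)
    ids.append(np.full(az.size, ring))
pts_sensor, laser = np.concatenate(rings), np.concatenate(ids)
ego_T_up = candidates["up_lidar"]
pts_ego = pts_sensor @ ego_T_up[:3, :3].T + ego_T_up[:3, 3]

best, spreads = assign_unit(pts_ego, laser, candidates)
print(best, spreads)
```

On real data the rings are noisy and the points should be decompensated first, so the gap between the two spreads is what matters rather than either value being near zero.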

Annotations

3D cuboid annotations are native to the egovehicle frame at the sweep reference time. They are stored in the rig frame at that timestamp with no ego pose baked in, so the egovehicle motion stays out of the stored coordinates and remains swappable downstream (a V4 feature); the pose graph places the cuboids using the active ego trajectory. The full 3-DOF box orientation is preserved (the AV2 quaternion is converted to the BBox3 xyz-Euler convention, not reduced to yaw). The track_uuid is used as the track ID.
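
For reference, the arithmetic behind a quaternion to xyz-Euler conversion looks roughly like the sketch below. The converter itself uses scipy; this pure-Python version only makes the formulas visible. It assumes the scalar-first qw, qx, qy, qz order of the Argoverse 2 annotation columns and assumes that the xyz convention means extrinsic rotations about x, then y, then z (R = Rz @ Ry @ Rx), which is not confirmed above for BBox3. The function name is hypothetical.

```python
import math


def quat_wxyz_to_euler_xyz(qw: float, qx: float, qy: float, qz: float):
    """Quaternion -> (roll, pitch, yaw) for extrinsic x-y-z rotations, R = Rz @ Ry @ Rx."""
    # Normalise first so small numeric drift in the stored quaternion does not matter.
    norm = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    qw, qx, qy, qz = qw / norm, qx / norm, qy / norm, qz / norm
    roll = math.atan2(2.0 * (qw * qx + qy * qz), 1.0 - 2.0 * (qx * qx + qy * qy))
    # Clamp to [-1, 1] so rounding error cannot push asin out of its domain.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (qw * qy - qz * qx)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (qw * qz + qx * qy), 1.0 - 2.0 * (qy * qy + qz * qz))
    return roll, pitch, yaw


# A pure 90 degree heading change: rotation about z only.
half = math.radians(90.0) / 2.0
print(quat_wxyz_to_euler_xyz(math.cos(half), 0.0, 0.0, math.sin(half)))
```

When doing the same with scipy, note that Rotation.from_quat expects scalar-last (x, y, z, w) input by default, so the Argoverse 2 components have to be reordered.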

Coordinate Frames

The first ego pose’s city_SE3_egovehicle is stored as the static world -> world_global pose, so world_global is the Argoverse 2 city frame. All absolute city coordinates remain recoverable for later alignment with the Argoverse 2 HD map (which the converter does not export).
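
One way to picture this split is the sketch below, which factors city-frame ego poses into the static first pose and a local trajectory, then recomposes them. It assumes the per-frame ego poses are stored relative to the first pose, which is an interpretation of the paragraph above; the function name, variable names, and pose values are illustrative only.

```python
import numpy as np


def rebase_trajectory(city_T_ego: np.ndarray):
    """Split (N, 4, 4) city-frame ego poses into a static pose and a local trajectory."""
    world_global_T_world = city_T_ego[0]  # first city_SE3_egovehicle, kept as the static pose
    # Broadcasts the (4, 4) inverse over all N poses.
    world_T_ego = np.linalg.inv(world_global_T_world) @ city_T_ego
    return world_global_T_world, world_T_ego


# Three made-up city-frame poses with identity rotation.
city_T_ego = np.stack([np.eye(4)] * 3)
city_T_ego[:, :3, 3] = [[100.0, 200.0, 5.0], [101.0, 200.0, 5.0], [102.0, 200.5, 5.0]]

world_global_T_world, world_T_ego = rebase_trajectory(city_T_ego)

# Absolute city coordinates are recovered by composing with the static pose again.
recovered = world_global_T_world @ world_T_ego
print(np.allclose(recovered, city_T_ego))
```

Keeping the large city-frame offset in a single static transform leaves the per-frame poses small and local while losing no information.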

Usage

Convert every log in a split:

bazel run //tools/data_converter/argoverse2 -- \
    --root-dir /path/to/argoverse2/sensor \
    --output-dir /path/to/output \
    argoverse2-v4 \
    --split val

Convert a single log:

bazel run //tools/data_converter/argoverse2 -- \
    --root-dir /path/to/argoverse2/sensor \
    --output-dir /path/to/output \
    argoverse2-v4 \
    --split val \
    --log-id 02678d04-cc9f-3148-9f95-1ba66347dff9

Testing

AV2_DIR=/path/to/argoverse2/sensor AV2_SPLIT=val \
    bazel test //tools/data_converter/argoverse2:pytest_converter