Sensor Models#

NCore supports multiple sensor models for representing camera and lidar intrinsics and distortion parameters. Each model provides different parameterizations suited for various sensor types and use cases.

Camera Models#

The following camera models are supported:

FTheta Camera Model - NVIDIA’s FTheta camera model with polynomial distortion parameterization, suitable for both perspective and wide field-of-view cameras
Ideal Pinhole Camera Model - Efficient distortion-free pinhole camera model, also used as the target for rectification
OpenCV Pinhole Camera Model - Standard pinhole camera model with rational radial, tangential, and thin prism distortion coefficients
OpenCV Fisheye Camera Model - OpenCV’s fisheye camera model with polynomial distortion for ultra-wide angle and fisheye lenses

Camera Model Parameterizations#

Camera models may optionally include external distortion parameters to account for distortion sources outside the camera itself (e.g., windshields). See External Distortion Models for details.

The camera_model_parameters dictionary will unconditionally contain:

resolution - width and height of the image in pixels (uint32, [2,])
shutter_type - shutter type of the camera’s imaging sensor (str, one of [ROLLING_TOP_TO_BOTTOM, ROLLING_LEFT_TO_RIGHT, ROLLING_BOTTOM_TO_TOP, ROLLING_RIGHT_TO_LEFT, GLOBAL])

FTheta Camera Model#

If camera_model_type = 'ftheta' the following intrinsic parameters will additionally be available in camera_model_parameters:

principal_point - u and v coordinate of the principal point, following the NVIDIA default convention for FTheta camera models in which the pixel indices represent the center of the pixel (not the top-left corners). NOTE: principal point coordinates will be adapted internally in camera model APIs to reflect the image coordinate conventions (float32, [2,])
reference_poly - indicating which of the two polynomials is the reference polynomial - the other polynomial is only an approximation of the inverse of the reference polynomial (str, one of [PIXELDIST_TO_ANGLE, ANGLE_TO_PIXELDIST])
pixeldist_to_angle_poly - coefficients of the backward distortion polynomial (conditionally approximate, depending on reference_poly), mapping pixel-distances to angles [rad] (float32, [6,])
angle_to_pixeldist_poly - coefficients of the forward distortion polynomial (conditionally approximate, depending on reference_poly), mapping angles [rad] to pixel-distances (float32, [6,])
max_angle - maximal extrinsic ray angle [rad] with the principal direction (float32)
linear_cde - Coefficients of the constrained linear term \(\begin{bmatrix} c & d \\ e & 1 \end{bmatrix}\) transforming between sensor coordinates (in mm) to image coordinates (in px) (float32, [3,])

Mathematical Model#

The FTheta model projects 3D camera rays to image coordinates through the following steps:

Ray to Angle: Convert camera ray direction [x, y, z] to angle θ:

\[\theta = \arctan2(\sqrt{x^2 + y^2}, z)\]
Angle to Pixel Distance: Apply the forward polynomial f(θ):

\[\delta = f(\theta) = c_0 + c_1\theta + c_2\theta^2 + c_3\theta^3 + c_4\theta^4 + c_5\theta^5\]
Linear Transformation: Apply constrained linear transformation A and add principal point:

\[\begin{split}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} c & d \\ e & 1 \end{bmatrix} \frac{\delta}{\sqrt{x^2 + y^2}} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} u_0 \\ v_0 \end{bmatrix}\end{split}\]

The inverse operation uses the backward polynomial (either reference or approximate inverse).

Example: For a typical FTheta model, both forward and backward polynomials contain 6 coefficients each (float32, [6,]).

Ideal Pinhole Camera Model#

If camera_model_type = 'ideal-pinhole' the following intrinsic parameters will additionally be available in camera_model_parameters:

principal_point - u and v coordinate of the principal point, following the image coordinate conventions (float32, [2,])
focal_length - focal lengths in u and v direction, resp., mapping normalized camera coordinates to image coordinates relative to the principal point (float32, [2,])

Mathematical Model#

The ideal pinhole model is a distortion-free perspective projection. It is the most efficient camera model (no distortion polynomial evaluation, no iterative undistortion) and the natural target for rectification.

Normalize: Convert camera ray [x, y, z] to normalized coordinates:

\[x' = \frac{x}{z}, \quad y' = \frac{y}{z}\]
Project to Image: Apply focal length and principal point:

\[\begin{split}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_u x' + u_0 \\ f_v y' + v_0 \end{bmatrix}\end{split}\]

Unprojection is the closed-form inverse \(x' = (u - u_0) / f_u\), \(y' = (v - v_0) / f_v\), followed by normalization of [x', y', 1].

Note

An OpenCV Pinhole whose distortion coefficients are all zero (and which carries no external distortion) is equivalent to an ideal pinhole and is evaluated through the same closed-form path internally. The ideal-pinhole and opencv-pinhole model types are siblings: an OpenCV pinhole is not an instance of an ideal pinhole.

OpenCV Pinhole Camera Model#

If camera_model_type = 'opencv-pinhole' the following intrinsic parameters will additionally be available in camera_model_parameters:

principal_point - u and v coordinate of the principal point, following the image coordinate conventions (float32, [2,])
focal_length - focal lengths in u and v direction, resp., mapping (distorted) normalized camera coordinates to image coordinates relative to the principal point (float32, [2,])
radial_coeffs - radial distortion coefficients [k1,k2,k3,k4,k5,k6] parameterizing the rational radial distortion factor \(\frac{1 + k_1r^2 + k_2r^4 + k_3r^6}{1 + k_4r^2 + k_5r^4 + k_6r^6}\) for squared norms \(r^2\) of normalized camera coordinates (float32, [6,])
tangential_coeffs - tangential distortion coefficients [p1,p2] parameterizing the tangential distortion components \(\begin{bmatrix} 2p_1x'y' + p_2 \left(r^2 + 2{x'}^2 \right) \\ p_1 \left(r^2 + 2{y'}^2 \right) + 2p_2x'y' \end{bmatrix}\) for normalized camera coordinates \(\begin{bmatrix} x' \\ y' \end{bmatrix}\) (float32, [2,])
thin_prism_coeffs - thins prism distortion coefficients [s1,s2,s3,s4] parameterizing the thin prism distortion components \(\begin{bmatrix} s_1r^2 + s_2r^4 \\ s_3r^2 + s_4r^4 \end{bmatrix}\) for squared norms \(r^2\) of normalized camera coordinates (float32, [4,])

Mathematical Model#

The OpenCV Pinhole model applies radial, tangential, and thin prism distortions:

Normalize: Convert camera ray [x, y, z] to normalized coordinates:

\[x' = \frac{x}{z}, \quad y' = \frac{y}{z}, \quad r^2 = {x'}^2 + {y'}^2\]
Radial Distortion: Apply rational radial distortion factor:

\[d_r = \frac{1 + k_1r^2 + k_2r^4 + k_3r^6}{1 + k_4r^2 + k_5r^4 + k_6r^6}\]
Tangential Distortion:

\[\begin{split}\begin{bmatrix} \Delta x_t \\ \Delta y_t \end{bmatrix} = \begin{bmatrix} 2p_1x'y' + p_2(r^2 + 2{x'}^2) \\ p_1(r^2 + 2{y'}^2) + 2p_2x'y' \end{bmatrix}\end{split}\]
Thin Prism Distortion:

\[\begin{split}\begin{bmatrix} \Delta x_p \\ \Delta y_p \end{bmatrix} = \begin{bmatrix} s_1r^2 + s_2r^4 \\ s_3r^2 + s_4r^4 \end{bmatrix}\end{split}\]
Project to Image: Apply focal length and principal point:

\[\begin{split}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_u(x'd_r + \Delta x_t + \Delta x_p) + u_0 \\ f_v(y'd_r + \Delta y_t + \Delta y_p) + v_0 \end{bmatrix}\end{split}\]

Example: Typical parameter sizes: radial_coeffs [6,], tangential_coeffs [2,], thin_prism_coeffs [4,], all float32.

OpenCV Fisheye Camera Model#

If camera_model_type = 'opencv-fisheye' the following intrinsic parameters will additionally be available in camera_model_parameters:

principal_point - u and v coordinate of the principal point, following the image coordinate conventions (float32, [2,])
focal_length - focal lengths in u and v direction, resp., mapping (distorted) normalized camera coordinates to image coordinates relative to the principal point (float32, [2,])
radial_coeffs - radial distortion coefficients representing OpenCV-style [k1,k2,k3,k4] parameters of the fisheye distortion polynomial \(\theta(1 + k_1\theta^2 + k_2\theta^4 + k_3\theta^6 + k_4\theta^8)\) for extrinsic camera ray angles \(\theta\) with the principal direction (float32, [4,])
max_angle - maximal extrinsic ray angle [rad] with the principal direction (float32)

Mathematical Model#

The OpenCV Fisheye model uses a fisheye distortion polynomial:

Ray to Angle: Convert camera ray direction [x, y, z] to angle θ and normalized direction:

\[\theta = \arctan2(\sqrt{x^2 + y^2}, z)\]

\[x_n = \frac{x}{\sqrt{x^2 + y^2}}, \quad y_n = \frac{y}{\sqrt{x^2 + y^2}}\]
Fisheye Distortion Polynomial: Apply distortion to angle:

\[\delta(\theta) = \theta(1 + k_1\theta^2 + k_2\theta^4 + k_3\theta^6 + k_4\theta^8)\]
Project to Image: Scale by focal length and add principal point:

\[\begin{split}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_u \delta(\theta) x_n + u_0 \\ f_v \delta(\theta) y_n + v_0 \end{bmatrix}\end{split}\]

The inverse operation uses Newton-Raphson iteration to invert the ninth-degree polynomial. Example: The radial_coeffs parameter contains [k₁, k₂, k₃, k₄] (float32, [4,]).

External Distortion Models#

Camera models can optionally include external distortion sources that affect rays before they reach the camera lens (e.g., windshields). If present, the camera_model_parameters dictionary will contain:

external_distortion_parameters - parameters for the external distortion model (optional, may be None)

When external_distortion_parameters is present, it will contain:

external_distortion_type - type of external distortion (str)

Bivariate Windshield Model#

If external_distortion_type = 'bivariate-windshield', this models optical distortion caused by a vehicle’s windshield. This model is only applicable when the camera’s entire field of view projects through the windshield.

Mathematical Model#

The distortion operates on spherical angle representations of sensor rays with direction [x, y, z]:

Convert to Spherical Angles:

\[\phi = \arcsin\left(\frac{x}{\sqrt{x^2 + y^2 + z^2}}\right) \quad \text{(horizontal angle)}\]

\[\theta = \arcsin\left(\frac{y}{\sqrt{x^2 + y^2 + z^2}}\right) \quad \text{(vertical angle)}\]
Apply Bivariate Polynomial Distortion:

Distortion is applied via separate polynomials for horizontal and vertical components:

\[\phi' = P_h(\phi, \theta)\]

\[\theta' = P_v(\phi, \theta)\]

Where each polynomial has the bivariate form (example for order N=2):

\[P(\phi, \theta) = c_0 + c_1\phi + c_2\phi^2 + (c_3 + c_4\phi)\theta + c_5\theta^2\]

General form for order N:

\[P(\phi, \theta) = \sum_{j=0}^{N} \left(\sum_{i=0}^{N-j} c_{i,j}\phi^i\right)\theta^j\]
Reconstruct Distorted Ray:

\[[x', y', z'] = \left[\sin(\phi'), \sin(\theta'), \sqrt{1 - \sin^2(\phi') - \sin^2(\theta')}\right]\]

Parameters#

The following parameters will be available in external_distortion_parameters:

reference_poly - indicates which polynomial is the reference (str, one of [FORWARD, BACKWARD]). The reference polynomial is exact; the other is an approximate inverse computed during calibration.
horizontal_poly - coefficients for forward horizontal distortion polynomial P_h (float32, [(N+1)·(N+2)/2,])
vertical_poly - coefficients for forward vertical distortion polynomial P_v (float32, [(M+1)·(M+2)/2,])
horizontal_poly_inverse - coefficients for backward horizontal distortion (float32, [(N+1)·(N+2)/2,])
vertical_poly_inverse - coefficients for backward vertical distortion (float32, [(M+1)·(M+2)/2,])

Example: For a 2nd-order bivariate polynomial (N=2), the coefficient arrays contain 6 elements: [c₀, c₁, c₂, c₃, c₄, c₅]. For a 3rd-order polynomial (N=3), they contain 10 elements. The number of coefficients for order N is given by: (N+1)·(N+2)/2

Direction#

Forward distortion (distort_camera_rays): transforms rays from external (world) to internal (camera sensor)
Backward distortion (undistort_camera_rays): transforms rays from internal (camera sensor) to external (world)

Rectification#

Rectification maps imagery from a source camera into a target camera’s image domain. It is a generic “from-camera -> to-camera” intrinsic remap within a shared camera frame; the most common use is mapping a distorted source camera to a distortion-free ideal pinhole target, but the target may itself be distorted.

Deriving an ideal target#

The IdealPinholeCameraModelParameters.from_source(source, target_fov=None) factory builds an ideal pinhole approximating any supported source camera. It extracts the source’s paraxial pinhole geometry (focal length, principal point, resolution), preserving the source focal aspect ratio:

Ideal / OpenCV pinhole / OpenCV fisheye: the model’s own focal_length (anisotropic fx != fy preserved; OpenCV pinhole distortion is dropped).
FTheta: the first-order coefficient of the angles-to-pixeldistances (forward) polynomial scaled by the linear term, i.e. [c1 * c, c1] with \(c_1 = \mathrm{angle\_to\_pixeldist\_poly}[1]\) and \(c = \mathrm{linear\_cde}[0]\), since \(\mathrm{d}\,\delta/\mathrm{d}\theta|_{\theta=0} = c_1\) and \(r = f\tan\theta \approx f\theta\) near \(\theta = 0\) (the linear term’s shear components are not representable by a pinhole and are dropped).

The target_fov argument selects the angular extent of the rectified pinhole as a full field-of-view angle [rad]:

None: use the source’s natural field of view (the paraxial focal). A pinhole focal always yields a field of view strictly below 180 degrees, so for wide fisheye / omnidirectional cameras this is a narrow rectilinear central window of the captured scene.
float: an isotropic target full field of view, preserving the source focal aspect ratio (the most binding image axis lands exactly at this value).
np.ndarray of shape [2,]: per-axis target full field of view [fov_x, fov_y].

target_fov selects which rays the pinhole covers, not a magnification: because a pinhole maps angle to pixel distance as \(r = f\tan\theta\), different values yield genuinely different views (the periphery stretches increasingly for wider fields of view; the optical axis stays fixed). Widening past the source’s captured field of view is allowed and yields invalid (out-of-source) regions, which the Rectificator zeroes. from_source() raises ValueError if a requested field of view cannot be represented by a pinhole (any axis at or beyond 180 degrees), and TypeError for an unsupported source model.

Use IdealPinholeCameraModelParameters.natural_fov(source) (per-axis full FOV np.ndarray) to discover a sensible value to scale, e.g.:

import numpy as np
from ncore.data import IdealPinholeCameraModelParameters
from ncore.sensors import CameraModel, Rectificator

source_params = source_model.get_parameters()

# Default rectification (source's natural field of view)
target_params = IdealPinholeCameraModelParameters.from_source(source_params)

# Widen by 20% (e.g. to recover more of a wide lens) or set an explicit FOV
target_params = IdealPinholeCameraModelParameters.from_source(
    source_params, target_fov=1.2 * IdealPinholeCameraModelParameters.natural_fov(source_params)
)
target_params = IdealPinholeCameraModelParameters.from_source(source_params, target_fov=np.radians(90.0))

target = CameraModel.from_parameters(target_params)
rect = Rectificator(source_model, target)

Rectificator#

The Rectificator is constructed from a source/target CameraModel pair and builds a backward sample map (target -> source) once:

each target pixel is unprojected to a ray with the target model,
the ray is re-projected to a source image coordinate with the source model.

It exposes:

sample_map - [H, W, 2] source image coordinates to sample for each target pixel (usable directly with custom / GPU / OpenCV resampling).
valid_mask - [H, W] boolean mask of target pixels whose ray projects inside the source image.
apply(image, mode="bilinear", padding_mode="zeros") - resamples a source image ([H, W], [H, W, C] or [N, H, W, C], channels-last) into the target domain, zeroing invalid pixels. mode selects the interpolation ("bilinear", "nearest" or "bicubic") and padding_mode the out-of-bounds sampling behaviour ("zeros", "border" or "reflection"); both are forwarded to torch.nn.functional.grid_sample.
source_points_to_target(points) / target_points_to_source(points) - map sparse continuous image points between the two cameras.

Lidar Models#

NCore supports the following lidar models:

Row-Offset Structured Spinning Lidar - Structured spinning lidar model with per-row azimuth offsets (compatible with sensors like Hesai P128)

Lidar Model Parameterizations#

The lidar_model_parameters dictionary contains model-specific parameters for lidar sensors. Currently, the supported model type is row-offset-spinning.

Row-Offset Structured Spinning Lidar Model#

If lidar_model_type = 'row-offset-spinning', this models structured spinning lidar sensors with per-row azimuth offsets (compatible with sensors like Hesai P128).

Mathematical Model#

The Row-Offset Structured Spinning Lidar model parameterizes lidar returns using per-row elevation angles and per-column azimuth angles with row-specific offsets.

Notation#

α (alpha): elevation angle in radians
β (beta): azimuth angle in radians
E: row_elevations_rad array - elevation angles per row
B: column_azimuths_rad array - base azimuth angles per column
Δ (Delta): row_azimuth_offsets_rad array - azimuth offsets per row (optional, can be zero if no row offsets)
i: row index
j: column index

Element to Angles: For element (i, j):

\[\alpha = E[i]\]

\[\beta = B[j] + \Delta[i]\]
Angles to Sensor Ray: Convert spherical angles to 3D ray direction:

\[x = \cos(\beta) \cdot \cos(\alpha)\]

\[y = \sin(\beta) \cdot \cos(\alpha)\]

\[z = \sin(\alpha)\]
Sensor Ray to Angles: Convert 3D ray [x, y, z] to angles:

\[\alpha = \arcsin(z)\]

\[\beta = \arctan2(y, x)\]

Parameters#

The following parameters will be available in lidar_model_parameters:

row_elevations_rad - elevation angle for each row (float32, [n_rows,])
column_azimuths_rad - base azimuth angle for each column (float32, [n_columns,])
row_azimuth_offsets_rad - azimuth offset for each row (float32, [n_rows,])
spinning_frequency_hz - rotation frequency in Hz (float32)
spinning_direction - rotation direction (str, one of [‘cw’, ‘ccw’])
n_rows - number of vertical laser channels (int)
n_columns - number of azimuth divisions per revolution (int)

Example: A Hesai P128 lidar would have n_rows=128 (for 128 laser channels), with row_elevations_rad containing 128 elevation angles and row_azimuth_offsets_rad containing 128 azimuth offsets.