sar_bp#

Synthetic aperture radar backprojection. The sar_bp operator is currently in the experimental namespace as its API is subject to change.

Added in version head.

template<typename ImageType, typename RangeProfilesType, typename PlatPosType, typename VoxLocType, typename RangeToMcpType>
inline auto matx::experimental::sar_bp(const ImageType &initial_image, const RangeProfilesType &range_profiles, const PlatPosType &platform_positions, const VoxLocType &voxel_locations, const RangeToMcpType &range_to_mcp, const SarBpParams &params)#

sar_bp implements standard or direct SAR backprojection. Backprojection is an image formation algorithm that reconstructs a complex-valued image from a set of range-compressed complex samples. The runtime complexity of the sar_bp transform is O(MNP) where there are MxN pixels and P pulses. In the case that M=N=P, sar_bp is O(N^3).

Note

The number of range bins is constrained as follows:

  • For the Float, Mixed, FloatFloat, and TaylorFast compute types, num_range_bins is capped at 2^24 (16,777,216). Those paths compute the per-pulse bin offset and (for Float / FloatFloat / TaylorFast) the bin floor in fp32, and fp32 can exactly represent all integers only in [-2^24, 2^24]. Exceeding 2^24 would result in loss of precision in the bin index used for range profile sampling.

  • The Double compute type uses fp64 throughout for the bin computation and is only constrained by the int32_t tensor-indexing limit (num_range_bins <= cuda::std::numeric_limits<int32_t>::max()). The transform throws matxInvalidParameter at launch if these limits are exceeded. Typical raw num_range_bins is on the order of 10^4-10^5 and well below the 2^24 bound. The fp32 limit can be reached, however, when heavy range oversampling is applied. An upsample factor that pushes the FFT length above ~2^24 / num_samples_raw (e.g., upsampling 32k raw samples by 512x or more) for non-Double compute types will trigger the runtime check.

Template Parameters:
  • ImageType – Type of initial_image and output image. ImageType must represent a 2D operator of size image_height x image_width for an image of the corresponding dimensions. ImageType must be a complex type. Typical data types are cuda::std::complex<float> or cuda::std::complex<double>.

  • RangeProfilesType – Type of range_profiles. RangeProfilesType must represent a 2D operator of size num_pulses x num_range_bins containing the range-compressed complex samples. RangeProfilesType must be a complex type. Typical data types are cuda::std::complex<float> or cuda::std::complex<double>.

  • PlatPosType – Type of platform positions. PlatPosType must represent a 1D operator of size num_pulses containing the platform positions. Currently, the only supported data types for PlatPosType are double3, double4, float3, and float4. For the float4 and double4 data types, the w coordinate is ignored. If the user has three separate operators for the x, y, and z coordinates, they can be combined using the zipvec operator.

  • VoxLocType – Type of voxel locations. VoxLocType must represent a 2D operator of size image_height x image_width containing the voxel locations. Currently, the only supported data types for VoxLocType are double3, double4, float3, and float4. For the float4 and double4 data types, the w coordinate is ignored. If the user has three separate operators for the x, y, and z coordinates, they can be combined using the zipvec operator.

  • RangeToMcpType – Type of range to motion compensation point. RangeToMcpType must represent a 0D or 1D real-valued operator of size 1 or num_pulses, respectively.

Parameters:
  • initial_image – Initial image. Initial image must represent a 2D operator of size image_height x image_width for an image of the corresponding dimensions. Contributions computed during backprojection will be added to the initial image. The user can use the zeros generator (i.e., matx::zeros) if no initial image is needed. See ImageType documentation for details on supported rank and data types.

  • range_profiles – Range profiles. Range profiles must represent a 2D operator of size num_pulses x num_range_bins containing the range-compressed complex samples. See RangeProfilesType documentation for details on supported rank and data types.

  • platform_positions – Platform positions represent the x, y, and z coordinates of the aperture phase center for each pulse. The coordinates should be in the same coordinate system and units as the voxel locations. See PlatPosType documentation for details on supported rank and data types.

  • voxel_locations – Voxel locations represent the x, y, and z coordinates of the voxels in the image. The coordinates should be in the same coordinate system and units as the platform positions. See VoxLocType documentation for details on supported rank and data types.

  • range_to_mcp – Range to motion compensation point is the distance (range) from each platform position to the motion compensation point. See RangeToMcpType documentation for details on supported rank and data types.

  • params – SAR backprojection parameters. See SarBpParams documentation for details on supported parameters. Note that the TaylorFast compute type does not support geometries where the antenna phase center is located at any of the image pixels (i.e., at range 0 from some pixel). In such cases, the user should choose another compute type to avoid potential divide-by-zero errors at run-time.

enum class matx::SarBpComputeType#

Floating point compute type for the SAR BP kernel.

The compute type controls the floating point precision of intermediate calculations in the SAR BP kernel. While the inputs (range profiles, antenna positions, etc.) and output (image) have their own data types, we may wish to use a different precision for the internal calculations. For example, the output may be cuda::std::complex<float> while the intermediate calculations are done in double. For some compute types, additional approximations will be applied to further improve run-time performance.

Values:

enumerator Double#

Uses double precision for all intermediate calculations. The backprojection kernel works with user-provided input for the range profiles, platform positions, ranges to mocomp point, and voxel locations. Thus, the user should provide values in double precision if needed. Typically, the platform positions and ranges to mocomp point will be double precision and range profiles and voxel locations will be single precision, but users can provide double precision range profiles and voxel locations if needed.

enumerator Mixed#

Uses mixed precision for intermediate calculations. This compute type offers a trade-off between performance and precision. With Mixed precision, the range calculated per pixel-pulse pair will still typically be done in double-precision, but interpolation and accumulation will be single-precision. When combined with PhaseLUTOptimization, the sine/cosine calculations will no longer be double-precision either.

enumerator FloatFloat#

The FloatFloat compute type combines mixed precision (i.e., fp32 when possible) with a float-float handling of the values for which fp64 would otherwise be needed. The float-float representation offers increased precision relative to fp32, but not full fp64 precision, through the use of increased fp32 computation and representing each value as an unevaluated sum of two fp32 components. For optimal performance, users can provide any inputs that require additional precision (e.g., platform positions and range to mocomp point) directly in fltflt format. Otherwise, the kernel will perform double-to-fltflt conversions internally. FloatFloat requires PhaseLUTOptimization.

enumerator TaylorFast#

Uses a local Taylor approximation of the pulse-to-pixel range about a centered per-thread-block reference point. This mode prioritizes throughput and requires PhaseLUTOptimization. The PropSarBpTaylorFastAddThirdOrder property can be used to add a third-order term to the Taylor approximation. On systems with reduced double-precision throughput, TaylorFast is typically the fastest compute type using either second order or third order Taylor approximation. TaylorFast will divide by reference range values to certain image pixels and thus does not support geometries where the antenna phase center is located at any of the image pixels (i.e., at range 0 from some pixel). In such cases, the user should choose a different compute type.

enumerator Float#

Uses single precision for all intermediate calculations. This mode is fast, but typically does not provide sufficient precision for cases where the ranges exceed several kilometers. It is not suited for spaceborne SAR geometries.

enum class matx::SarBpFeature : uint32_t#

Features that can be enabled or disabled for the SAR BP kernel.

Values:

enumerator None#

No features enabled.

enumerator PhaseLUTOptimization#

Enable the phase LUT optimization. This feature uses a precomputed lookup table to store partial values for the reference phases used during backprojection. The value from the lookup table will be combined with an incremental phase calculation within a single range bin that is computed using the lower-precision intrinsic sine/cosine functions. This optimization will utilize a small amount of device memory as a workspace buffer. This optimization is typically useful for the Mixed, FloatFloat, and TaylorFast compute types. It is required for FloatFloat and TaylorFast.

enum class matx::SarBpPixelZMode#

Compile-time assumption about the pixel z (height) coordinates.

Many SAR imaging geometries place every output pixel on a single horizontal focal plane, so the pixel z coordinate is either identically 0 or some other value that is the same for all pixels. When the caller asserts one of these cases (via PropSarBpPixelZIsZero / PropSarBpPixelZIsFixed), the kernel can specialize the per-pulse range computation. This is a compile-time assumption: the kernel does NOT validate it at run time, so the caller is responsible for only setting it when it actually holds.

Values:

enumerator Variable#

No assumption: each pixel may have a distinct z coordinate (default).

enumerator Zero#

Every pixel z coordinate is 0. The z difference term collapses to the antenna z, dropping the pixel-z load/subtraction.

enumerator Fixed#

Every pixel z coordinate is identical (to an arbitrary value). The per-pulse z contribution is uniform across pixels and can be hoisted out of the per-pixel inner loop on the shared-cache compute paths.

struct PropSarBpTaylorFastAddThirdOrder#

Adds the third-order local range term to SarBpComputeType::TaylorFast.

TaylorFast uses a second-order local Taylor range approximation by default. This property instantiates a TaylorFast kernel variant that also evaluates the third-order term. It has no effect unless SarBpParams::compute_type is SarBpComputeType::TaylorFast. This is primarily useful for short stand-off ranges where the second-order approximation may not be accurate enough.

struct PropSarBpPixelZIsZero#

Asserts that every output pixel has a z (height) coordinate of 0.

Selects SarBpPixelZMode::Zero, which lets the kernel drop the pixel-z term from the per-pulse range computation. This is a compile-time assumption that is NOT validated at run time; it must only be set when it actually holds for the supplied voxel locations. If PropSarBpPixelZIsFixed is also set, this property (Zero, the stronger assumption) takes precedence and the Fixed property has no effect.

struct PropSarBpPixelZIsFixed#

Asserts that every output pixel shares one (arbitrary) z coordinate.

Selects SarBpPixelZMode::Fixed. On the shared-cache compute paths the per-pulse z contribution is uniform across pixels and can be hoisted out of the per-pixel inner loop. This is a compile-time assumption that is NOT validated at run time. PropSarBpPixelZIsZero (z == 0) is a stronger special case and takes precedence if both are set.

struct SarBpParams#

Parameters used for SAR backprojection.

Public Members

SarBpComputeType compute_type = {SarBpComputeType::Double}#

The floating point compute type (precision) of the kernel.

SarBpFeature features = {SarBpFeature::None}#

The features to enable or disable in the kernel.

double center_frequency = {0.0}#

The center frequency of the radar in Hz.

double del_r = {0.0}#

The range per compressed range bin. The units should match those of the other locations and distances provided to the backprojector.

TaylorFast Range Approximation#

SarBpComputeType::TaylorFast approximates the range from each pulse to each pixel by expanding the range about a reference pixel for the current thread block. The goal is to replace most per-pixel range work with arithmetic in a local coordinate system, while computing the expensive reference quantities once per pulse and per thread block. By default, TaylorFast keeps terms through second order in the local pixel offset. Users can add the third-order term with the PropSarBpTaylorFastAddThirdOrder property. The third-order term is most useful for short stand-off ranges where the second-order approximation may not provide sufficient accuracy.

The derivation of the Taylor approximation follows. For pulse \(p\), let \(a_p\) be the platform position and let \(m_p\) be the range to the motion compensation point. For image pixel position \(x\), the range quantity used by backprojection is

\[\Delta R_p(x) = \|x - a_p\| - m_p.\]

For a thread block, choose a reference pixel \(x_0\) near the center of the block. Define

\[r_0 = x_0 - a_p, \qquad R_0 = \|r_0\|, \qquad u = \frac{r_0}{R_0},\]

where \(u\) is the unit vector pointing from the platform position to the reference pixel. Hereafter, we generally drop the \(p\) subscript for brevity. For any pixel in the same thread block, write its local offset as

\[d = x - x_0.\]

Then the exact range to the pixel is

\[R(d) = \|r_0 + d\| = \|(x_0 - a_p) + (x - x_0)\| = \|x - a_p\|.\]

Decompose \(d\) into the component along \(u\) and the component perpendicular to \(u\):

\[s = u \cdot d.\]

Here, \(s\) is the signed scalar projection of \(d\) along \(u\) and \(u\) is the along-range direction. The perpendicular, or cross-range, vector is then \(d - s u\). The squared norm of this perpendicular component is:

\[\begin{split}\begin{aligned} \|d - s u\|^2 &= d \cdot d - 2s(u \cdot d) + s^2(u \cdot u) \\ &= \|d\|^2 - 2s(u \cdot d) + s^2 \\ &= \|d\|^2 - 2s^2 + s^2 \\ &= \|d\|^2 - s^2 \end{aligned}\end{split}\]

We denote this squared norm term as \(q\):

\[q = \|d\|^2 - s^2.\]

Mathematically, \(q\) is the squared perpendicular distance from the local pixel to the pulse-to-reference-pixel line. With these definitions, recall that \(R(d) = \|r_0 + d\|\). Thus,

\[\begin{split}\begin{aligned} R(d)^2 &= \|r_0 + d\|^2 \\ &= \|R_0 u + d\|^2 = (R_0 u + d) \cdot (R_0 u + d) \\ &= R_0^2 (u \cdot u) + 2 R_0 (u \cdot d) + d \cdot d \\ &= R_0^2 + 2 R_0 s + \|d\|^2 \\ &= R_0^2 + 2 R_0 s + s^2 + q \\ &= (R_0 + s)^2 + q \end{aligned}\end{split}\]

where we used \(q = \|d\|^2 - s^2\) and thus \(\|d\|^2 = s^2 + q\).

Finally,

\[R(d) = \sqrt{(R_0 + s)^2 + q}.\]

\(R(d)\) is the exact range to the pixel \(x\) and the range difference relative to the block reference is

\[\Delta R(d) = R(d) - R_0.\]

We started this section with the differential range from pixel \(x\) to the motion compensation range \(m_p\). We can now write this as:

\[\begin{split}\begin{aligned} \Delta R_p(x) &= \|x - a_p\| - m_p \\ &= R_p(x) - m_p \\ &= (R_0 + \Delta R(d)) - m_p \\ &= \Delta R(d) + (R_0 - m_p) \end{aligned}\end{split}\]

The bin coordinate for pixel \(x\) is typically:

\[b(x) = \frac{\Delta R_p(x)}{\Delta r} + b_{\mathrm{offset}}\]

Here \(\Delta r\) is the range-bin spacing and \(b_{\mathrm{offset}}\) is the centered range-bin offset used by the SAR backprojection operator. We can now reformulate this as:

\[\begin{split}\begin{aligned} b(x) &= \frac{\Delta R_p(x)}{\Delta r} + b_{\mathrm{offset}} \\ &= \frac{\Delta R(d) + (R_0 - m_p)}{\Delta r} + b_{\mathrm{offset}} \\ &= \frac{\Delta R(d)}{\Delta r} + \frac{R_0 - m_p}{\Delta r} + b_{\mathrm{offset}} \end{aligned}\end{split}\]

We can precompute the right-hand terms as:

\[b_0 = \frac{R_0 - m_p}{\Delta r} + b_{\mathrm{offset}}\]

and thus:

\[b(x) = b_0 + \frac{\Delta R(d)}{\Delta r}\]

The following sections derive an approximation of \(\Delta R(d)\) using a Taylor series expansion.

Derivation of the Taylor Expansion#

Starting from

\[R(d) = \sqrt{R_0^2 + 2 R_0 s + \|d\|^2},\]

factor out \(R_0\):

\[R(d) = R_0 \sqrt{1 + \frac{2s}{R_0} + \frac{\|d\|^2}{R_0^2}}.\]

Let

\[y = \frac{2s}{R_0} + \frac{\|d\|^2}{R_0^2}.\]

The scalar Taylor series is applied to the one-variable function \(f(y) = \sqrt{1 + y}\) about \(y = 0\):

\[f(y) = \sqrt{1 + y} = 1 + \frac{y}{2} - \frac{y^2}{8} + \frac{y^3}{16} + O(y^4),\]

so

\[R(d) = R_0 \left( 1 + \frac{y}{2} - \frac{y^2}{8} + \frac{y^3}{16} \right) + O(R_0 y^4).\]

The order used by TaylorFast is not the number of scalar Taylor-series terms retained in \(f(y)\). It is the order in the local pixel offset \(d\). Recall that \(s = u \cdot d\), so \(s\) is linear in \(d\).

The following table shows how the scalar Taylor-series terms contribute to the local-offset orders used by the TaylorFast approximation.

Scalar term

Contribution

Local-offset order

\(R_0\)

\(R_0\)

Zeroth order. This is the reference range and cancels in \(\Delta R(d) = R(d) - R_0\).

\(R_0 y / 2\)

\(s + \dfrac{s^2 + q}{2R_0}\)

First-order term \(s\); second-order term \((s^2 + q)/(2R_0)\).

\(-R_0 y^2 / 8\)

\(-\dfrac{s^2}{2R_0} - \dfrac{s(s^2 + q)}{2R_0^2} + O(d^4)\)

Second-order term \(-s^2/(2R_0)\); third-order term \(-s(s^2 + q)/(2R_0^2)\).

\(R_0 y^3 / 16\)

\(\dfrac{s^3}{2R_0^2} + O(d^4)\)

Third-order term \(s^3/(2R_0^2)\).

The first-order local-offset term is therefore

\[s.\]

The second-order local-offset terms are

\[R_0 \left( \frac{s^2 + q}{2 R_0^2} - \frac{4s^2}{8 R_0^2} \right) = \frac{q}{2 R_0}.\]

The third-order local-offset terms are

\[R_0 \left( -\frac{4s(s^2 + q)}{8 R_0^3} + \frac{8s^3}{16 R_0^3} \right) = -\frac{s q}{2 R_0^2}.\]

Combining terms gives

\[R(d) = R_0 + s + \frac{q}{2 R_0} - \frac{s q}{2 R_0^2} + O\left(\frac{\|d\|^4}{R_0^3}\right).\]

Equivalently, the local range delta is

\[\Delta R(d) = s + \frac{q}{2 R_0} - \frac{s q}{2 R_0^2} + O\left(\frac{\|d\|^4}{R_0^3}\right).\]

Second-Order Approximation#

The second-order approximation keeps the linear along-range term and the quadratic cross-range correction:

\[\Delta R(d) = s + \frac{q}{2 R_0}.\]

We then compute the bin coordinate as

\[b(d) \approx b_0 + \frac{1}{\Delta r} \left(s + \frac{q}{2R_0}\right).\]

The leading omitted term is

\[\epsilon(d) \approx -\frac{s q}{2 R_0^2}.\]

This is the default TaylorFast implementation. It is most accurate when the pixel block is small relative to the platform stand-off range and when the along-range offset \(s\) remains small.

Third-Order Approximation#

The PropSarBpTaylorFastAddThirdOrder property instantiates a TaylorFast kernel variant that also keeps the cubic term:

\[\Delta R(d) = s + \frac{q}{2 R_0} - \frac{s q}{2 R_0^2}.\]

The corresponding bin coordinate is

\[b(d) \approx b_0 + \frac{1}{\Delta r} \left( s + \frac{q}{2R_0} - \frac{s q}{2R_0^2} \right).\]

With the third-order term included, the leading omitted fourth-order terms are

\[\epsilon(d) \approx \frac{s^2 q}{2 R_0^3} - \frac{q^2}{8 R_0^3}.\]

Compared to the second-order form, this requires additional per-pixel arithmetic and will thus have correspondingly lower throughput than the second-order form. Keeping the third-order term avoids the short-range accuracy loss that occurs when the image block is large relative to \(R_0\) or when the platform is close enough that the \(s q / R_0^2\) term is no longer negligible.

Accuracy Considerations#

The approximation is local to a thread block. Its accuracy depends on the ratio between the maximum local pixel offset and \(R_0\). For satellite-scale data, \(R_0\) is typically very large compared to the dimensions of a CUDA thread block, so the second-order approximation is likely sufficient. It is an assumption that adjacent pixels are spatially near one another and thus a compact pixel tile has a relatively small spatial extent.

For shorter stand-off ranges, larger image tiles, or geometries where the look direction varies rapidly across a thread block, the third-order term becomes more important. The third-order property uses a separate kernel instantiation, which avoids a run-time order dispatch inside the backprojection kernel.

Pixel Height (Z) Assumptions#

SAR images are frequently formed on a flat focal plane, so every output pixel shares the same height (z) coordinate – very often z = 0 (a ground plane at the reference height). When that holds, the per-pulse, per-pixel range computation carries a redundant pixel-z term that can be elided. MatX exposes this as two opt-in, compile-time properties:

  • PropSarBpPixelZIsZero asserts that every pixel has z = 0. The kernel drops the pixel-z term from the per-pulse range computation.

  • PropSarBpPixelZIsFixed asserts that every pixel shares one (arbitrary, possibly non-zero) z value. On the shared-cache compute paths the uniform per-pulse z contribution is hoisted out of the per-pixel inner loop.

Both are compile-time assumptions that are not validated at run time; set one only when it actually holds for the supplied voxel_locations. With neither property set (the default), each pixel may have a distinct z coordinate, which is the correct choice for terrain- or DEM-based focusing. If both are set, PropSarBpPixelZIsZero (the stronger assumption) takes precedence. Callers who set neither are unaffected: the default kernel is unchanged, and only the variants you opt into are additionally instantiated.

Whether z is literally 0/constant depends on the coordinate frame of voxel_locations. A flat focal plane expressed in a scene-local East-North-Up frame (height along z) satisfies these assumptions, whereas a grid expressed in ECEF generally does not, since even a flat plane then has all three coordinates varying per pixel.

Examples#

bp_params.compute_type = matx::SarBpComputeType::Mixed;
(image = matx::experimental::sar_bp(zero_image, range_profiles, platform_positions, voxel_locations, range_to_mcp, bp_params)).run(this->exec);

The PropSarBpTaylorFastAddThirdOrder property is a compile-time MatX operator property. The compute type is selected separately through SarBpParams; the property only adds the third-order term to a TaylorFast launch:

params.compute_type = SarBpComputeType::TaylorFast;
params.features = SarBpFeature::PhaseLUTOptimization;

(image = matx::experimental::sar_bp(zero_image, range_profiles, platform_positions, voxel_locations, range_to_mcp, params)
  .props<PropSarBpTaylorFastAddThirdOrder>()).run(this->exec);

The property only affects SarBpComputeType::TaylorFast. Other compute types continue to use their ordinary kernel instantiations.

When the image is formed on the z = 0 plane, the PropSarBpPixelZIsZero property lets the kernel skip the pixel-z term in the per-pulse range computation. Like PropSarBpTaylorFastAddThirdOrder, it is a compile-time property applied to the operator, independent of the compute type selected through SarBpParams:

params.compute_type = SarBpComputeType::Mixed;
params.features = SarBpFeature::PhaseLUTOptimization;

(image = matx::experimental::sar_bp(zero_image, range_profiles, platform_positions, voxel_locations, range_to_mcp, params)
  .props<PropSarBpPixelZIsZero>()).run(this->exec);

PropSarBpPixelZIsFixed is the analogous property for a constant, non-zero focal-plane height. These assumptions are not validated at run time, so set them only when the supplied voxel z coordinates actually satisfy them.