Executor Compatibility

MatX’s executor design allows expressions to run on different targets while leaving user code largely unchanged. This page summarizes which public operators are expected to work with each executor family.
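
As a rough illustration of how user code can stay the same while the execution target changes, here is a minimal, self-contained C++ sketch. It does not use MatX: SerialExec, TiledExec, for_each_index, and scaled_sum are hypothetical names invented for this example, and the real MatX executors and expression machinery are considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT MatX types.
struct SerialExec {
  // Visit every index in order on the calling thread.
  template <typename F>
  void for_each_index(std::size_t n, F f) const {
    for (std::size_t i = 0; i < n; ++i) f(i);
  }
};

struct TiledExec {
  std::size_t tile;  // number of indices handled per tile

  // Visit the index space tile by tile, loosely mimicking how a
  // different target might partition the same work.
  template <typename F>
  void for_each_index(std::size_t n, F f) const {
    assert(tile > 0);
    for (std::size_t begin = 0; begin < n; begin += tile) {
      const std::size_t end = (n - begin < tile) ? n : begin + tile;
      for (std::size_t i = begin; i < end; ++i) f(i);
    }
  }
};

// The "user code": an element-wise expression that never names a target.
// Only the executor argument decides how the indices are traversed.
template <typename Exec>
std::vector<float> scaled_sum(const std::vector<float> &a,
                              const std::vector<float> &b,
                              const Exec &exec) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  exec.for_each_index(out.size(),
                      [&](std::size_t i) { out[i] = 2.0f * a[i] + b[i]; });
  return out;
}
```

Calling scaled_sum(a, b, SerialExec{}) and scaled_sum(a, b, TiledExec{2}) on the same inputs should produce identical vectors, since only the traversal of the index space changes, not the expression itself.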

Legend:

  • ✅ Fully supported.

  • 🟧 Partially supported, or supported with executor-specific limitations described in the notes.

  • ❌ Not supported.

Results can differ slightly between host executors and CUDA executors because floating-point arithmetic is performed by different libraries and devices.

HostExecutor covers SingleThreadedHostExecutor, SelectThreadsHostExecutor, and AllThreadsHostExecutor. Host executor support for FFT, BLAS, and solver routines depends on the corresponding CPU backend CMake options. Optional backend dependencies, such as CPU BLAS, CPU solver libraries, or MathDx, are described in the notes; the operator is still marked ✅ when it is supported for that executor.

CUDAJITExecutor support means the operator can participate in a fused JIT expression; non-JIT CUDA execution through cudaExecutor remains available for the broader CUDA library paths.
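
Because of the small floating-point differences between host and CUDA results mentioned above, a bit-for-bit comparison of outputs is usually too strict, and a tolerance-based check is the common alternative. The sketch below is plain C++ rather than the MatX API: all_close and its default tolerances are assumptions chosen for illustration, using the widely used criterion |a - b| <= atol + rtol * |b|.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when every pair satisfies |a - b| <= atol + rtol * |b|.
// The default tolerances are illustrative assumptions, not values taken
// from MatX; pick tolerances that suit the data type and algorithm.
inline bool all_close(const std::vector<double> &a,
                      const std::vector<double> &b,
                      double rtol = 1e-5, double atol = 1e-8) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::fabs(a[i] - b[i]) > atol + rtol * std::fabs(b[i])) {
      return false;
    }
  }
  return true;
}
```

In practice, one would copy the result of a host run and a CUDA run into two such buffers and call all_close(host_result, cuda_result), loosening rtol for single-precision or long reduction chains.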

Operator Executor Compatibility Matrix

| Operator | HostExecutor | CUDAExecutor | CUDAJITExecutor | Notes |
| --- | --- | --- | --- | --- |
| abs | ✅ | ✅ | ✅ | Element-wise expression. |
| abs2 | ✅ | ✅ | ✅ | Element-wise expression. |
| acos | ✅ | ✅ | ✅ | Element-wise expression. |
| acosh | ✅ | ✅ | ✅ | Element-wise expression. |
| all | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| allclose | 🟧 | ✅ | ❌ | Reduction transform with tolerance comparison; host execution is available but reductions are not generally parallelized across host threads. |
| alternate | ✅ | ✅ | ✅ | Generator expression. |
| ambgfun | ❌ | ✅ | ❌ | CUDA-only radar transform. |
| angle | ✅ | ✅ | ✅ | Element-wise expression. |
| any | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| apply | ✅ | ✅ | 🟧 | User callable must be valid for the selected executor; JIT expressions require device/JIT-compatible callable code. |
| apply_idx | ✅ | ✅ | 🟧 | User callable receives indices and must be valid for the selected executor; JIT expressions require device/JIT-compatible callable code. |
| argmax | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| argmin | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| argminmax | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| argsort | ✅ | ✅ | ❌ | Sort transform; CUDA path uses device sort support. |
| as_complex_double | ✅ | ✅ | ✅ | Cast expression. |
| as_complex_float | ✅ | ✅ | ✅ | Cast expression. |
| as_double | ✅ | ✅ | ✅ | Cast expression. |
| as_float | ✅ | ✅ | ✅ | Cast expression. |
| as_int16 | ✅ | ✅ | ✅ | Cast expression. |
| as_int32 | ✅ | ✅ | ✅ | Cast expression. |
| as_int64 | ✅ | ✅ | ✅ | Cast expression. |
| as_int8 | ✅ | ✅ | ✅ | Cast expression. |
| as_type | ✅ | ✅ | ✅ | Cast expression. |
| as_uint16 | ✅ | ✅ | ✅ | Cast expression. |
| as_uint32 | ✅ | ✅ | ✅ | Cast expression. |
| as_uint64 | ✅ | ✅ | ✅ | Cast expression. |
| as_uint8 | ✅ | ✅ | ✅ | Cast expression. |
| asin | ✅ | ✅ | ✅ | Element-wise expression. |
| asinh | ✅ | ✅ | ✅ | Element-wise expression. |
| atan | ✅ | ✅ | ✅ | Element-wise expression. |
| atan2 | ✅ | ✅ | ✅ | Element-wise expression. |
| atanh | ✅ | ✅ | ✅ | Element-wise expression. |
| at | ✅ | ✅ | ✅ | Indexing/view expression. |
| bartlett | ✅ | ✅ | ✅ | Window generator expression. |
| blackman | ✅ | ✅ | ✅ | Window generator expression. |
| cart2sph | ✅ | ✅ | ✅ | Element-wise coordinate conversion expression. |
| ceil | ✅ | ✅ | ✅ | Element-wise expression. |
| cgsolve | ❌ | ✅ | ❌ | CUDA iterative solver path. |
| channelize_poly | ✅ | ✅ | ❌ | Polyphase channelizer; host path directly computes the per-branch FIR and DFT stages. |
| chirp | ✅ | ✅ | ✅ | Generator expression. |
| chol | ✅ | ✅ | ✅ | Host support requires the CPU solver backend. CUDAJITExecutor support uses cuSolverDx through MathDx for supported rank 2-4 square float, double, complex-float, and complex-double matrices. |
| clone | ✅ | ✅ | ✅ | View expression. |
| concat | ✅ | ✅ | ✅ | View/expression composition. |
| conj | ✅ | ✅ | ✅ | Element-wise expression. |
| conv1d | ✅ | ✅ | ❌ | Direct convolution supports host and CUDA executors. FFT convolution can use the CPU FFT backend when enabled. |
| conv2d | ✅ | ✅ | ❌ | Direct convolution supports host and CUDA executors. |
| copy | ✅ | ✅ | ❌ | Executor-dispatched assignment/copy transform; CUDAJITExecutor fuses expression evaluation instead of using this transform directly. |
| corr | ✅ | ✅ | ❌ | Direct correlation supports host and CUDA executors. FFT correlation can use the CPU FFT backend when enabled. |
| cos | ✅ | ✅ | ✅ | Element-wise expression. |
| cosh | ✅ | ✅ | ✅ | Element-wise expression. |
| cov | ❌ | ✅ | ❌ | CUDA-only covariance transform. |
| cross | ✅ | ✅ | ✅ | Small vector expression. |
| cumsum | 🟧 | ✅ | ❌ | Scan transform; host execution is available but does not generally use multithreaded host executor paths. |
| dct | ❌ | ✅ | ❌ | CUDA-only DCT transform. |
| det | ✅ | ✅ | ✅ | Built from solver functionality; host support requires the CPU solver backend. JIT support follows cuSolverDx matrix type and shape limits. |
| diag | ✅ | ✅ | ✅ | Generator/view expression. |
| downsample | ✅ | ✅ | ✅ | View/reindex expression. |
| eig | ✅ | ✅ | ✅ | Host support requires the CPU solver backend. CUDAJITExecutor supports the cuSolverDx-backed projection path for supported types and shapes. |
| einsum | ❌ | ✅ | 🟧 | CUDA transform. JIT support is available when the expression lowers to supported fused element-wise or matmul-style work. |
| erf | ✅ | ✅ | ✅ | Element-wise expression. |
| exp | ✅ | ✅ | ✅ | Element-wise expression. |
| expj | ✅ | ✅ | ✅ | Element-wise expression. |
| eye | ✅ | ✅ | ✅ | Generator expression. |
| fft | ✅ | ✅ | ✅ | Host support requires the CPU FFT backend. CUDAJITExecutor support uses cuFFTDx through MathDx for supported runtime shapes, precisions, and layouts. |
| fft2 | ✅ | ✅ | ✅ | Host support requires the CPU FFT backend. CUDAJITExecutor support uses cuFFTDx through MathDx for supported 2D runtime shapes, precisions, and layouts. |
| fftfreq | ✅ | ✅ | ✅ | Generator expression. |
| fftshift1D | ✅ | ✅ | ✅ | View/reindex expression. |
| fftshift2D | ✅ | ✅ | ✅ | View/reindex expression. |
| fill | ✅ | ✅ | ✅ | Generator expression. |
| filter | ❌ | ✅ | ❌ | CUDA-only filter transform. |
| find | ✅ | ✅ | ❌ | Search/compaction transform. |
| find_idx | ✅ | ✅ | ❌ | Search/compaction transform. |
| find_peaks | ✅ | ✅ | ❌ | Search transform. |
| flattop | ✅ | ✅ | ✅ | Window generator expression. |
| flatten | ✅ | ✅ | ✅ | View expression. |
| fliplr | ✅ | ✅ | ✅ | View/reindex expression. |
| flipud | ✅ | ✅ | ✅ | View/reindex expression. |
| floor | ✅ | ✅ | ✅ | Element-wise expression. |
| fmod | ✅ | ✅ | ✅ | Element-wise expression. |
| frexp | ✅ | ✅ | ✅ | Element-wise mantissa/exponent expression. |
| frexpc | ✅ | ✅ | ✅ | Element-wise mantissa/exponent expression for complex values. |
| hamming | ✅ | ✅ | ✅ | Window generator expression. |
| hanning | ✅ | ✅ | ✅ | Window generator expression. |
| hermitianT | ✅ | ✅ | ✅ | View/expression composition. |
| hist | ❌ | ✅ | ❌ | CUDA-only histogram transform. |
| ifft | ✅ | ✅ | ✅ | Host support requires the CPU FFT backend. CUDAJITExecutor support uses cuFFTDx through MathDx for supported runtime shapes, precisions, and layouts. |
| ifft2 | ✅ | ✅ | ✅ | Host support requires the CPU FFT backend. CUDAJITExecutor support uses cuFFTDx through MathDx for supported 2D runtime shapes, precisions, and layouts. |
| ifftshift1D | ✅ | ✅ | ✅ | View/reindex expression. |
| ifftshift2D | ✅ | ✅ | ✅ | View/reindex expression. |
| IF | ✅ | ✅ | ✅ | Conditional expression. |
| IFELSE | ✅ | ✅ | ✅ | Conditional expression. |
| imag | ✅ | ✅ | ✅ | Element-wise expression. |
| index | ✅ | ✅ | ✅ | Index generator expression. |
| interp1 | 🟧 | ✅ | 🟧 | Linear interpolation is expression-friendly. Spline interpolation uses CUDA-only transform support. |
| inv | ❌ | ✅ | ✅ | CUDA solver transform. CUDAJITExecutor support uses cuSolverDx through MathDx for supported rank 2-4 square float, double, complex-float, and complex-double matrices. MAT_INVERSE_ALGO_POSV is available only on the MathDx JIT path and requires Hermitian positive-definite inputs. |
| isclose | ✅ | ✅ | ✅ | Element-wise comparison expression. |
| isinf | ✅ | ✅ | ✅ | Element-wise predicate expression. |
| isnan | ✅ | ✅ | ✅ | Element-wise predicate expression. |
| kron | ✅ | ✅ | ✅ | Kronecker product uses matmul-style support where possible; host requires CPU BLAS support for library-backed paths and JIT follows MathDx BLAS limits. |
| lcollapse | ✅ | ✅ | ✅ | View expression. |
| legendre | ✅ | ✅ | ✅ | Element-wise polynomial expression. |
| linspace | ✅ | ✅ | ✅ | Generator expression. |
| log | ✅ | ✅ | ✅ | Element-wise expression. |
| log10 | ✅ | ✅ | ✅ | Element-wise expression. |
| log2 | ✅ | ✅ | ✅ | Element-wise expression. |
| logspace | ✅ | ✅ | ✅ | Generator expression. |
| lu | ✅ | ✅ | ✅ | Host support requires the CPU solver backend. CUDAJITExecutor supports cuSolverDx-backed lazy projections for supported types and shapes. |
| matmul | ✅ | ✅ | ✅ | Host support requires the CPU BLAS backend and supported floating or complex types. CUDAJITExecutor support uses cuBLASDx through MathDx for supported runtime shapes, precisions, layouts, and block-size intersections. |
| matrix_norm | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| matvec | ✅ | ✅ | ✅ | Host support requires the CPU BLAS backend and supported floating or complex types. CUDAJITExecutor support follows cuBLASDx matmul constraints. |
| max | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. Element-wise maximum through binary operators remains JIT-compatible. |
| mean | 🟧 | ✅ | ❌ | Reduction/statistics transform; host execution is available but reductions are not generally parallelized across host threads. |
| median | ✅ | ✅ | ❌ | Sort/statistics transform. |
| meshgrid | ✅ | ✅ | ✅ | Generator/view expression. |
| min | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. Element-wise minimum through binary operators remains JIT-compatible. |
| mvdr | 🟧 | ✅ | 🟧 | Built from solver, BLAS, and expression components; accelerated paths inherit backend and JIT restrictions from those components. |
| normalize | 🟧 | ✅ | ❌ | Reduction-based expression; host execution is available but reductions are not generally parallelized across host threads. |
| ones | ✅ | ✅ | ✅ | Generator expression. |
| outer | ✅ | ✅ | ✅ | Host support requires the CPU BLAS backend for library-backed paths. CUDAJITExecutor support follows cuBLASDx matmul constraints. |
| overlap | ✅ | ✅ | ✅ | View expression. |
| pad | ✅ | ✅ | ✅ | View/expression composition. |
| percentile | ✅ | ✅ | ❌ | Sort/statistics transform. |
| permute | ✅ | ✅ | ✅ | View/reindex expression. |
| pinv | ✅ | ✅ | ❌ | SVD-backed solver transform; host support requires the CPU solver backend. cuSolverDx JIT SVD projection is not currently enabled. |
| polyval | ✅ | ✅ | ✅ | Element-wise polynomial evaluation expression. |
| pow | ✅ | ✅ | ✅ | Element-wise expression. |
| prod | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| pwelch | ❌ | ✅ | ❌ | CUDA-only spectral estimation transform. |
| qr | ❌ | ✅ | ✅ | CUDA solver transform. CUDAJITExecutor supports cuSolverDx-backed lazy projections for supported types and shapes. |
| qr_econ | ❌ | ✅ | ✅ | CUDA solver transform. CUDAJITExecutor supports cuSolverDx-backed lazy projections for supported types and shapes; the Q projection is limited to non-wide matrices where m >= n. |
| qr_solver | ✅ | ✅ | ✅ | Host support requires the CPU solver backend. CUDAJITExecutor supports cuSolverDx-backed lazy projections for supported types and shapes. |
| r2c | ✅ | ✅ | ✅ | Real-to-complex view/cast expression. |
| random | ✅ | ✅ | ❌ | Random generator state is not JIT-fused. |
| randomi | ✅ | ✅ | ❌ | Random integer generator state is not JIT-fused. |
| range | ✅ | ✅ | ✅ | Generator expression. |
| rcollapse | ✅ | ✅ | ✅ | View expression. |
| real | ✅ | ✅ | ✅ | Element-wise expression. |
| reduce | ❌ | ✅ | ❌ | Generic custom reduction currently uses CUDA reduction support. |
| remap | ✅ | ✅ | ✅ | View/reindex expression. |
| repmat | ✅ | ✅ | ✅ | View/expression composition. |
| resample_poly | ✅ | ✅ | ❌ | Polyphase resampling transform for host and CUDA executors. |
| reshape | ✅ | ✅ | ✅ | View expression. |
| reverse | ✅ | ✅ | ✅ | View/reindex expression. |
| round | ✅ | ✅ | ✅ | Element-wise expression. |
| rsqrt | ✅ | ✅ | ✅ | Element-wise expression. |
| sar_bp | ❌ | ✅ | ❌ | CUDA-only SAR backprojection transform. |
| select | ✅ | ✅ | ✅ | Selection expression. |
| shift | ✅ | ✅ | ✅ | View/reindex expression. |
| sign | ✅ | ✅ | ✅ | Element-wise expression. |
| sin | ✅ | ✅ | ✅ | Element-wise expression. |
| sincos | ✅ | ✅ | ✅ | Element-wise expression. |
| sinh | ✅ | ✅ | ✅ | Element-wise expression. |
| slice | ✅ | ✅ | ✅ | View expression. |
| softmax | ❌ | ✅ | ❌ | CUDA-only reduction-style transform. |
| solve | ✅ | ✅ | ✅ | Solver transform; host support requires the CPU solver backend. JIT support follows the underlying cuSolverDx factorization limits when it lowers to a supported solver projection. |
| sort | 🟧 | ✅ | ❌ | Sort transform; host execution is available but does not generally use multithreaded host executor paths. |
| sph2cart | ✅ | ✅ | ✅ | Element-wise coordinate conversion expression. |
| sqrt | ✅ | ✅ | ✅ | Element-wise expression. |
| stack | ✅ | ✅ | ✅ | View/expression composition. |
| stdd | 🟧 | ✅ | ❌ | Reduction/statistics transform; host execution is available but reductions are not generally parallelized across host threads. |
| sum | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| svd | ✅ | ✅ | ❌ | Host support requires the CPU solver backend. CUDAJITExecutor SVD projection is not currently enabled. |
| svdbpi | ❌ | ✅ | ❌ | CUDA-only batched power-iteration SVD transform. |
| svdpi | ❌ | ✅ | ❌ | CUDA-only power-iteration SVD transform. |
| tan | ✅ | ✅ | ✅ | Element-wise expression. |
| tanh | ✅ | ✅ | ✅ | Element-wise expression. |
| toeplitz | ✅ | ✅ | ✅ | Generator/expression composition. |
| trace | 🟧 | ✅ | ❌ | Reduction-style matrix transform; host execution is available but reductions are not generally parallelized across host threads. |
| transpose | ✅ | ✅ | ✅ | View/reindex expression. |
| transpose_matrix | ✅ | ✅ | ✅ | View/reindex expression. |
| unique | ✅ | ✅ | ❌ | Sort/compaction transform. |
| unwrap | ✅ | ✅ | ✅ | Element-wise/reindex expression. |
| upsample | ✅ | ✅ | ✅ | View/reindex expression. |
| var | 🟧 | ✅ | ❌ | Reduction/statistics transform; host execution is available but reductions are not generally parallelized across host threads. |
| vector_norm | 🟧 | ✅ | ❌ | Reduction transform; host execution is available but reductions are not generally parallelized across host threads. |
| zeros | ✅ | ✅ | ✅ | Generator expression. |
| zipvec | ✅ | ✅ | ✅ | Vector packing expression. |
| arithmetic operators | ✅ | ✅ | ✅ | Includes unary minus and binary +, -, *, /, and % element-wise operators. |
| comparison operators | ✅ | ✅ | ✅ | Includes ==, !=, <, <=, >, and >= element-wise operators. |
| logical operators | ✅ | ✅ | ✅ | Includes element-wise !, &&, \|\|, &, \|, and ^ operators. |