Executor Compatibility

MatX’s executor design lets the same code run on different targets without modification. This document outlines which functions are compatible with which executors. Functions fall into two categories:

Element-wise operations: These operations can be executed on any executor.
Transforms: These invoke library calls (e.g., CUDA libraries or CPU libraries on the host) or use custom kernels.
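The idea of writing a computation once and choosing the target at run time can be sketched in plain C++. This is a toy model, not MatX code: `SerialExecutor`, `BlockedExecutor`, `for_each_index`, and `add_vectors` are hypothetical names invented for this illustration, and the blocked variant only imitates how a GPU launch partitions work (its blocks run one after another on the CPU).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical executor types for illustration only; these are NOT MatX classes.

// Visits indices 0..n-1 in a single loop.
struct SerialExecutor {
  template <typename Fn>
  void for_each_index(std::size_t n, Fn fn) const {
    for (std::size_t i = 0; i < n; ++i) fn(i);
  }
};

// Visits the same indices grouped into fixed-size blocks, loosely imitating
// how a GPU launch partitions work. The blocks run sequentially here.
struct BlockedExecutor {
  std::size_t block_size = 256;

  template <typename Fn>
  void for_each_index(std::size_t n, Fn fn) const {
    const std::size_t bs = std::max<std::size_t>(block_size, 1);  // avoid a zero stride
    for (std::size_t start = 0; start < n; start += bs) {
      const std::size_t stop = std::min(n, start + bs);
      for (std::size_t i = start; i < stop; ++i) fn(i);
    }
  }
};

// The element-wise computation is written once and does not know which
// executor will run it.
template <typename Executor>
std::vector<float> add_vectors(const std::vector<float>& a,
                               const std::vector<float>& b,
                               const Executor& exec) {
  std::vector<float> out(std::min(a.size(), b.size()));
  exec.for_each_index(out.size(),
                      [&](std::size_t i) { out[i] = a[i] + b[i]; });
  return out;
}
```

Calling `add_vectors(a, b, SerialExecutor{})` and `add_vectors(a, b, BlockedExecutor{2})` runs the same expression under two execution strategies. Only the executor argument changes, which is the property the executor design is meant to provide.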

Note that results can differ slightly between the host executor and the CUDA executor because the two perform floating-point arithmetic differently. Also, on the host, most functions, with the exception of reductions, support multithreading.
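One general reason two backends can disagree is that floating-point addition is not associative, so a reduction that accumulates in a different order can round differently. The sketch below is plain C++ and does not use MatX. The names `sequential_sum`, `pairwise_sum`, and `nearly_equal` are hypothetical, and nothing here is a claim about which order a particular MatX executor uses. It assumes IEEE-754 single precision without fast-math optimizations.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Accumulates left to right, as a simple serial loop would.
float sequential_sum(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}

// Sums the half-open range [begin, end) as a balanced tree, the shape
// typically used by parallel reductions.
float pairwise_sum(const std::vector<float>& v, std::size_t begin, std::size_t end) {
  if (end <= begin) return 0.0f;
  if (end - begin == 1) return v[begin];
  const std::size_t mid = begin + (end - begin) / 2;
  return pairwise_sum(v, begin, mid) + pairwise_sum(v, mid, end);
}

// Compares two results with a relative tolerance plus an absolute floor,
// instead of testing for exact equality.
bool nearly_equal(float a, float b, float rel_tol = 1e-5f, float abs_tol = 1e-8f) {
  const float scale = std::max(std::fabs(a), std::fabs(b));
  return std::fabs(a - b) <= std::max(abs_tol, rel_tol * scale);
}

// Worked case: for {1e8f, 1.0f, -1e8f, 1.0f} the exact sum is 2.
// Floats near 1e8 are spaced 8 apart, so adding 1.0f to 1e8f is lost.
//   sequential: ((1e8 + 1) - 1e8) + 1  ->  1
//   pairwise:   (1e8 + 1) + (-1e8 + 1) ->  0
```

When validating host output against GPU output, a tolerance-based check such as `nearly_equal` is usually a better fit than `==`. The tolerance values shown are placeholders and should be chosen for the data type and problem size.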

The following table lists which executors each transform supports.

Transform Executor Compatibility Matrix
Transform	Half Precision	Host	GPU	Notes
fft	GPU only	Yes	Yes
matmul	Yes	Yes	Yes
outer	Yes	Yes	Yes
matvec	Yes	Yes	Yes
chol	No	Yes	Yes
lu	No	Yes	Yes	L and U are returned in the lower and upper triangles of the output, respectively
qr	No	Yes	Yes	Returns Householder vectors and scalar factors on host
eig	No	Yes	Yes	Hermitian/symmetric inputs only
svd	No	Yes	Yes	Different methods on GPU for smaller matrices
inv	No	No	Yes
pinv	No	Yes	Yes
det	No	Yes	Yes
trace	No	Yes	Yes
conv/corr	Only for direct method	Limited	Yes	Host only supports 1D convolution using the FFT method
hist	No	No	Yes	Single-threaded host
sort	No	Yes	Yes	Single-threaded host
cumsum	No	Limited	Yes	Host only supports 1D and 2D cumsum and is single-threaded
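The note on lu says that L and U share a single output matrix. A minimal sketch of splitting such a packed result into two factors follows. It is plain C++ rather than MatX code, and `Matrix`, `LuFactors`, and `unpack_lu` are hypothetical names. It assumes a square input and the LAPACK getrf-style convention in which L has an implicit unit diagonal and the stored diagonal belongs to U. This document does not state the convention, and any row pivoting is ignored here.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

struct LuFactors {
  Matrix lower;  // unit lower-triangular factor L
  Matrix upper;  // upper-triangular factor U
};

// Splits a packed n x n LU result into separate L and U matrices.
// Assumptions: `packed` is square, entries strictly below the diagonal
// belong to L, and L's diagonal is an implicit 1 (getrf-style packing).
LuFactors unpack_lu(const Matrix& packed) {
  const std::size_t n = packed.size();
  LuFactors f{Matrix(n, std::vector<double>(n, 0.0)),
              Matrix(n, std::vector<double>(n, 0.0))};
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      if (j < i) {
        f.lower[i][j] = packed[i][j];  // strictly lower part goes to L
      } else {
        f.upper[i][j] = packed[i][j];  // diagonal and above go to U
      }
    }
    f.lower[i][i] = 1.0;  // implicit unit diagonal of L
  }
  return f;
}
```

For the packed input `{{4, 3}, {0.5, 2.5}}`, this yields L = `{{1, 0}, {0.5, 1}}` and U = `{{4, 3}, {0, 2.5}}`, whose product is `{{4, 3}, {2, 4}}`.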