CUDA device call conventions

Numba CUDA MLIR supports two ABIs for device functions:

The Numba CUDA MLIR ABI, used internally for most compiled device code.
The C ABI, intended for interoperability with CUDA C++ style calls.

ABI overview

Numba CUDA MLIR ABI

The Numba CUDA MLIR ABI is described in Device function ABI (without the extern "C" modifier):

The function has a status return code.
The Python return value is passed via a pointer in the first argument.
Function names are mangled using Numba CUDA MLIR’s mangling rules.
Optional returns and exception status can be represented via the status channel.
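The convention above can be modelled in plain Python. This is only a conceptual sketch, not the compiler's actual lowering: the wrapper name as_status_abi, the one-element list used as a stand-in for the return pointer, and the status codes 0 and 1 are all assumptions made for illustration.

```python
from typing import Callable, List

# Assumed status codes for this model only; the real values are not given here.
STATUS_OK = 0
STATUS_ERROR = 1


def as_status_abi(py_func: Callable) -> Callable:
    """Wrap py_func so that it follows the modelled convention:
    wrapper(ret_slot, *args) -> status, with the result stored in ret_slot[0]."""

    def wrapper(ret_slot: List, *args) -> int:
        try:
            # The Python return value travels through the first argument.
            ret_slot[0] = py_func(*args)
        except Exception:
            # A Python exception becomes a non-zero status code.
            return STATUS_ERROR
        return STATUS_OK

    return wrapper


def safe_div(a: int, b: int) -> int:
    return a // b


safe_div_abi = as_status_abi(safe_div)

slot = [None]
status = safe_div_abi(slot, 7, 2)
print(status, slot[0])

bad_slot = [None]
bad_status = safe_div_abi(bad_slot, 1, 0)
print(bad_status, bad_slot[0])
```

In this model the caller always checks the returned status before reading the slot, which is the role the status channel plays in the list above.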

C ABI

The C ABI behavior for compiled Python device functions is described in Using the C ABI:

The function has a conventional C-style signature: <return_type>(<args...>).
There is no separate status return code channel.
Function names are predictable (by default the Python __name__), and can be set explicitly with abi_info={"abi_name": ...}.
The C ABI is supported for device functions (not kernels).
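The naming rule and the signature shape can be illustrated with a small pure-Python model. The helpers c_abi_symbol and c_prototype are hypothetical and exist only in this sketch; they do not call into Numba CUDA MLIR, and the C type names are chosen purely for illustration.

```python
from typing import Callable, Mapping, Optional, Sequence


def c_abi_symbol(py_func: Callable,
                 abi_info: Optional[Mapping[str, str]] = None) -> str:
    """Modelled naming rule: abi_info["abi_name"] if given, else __name__."""
    if abi_info and "abi_name" in abi_info:
        return abi_info["abi_name"]
    return py_func.__name__


def c_prototype(symbol: str, return_type: str, arg_types: Sequence[str]) -> str:
    """Render a C-style prototype of the form <return_type> name(<args...>)."""
    return f"{return_type} {symbol}({', '.join(arg_types)})"


def add(x, y):
    return x + y


default_symbol = c_abi_symbol(add)
custom_symbol = c_abi_symbol(add, {"abi_name": "add_i32"})
prototype = c_prototype(custom_symbol, "int", ["int", "int"])

print(default_symbol)
print(custom_symbol)
print(prototype)
```

Because the symbol is predictable, a CUDA C++ caller could declare a matching prototype without knowing any mangling rules.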

Caller/callee matrix

The table below summarizes what happens at each call edge:

Caller and callee ABI combinations
Caller / Callee	Numba CUDA MLIR ABI callee	C ABI callee
Numba CUDA MLIR ABI caller	Same-ABI call (Numba CUDA MLIR to Numba CUDA MLIR). Uses Numba CUDA MLIR ABI marshalling (status + return pointer), and propagates lower-frame error status.	Mixed call. Arguments / return are marshalled using the callee’s C ABI signature. No callee status channel exists to propagate Python-exception status from the callee.
C ABI caller	Mixed call. The call is marshalled using the callee’s Numba CUDA MLIR ABI. The Numba CUDA MLIR callee can still produce status, but the C ABI caller has no outward status channel and does not propagate lower-frame status.	C-to-C call. Conventional C-style argument / return passing with no status channel.
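The four cells of the table reduce to two rules, which the following sketch encodes as a lookup function. The names CallEdge and describe_edge and the string constants are hypothetical and only serve this illustration.

```python
from dataclasses import dataclass

NUMBA = "numba_cuda_mlir"
C = "c"


@dataclass(frozen=True)
class CallEdge:
    marshalling: str         # ABI used to pass arguments and the return value
    propagates_status: bool  # whether callee error status flows out via the caller


def describe_edge(caller_abi: str, callee_abi: str) -> CallEdge:
    """Summarize one call edge according to the table above."""
    for abi in (caller_abi, callee_abi):
        if abi not in (NUMBA, C):
            raise ValueError(f"unknown ABI: {abi!r}")
    return CallEdge(
        # Rule 1: marshalling always follows the callee's ABI.
        marshalling=callee_abi,
        # Rule 2: status propagates only when both ends have a status channel.
        propagates_status=(caller_abi == NUMBA and callee_abi == NUMBA),
    )


for caller in (NUMBA, C):
    for callee in (NUMBA, C):
        print(caller, "->", callee, describe_edge(caller, callee))
```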

What arbitrary nesting means

Each call site is lowered using the callee’s ABI, not by forcing one ABI for the whole call chain. This allows patterns like:

Numba CUDA MLIR ABI caller -> C ABI callee -> Numba CUDA MLIR ABI callee -> C ABI callee

to compile as expected.

In practice, this means mixed boundaries can appear at any depth in a call graph, including calls to functions declared with numba_cuda_mlir.cuda.declare_device() and calls to Numba-compiled device subroutines.
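A plain-Python model can make the per-call-site rule concrete. In this sketch the decorators numba_abi and c_abi and the helper call are hypothetical stand-ins for what the compiler does when lowering each call site; status handling is deliberately ignored, and a one-element list stands in for the return pointer.

```python
def numba_abi(func):
    """Mark func as using the modelled status + return-slot convention."""

    def wrapper(ret_slot, *args):
        ret_slot[0] = func(*args)
        return 0  # modelled status: OK

    wrapper.abi = "numba"
    return wrapper


def c_abi(func):
    """Mark func as using a plain C-style call: the value is returned directly."""
    func.abi = "c"
    return func


def call(callee, *args):
    """Lower one call site according to the callee's ABI, not the caller's."""
    if callee.abi == "numba":
        slot = [None]
        callee(slot, *args)  # status is ignored in this sketch
        return slot[0]
    return callee(*args)


@c_abi
def leaf(x):
    return x + 1


@numba_abi
def inner(x):
    return call(leaf, x) * 2


@c_abi
def middle(x):
    return call(inner, x) + 10


@numba_abi
def outer(x):
    return call(middle, x) - 3


# Chain: outer (Numba) -> middle (C) -> inner (Numba) -> leaf (C)
result = call(outer, 4)
print(result)
```

No function in the chain needs to know the ABI of anything other than its direct callee, which is why mixed boundaries can appear at any depth.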

Behavioral caveats

The C ABI has no status channel for Python exception propagation.
When a C ABI caller invokes a Numba CUDA MLIR ABI callee returning Optional[T], the optional is flattened to T at the C ABI boundary. A None result is represented as the default-initialized value of T, so the caller cannot distinguish None from a genuine result equal to that default.
Kernels must still use the Numba CUDA MLIR ABI entry model; compiling kernels with abi="c" is unsupported.
For foreign CUDA C++ functions, use abi="c" with numba_cuda_mlir.cuda.declare_device() and follow pointer-signature guidance in Calling foreign functions from Python kernels.
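The Optional[T] caveat above can be illustrated with a small model. The helper flatten_optional is hypothetical, and using value_type() (for example int() == 0) as the "default-initialized value of T" is an assumption of this sketch rather than a statement about the compiled code.

```python
from typing import Optional, Sequence


def flatten_optional(value, value_type):
    """Model of flattening Optional[T] to T: None becomes value_type()."""
    # Assumption: value_type() stands in for the default-initialized T.
    return value_type() if value is None else value


def find_index(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of target in items, or None if it is absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None


# A genuine result that happens to equal the default value of int.
found = flatten_optional(find_index([5, 6, 7], 5), int)
# A None result, flattened to the default value of int.
missing = flatten_optional(find_index([5, 6, 7], 9), int)

print(found, missing)
```

Under this model both calls yield the same value, so an API that must signal "no result" across a C ABI boundary may be better served by a sentinel outside the valid range or by a separate flag.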