CUDA device call conventions#

Numba CUDA MLIR supports two ABIs for device functions:

  • The Numba CUDA MLIR ABI, used internally for most compiled device code.

  • The C ABI, intended for interoperability with CUDA C++ style calls.

ABI overview#

Numba CUDA MLIR ABI#

The Numba CUDA MLIR ABI is described in Device function ABI (without the extern "C" modifier):

  • The function has a status return code.

  • The Python return value is passed via a pointer in the first argument.

  • Function names are mangled using Numba CUDA MLIR’s mangling rules.

  • Optional returns and exception status can be represented via the status channel.

C ABI#

The C ABI behavior for compiled Python device functions is described in Using the C ABI:

  • The function has a conventional C-style signature: <return_type>(<args...>).

  • There is no separate status return code channel.

  • Function names are predictable (by default the Python __name__), and can be set explicitly with abi_info={"abi_name": ...}.

  • The C ABI is supported for device functions (not kernels).

Caller/callee matrix#

The table below summarizes what happens at each call edge:

Caller and callee ABI combinations#

Caller / Callee

Numba CUDA MLIR ABI callee

C ABI callee

Numba CUDA MLIR ABI caller

NCM-to-NCM call. Uses Numba CUDA MLIR ABI marshalling (status + return pointer), and propagates lower-frame error status.

Mixed call. Arguments / return are marshalled using the callee’s C ABI signature. No callee status channel exists to propagate Python-exception status from the callee.

C ABI caller

Mixed call. The call is marshalled using the callee’s Numba CUDA MLIR ABI. The Numba CUDA MLIR callee can still produce status, but the C ABI caller has no outward status channel and does not propagate lower-frame status.

C-to-C call. Conventional C-style argument / return passing with no status channel.

What arbitrary nesting means#

Each call site is lowered using the callee’s ABI, not by forcing one ABI for the whole call chain. This allows patterns like:

Numba CUDA MLIR ABI caller -> C ABI callee -> Numba CUDA MLIR ABI callee -> C ABI callee

to compile as expected.

In practice, this means mixed boundaries can appear at any depth in a call graph, including calls to functions declared with numba_cuda_mlir.cuda.declare_device() and calls to Numba-compiled device subroutines.

Behavioral caveats#

  • The C ABI has no status channel for Python exception propagation.

  • When a C ABI caller invokes a Numba CUDA MLIR ABI callee returning Optional[T], the optional is flattened to T at the C ABI boundary. A None result is represented as the default-initialized value of T.

  • Kernels must still use the Numba ABI entry model; compiling kernels with abi="c" is unsupported.

  • For foreign CUDA C++ functions, use abi="c" with numba_cuda_mlir.cuda.declare_device() and follow pointer-signature guidance in Calling foreign functions from Python kernels.