CUDA device call conventions#

Numba-CUDA supports two ABIs for device functions:

The Numba ABI, used internally by Numba for most compiled device code.
The C ABI, intended for interoperability with CUDA C/C++ style calls.

Important

This is a major deviation from upstream Numba behavior: Numba-CUDA supports arbitrary nesting between these ABIs. A Numba-ABI function can call a C-ABI function, which can call a Numba-ABI function, and so on.

ABI overview#

Numba ABI#

The Numba ABI is described in Device function ABI (without the extern "C" modifier):

The function has a status return code.
The Python return value is passed via a pointer in the first argument.
Function names are mangled using Numba’s mangling rules.
Optional returns and exception status can be represented via the status channel.

C ABI#

The C ABI behavior for compiled Python device functions is described in Using the C ABI:

The function has a conventional C-style signature: <return_type>(<args...>).
There is no separate status return code channel.
Function names are predictable (by default the Python __name__), and can be set explicitly with abi_info={"abi_name": ...}.
The C ABI is supported for device functions (not kernels).

Caller/callee matrix#

The table below summarizes what happens at each call edge:

Caller and callee ABI combinations#
Caller / Callee	Numba ABI callee	C ABI callee
Numba ABI caller	Numba-to-Numba call. Uses Numba ABI marshalling (status + return pointer), and propagates lower-frame error status.	Mixed call. Arguments / return are marshalled using the callee’s C ABI signature. No callee status channel exists to propagate Python-exception status from the callee.
C ABI caller	Mixed call. The call is marshalled using the callee’s Numba ABI. The Numba callee can still produce status, but the C ABI caller has no outward status channel and does not propagate lower-frame status.	C-to-C call. Conventional C-style argument / return passing with no status channel.

What arbitrary nesting means#

Each call site is lowered using the callee’s ABI, not by forcing one ABI for the whole call chain. This allows patterns like:

Numba ABI caller -> C ABI callee -> Numba ABI callee -> C ABI callee

to compile as expected.

In practice, this means mixed boundaries can appear at any depth in a call graph, including calls to functions declared with numba.cuda.declare_device() and calls to Numba-compiled device subroutines.

Behavioral caveats#

The C ABI has no status channel for Python exception propagation.
When a C ABI caller invokes a Numba ABI callee returning Optional[T], the optional is flattened to T at the C ABI boundary. A None result is represented as the default-initialized value of T.
Kernels must still use the Numba ABI entry model; compiling kernels with abi="c" is unsupported.
For foreign CUDA C/C++ functions, use abi="c" with numba.cuda.declare_device() and follow pointer-signature guidance in Calling foreign functions from Python kernels.