Extension API

Numba-CUDA-MLIR can be extended to support new types, functions, and methods so that they are usable from JIT-compiled device code. The Extension API has two tiers:

The High-level API enables support for new Python callables, methods, and attributes by writing pure Python implementation functions that are themselves JIT-compiled. This API should be preferred wherever possible.
The Low-level API exposes type inference and code generation machinery directly. It should be used for extensions that cannot be expressed with the High-level API, for example when implementing a new type, when MLIR or PTX must be emitted directly, or when implementing implicit conversions between types.

The High-level API is closely modelled on Numba’s High-level Extension API. Its decorators, such as overload() and overload_method(), behave in a similar way to their Numba counterparts.
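The overload pattern that these decorators share with Numba can be illustrated with a toy model in plain Python. This is only a sketch of the idea and not the Numba-CUDA-MLIR implementation: the registry, the `overload` and `resolve` helpers, and the `clamp` function defined here are all hypothetical, and plain Python classes such as `float` stand in for the compiler's type objects. What it shows is the two-stage structure: the decorated function is called with argument types and either returns an implementation for those types or declines by returning None.

```python
# Toy model of the overload pattern. Every name below is hypothetical and
# defined locally; nothing here is imported from Numba or Numba-CUDA-MLIR.
from typing import Callable, Dict, List, Optional

# Maps a Python callable to the typing-time functions registered for it.
_OVERLOADS: Dict[Callable, List[Callable]] = {}


def overload(target: Callable) -> Callable:
    """Register a typing-time function for `target` (stand-in for a real overload())."""
    def decorator(typing_func: Callable) -> Callable:
        _OVERLOADS.setdefault(target, []).append(typing_func)
        return typing_func
    return decorator


def resolve(target: Callable, *arg_types: type) -> Optional[Callable]:
    """Return the first implementation whose typing-time function accepts arg_types."""
    for typing_func in _OVERLOADS.get(target, []):
        impl = typing_func(*arg_types)
        if impl is not None:
            return impl
    return None


def clamp(x, lo, hi):
    """The callable we want to make usable from compiled code."""
    raise NotImplementedError("only available through an overload")


@overload(clamp)
def ol_clamp(x, lo, hi):
    # At this stage x, lo, and hi are *types*, not values.
    if all(t in (int, float) for t in (x, lo, hi)):
        def impl(x, lo, hi):
            # Here x, lo, and hi are runtime values.
            return min(max(x, lo), hi)
        return impl
    return None  # Decline: no implementation for these argument types.


# A compiler would perform this lookup during type inference.
impl = resolve(clamp, float, float, float)
result = impl(1.5, 0.0, 1.0)
unsupported = resolve(clamp, str, str, str)
```

In a real extension the implementation returned from the typing-time function would itself be JIT-compiled for the device rather than called as ordinary Python, and the type checks would be written against the compiler's own type classes.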

The Low-level Typing API is also similar to Numba’s, but the Lowering API differs: instead of emitting LLVM IR through llvmlite, lowering functions emit MLIR through the bindings accessed via numba_cuda_mlir._mlir.

An understanding of how type inference works in Numba-CUDA-MLIR is crucial to effective use of the extension APIs. For a reference description of type inference, see Numba’s NBEP 5: Type Inference document.