Extension API#
Numba-CUDA-MLIR can be extended to support new types, functions, and methods so that they are usable from JIT-compiled device code. The Extension API has two tiers:
The High-level API enables support for new Python callables, methods, and attributes by writing pure Python implementation functions that are themselves JIT-compiled. This API should be preferred wherever possible.
The Low-level API exposes type inference and code generation machinery directly. It should be used for extensions that cannot be expressed with the high-level API - for example, when implementing a new type, when MLIR or PTX needs to be emitted directly, or when implementing implicit conversions between types.
The High-level API is closely modelled on Numba’s High-level Extension API. Its
decorators, such as overload() and
overload_method(), behave in a similar way
to their Numba counterparts.
The Low-level Typing API is also similar to Numba’s, but the Lowering API
differs: instead of emitting LLVM IR through llvmlite, lowering functions
emit MLIR through the bindings accessed via numba_cuda_mlir._mlir.
An understanding of how type inference works in Numba-CUDA-MLIR is crucial to effective use of the extension APIs. For a reference description of type inference, see Numba’s NBEP 5: Type Inference document.