Compilation API
Numba CUDA MLIR provides an entry point for compiling a Python function without invoking the CUDA driver API. This can be useful for:
Generating PTX that is to be inlined into other PTX code (e.g. from outside the Numba CUDA MLIR / Python ecosystem).
Generating PTX or LTO-IR to link with objects from non-Python translation units.
Generating code when there is no device present.
Generating code prior to a fork without initializing CUDA.
Note
It is the user’s responsibility to manage any ABI issues arising from
the use of compilation to PTX / LTO-IR. Passing the abi="c" keyword
argument can provide a solution to most issues that may arise - see
Using the C ABI.
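As a minimal sketch of compiling with the C ABI, the example below assumes that the entry point mirrors upstream Numba's `numba.cuda.compile`, which accepts `device`, `abi`, and `output` keyword arguments and returns the generated code together with the inferred return type. The import path is an assumption: it is upstream Numba's and may differ for Numba CUDA MLIR. The names `add` and `compile_add_to_ptx` are purely illustrative.

```python
def add(x, y):
    # Plain Python function to be compiled as a CUDA device function.
    return x + y


def compile_add_to_ptx():
    """Return PTX for `add`, or None if the CUDA target cannot be imported."""
    try:
        # Assumption: upstream Numba import path; Numba CUDA MLIR may differ.
        from numba import cuda, float32
    except ImportError:
        return None

    # The signature is a tuple of argument types; the return type is
    # inferred by the compiler and handed back alongside the code.
    ptx, return_type = cuda.compile(
        add,
        (float32, float32),
        device=True,   # compile a device function rather than a kernel
        abi="c",       # request the C ABI discussed in the note above
        output="ptx",  # upstream Numba also accepts "ltoir" for LTO-IR
    )
    return ptx
```

Keeping the import inside the function lets the module load on machines where the CUDA target is not installed, so the helper can report the missing target by returning `None` instead of failing at import time.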
The environment variable NUMBA_CUDA_DEFAULT_PTX_CC sets the default
compute capability targeted by compile - see Environment Variables.
To generate code for the compute capability of the current device
instead, use the compile_for_current_device function.
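The environment variable described above can also be set from Python. The sketch below rests on two assumptions carried over from upstream Numba: that the value is a string of the form `major.minor`, and that it is read when the library is imported, so it has to be set first. The helper name `set_default_ptx_cc` is hypothetical.

```python
import os


def set_default_ptx_cc(major: int, minor: int) -> str:
    """Set NUMBA_CUDA_DEFAULT_PTX_CC to "major.minor" and return the value."""
    # Assumption: the expected format is "major.minor", as in upstream Numba.
    value = f"{major}.{minor}"
    os.environ["NUMBA_CUDA_DEFAULT_PTX_CC"] = value
    return value


# Set the default before importing the compiler package, on the assumption
# that the variable is read at import time.
set_default_ptx_cc(8, 0)
```

Setting the variable in the shell that launches the process achieves the same thing and avoids any dependence on import order.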
Numba CUDA MLIR also provides two functions, intended for use in legacy code, that compile to PTX only: