Compilation API

Numba CUDA MLIR provides an entry point for compiling a Python function without invoking any part of the CUDA driver API. This can be useful for:

  • Generating PTX that is to be inlined into other PTX code (e.g. from outside the Numba CUDA MLIR / Python ecosystem).

  • Generating PTX or LTO-IR to link with objects from non-Python translation units.

  • Generating code when there is no device present.

  • Generating code prior to a fork without initializing CUDA.

Note

It is the user’s responsibility to manage any ABI issues that arise when compiling to PTX or LTO-IR. Passing the abi="c" keyword argument avoids most of them; see Using the C ABI.
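As a minimal sketch of the workflow described above, the helper below compiles a plain Python function as a device function with the C ABI. It assumes that the compile entry point is importable as `cuda.compile` from `numba`, and that it accepts the `device`, `abi`, `cc` and `output` keyword arguments and returns the generated code together with the inferred return type, as Numba CUDA does. The import path, the `axpy` function and the `compile_axpy` helper are illustrative and are not taken from this page.

```python
def axpy(a, x, y):
    """Plain Python function to be compiled as a CUDA device function."""
    return a * x + y


def compile_axpy(output="ptx", cc=(8, 0)):
    """Compile axpy without touching the driver API (hypothetical helper).

    Assumption: the package exposes `cuda.compile` under `numba`, mirroring
    Numba CUDA. The import is deferred so that defining this helper does not
    require the package to be installed.
    """
    from numba import cuda, float32

    # Argument types only; the return type is inferred by the compiler.
    sig = (float32, float32, float32)

    # abi="c" requests a C-compatible calling convention for the device
    # function; output selects PTX ("ptx") or LTO-IR ("ltoir").
    code, return_type = cuda.compile(
        axpy, sig, device=True, abi="c", cc=cc, output=output
    )
    return code, return_type
```

Calling `compile_axpy()` would return the PTX for the assumed compute capability, and `compile_axpy(output="ltoir")` would return LTO-IR instead. Neither call needs a device to be present, provided the assumptions above hold.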

The environment variable NUMBA_CUDA_DEFAULT_PTX_CC controls the default compute capability targeted by compile; see Environment Variables. If code for the compute capability of the current device is required, use the compile_for_current_device function:
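The two options can be sketched as follows. The sketch makes several assumptions: the variable takes a "major.minor" string as in Numba, it must be set before the package is imported, and `compile_for_current_device` is reachable as `cuda.compile_for_current_device` with the same arguments as `compile`, minus `cc`. The `format_cc`, `scale` and `compile_scale_for_current_device` names are hypothetical.

```python
import os


def format_cc(cc):
    """Format a (major, minor) tuple as a "major.minor" string (assumed format)."""
    major, minor = cc
    return f"{major}.{minor}"


# Assumption: the variable is read when the package is imported, so it is set
# here, before any import of the package takes place.
os.environ["NUMBA_CUDA_DEFAULT_PTX_CC"] = format_cc((8, 0))


def scale(x, factor):
    """Plain Python function to be compiled as a CUDA device function."""
    return x * factor


def compile_scale_for_current_device():
    """Compile scale for the device in use (hypothetical helper).

    Assumption: `cuda.compile_for_current_device` exists under `numba`, as in
    Numba CUDA. Unlike the default-CC path, this requires a CUDA device.
    """
    from numba import cuda, float32

    return cuda.compile_for_current_device(
        scale, (float32, float32), device=True
    )
```

Setting the environment variable suits build machines with no GPU, whereas the current-device variant only makes sense where a device is available to query.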

For legacy code, Numba CUDA MLIR also provides two functions that compile to PTX only: