Numba CUDA MLIR
Numba CUDA MLIR is an evolution of Numba-CUDA that improves on its technical foundation and performance, and is intended as the future of CUDA Python JIT compilation.
It is used for writing SIMT kernels in Python, for providing Python bindings for accelerated device libraries, and as a compiler for user-defined functions in accelerated libraries like RAPIDS.
To install Numba CUDA MLIR, see Installation.
To get started writing CUDA kernels in Python, see Writing CUDA Kernels.
Browse the Examples to see a variety of use cases for Numba CUDA MLIR.
Contents
- User guide
- Installation
- Writing CUDA Kernels
- Memory management
- Global Variables and Captured Values
- Supported Python features in CUDA Python
- CUDA Fast Math
- Supported Atomic Operations
- Cooperative Groups
- Random Number Generation (Deprecated)
- Device management
- The Device List
- Device UUIDs
- Examples
- Debugging Numba CUDA Programs with Visual Studio Code and CUDA GDB
- GPU Reduction
- CUDA Ufuncs and Generalized Ufuncs
- IPC Support (Deprecated)
- Calling foreign functions from Python kernels
- Compiling Python functions for use with other languages
- CUDA device call conventions
- Profiling
- Reference documentation