CUDA Python

CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It consists of multiple components:

  • cuda.core: Pythonic access to CUDA runtime and other core functionalities

  • cuda.bindings: Low-level Python bindings to CUDA C APIs

  • cuda.cccl.cooperative: A Python module providing CCCL’s reusable block-wide and warp-wide device primitives for use within Numba CUDA kernels

  • cuda.cccl.parallel: A Python module for easy access to CCCL’s highly efficient and customizable parallel algorithms, like sort, scan, reduce, transform, etc, that are callable on the host

  • numba.cuda: Numba’s target for CUDA GPU programming by directly compiling a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model.

  • nvmath-python: Pythonic access to NVIDIA CPU & GPU Math Libraries, with both host and device (through nvmath.device) APIs. It also provides low-level Python bindings to host C APIs (through nvmath.bindings).

CUDA Python is currently undergoing an overhaul to improve existing and bring up new components. All of the previously available functionalities from the cuda-python package will continue to be available, please refer to the cuda.bindings documentation for installation guide and further detail.