CCCL Python Libraries

Overview
The CUDA Core Compute Libraries (CCCL) for Python are a collection of modules with the shared goal of providing high-quality, high-performance, and easy-to-use abstractions for CUDA Python developers.
cuda.compute — Composable device-level primitives for building custom parallel algorithms, without writing CUDA kernels directly.
cuda.coop — Cooperative block- and warp-level algorithms for writing highly efficient CUDA kernels with Numba CUDA.
These libraries expose the same generic, highly optimized algorithms as the CCCL C++ libraries, tuned to perform well across GPU architectures.
Who is this for?
Library authors building parallel algorithms that need portable performance across GPU architectures, without dropping down to CUDA C++.
Application developers using PyTorch, CuPy, or other GPU-accelerated frameworks who need custom algorithms beyond what those libraries provide.