cuda-bindings 13.2.0 Release notes
Released on Mar 10, 2026
Highlights
Support for new APIs introduced in CUDA 13.2, including new driver functions (cuKernelGetParamCount, cuMemcpyWithAttributesAsync, cuStreamBeginCaptureToCig, cuLaunchHostFunc_v2, cuGraphNodeGetParams, coredump callback registration, and more) and their runtime counterparts.
cuda.bindings.nvml has graduated from experimental (cuda.bindings._nvml) to a fully supported public module with extensive handwritten Pythonic API coverage spanning ~170 functions across system queries, device discovery, memory, power, clocks, utilization, thermals, NVLink, and device configuration. (PR #1524, PR #1548)
Added nvFatbin bindings. (PR #1467)
Performance improvement: cuda.bindings now uses a faster enum implementation, rather than the standard library's enum.IntEnum. This leads to much faster import times and slightly faster attribute access times. (PR #1581)
Multiple performance improvements cumulatively reduce Python-to-C call overhead through faster void * conversion, faster result returning, optimized enum-to-vector conversion, and stack-allocated small arrays.
Added a CUDA version compatibility check that warns when the installed driver does not support the CUDA major version that cuda-bindings was built for. The check can be disabled with CUDA_PYTHON_DISABLE_VERSION_CHECK=1. (PR #1412)
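As a minimal sketch of disabling the version compatibility check from within Python, the environment variable named above can be set before the bindings are imported. The assumption here (not stated in these notes) is that the check runs when the bindings are first loaded, so the variable has to be in place beforehand. The import is guarded so the snippet also runs where cuda-bindings is not installed; the name bindings_available is illustrative only.

```python
import os

# Assumption: the check is evaluated when cuda.bindings is first loaded,
# so the variable must be set before the import below.
os.environ["CUDA_PYTHON_DISABLE_VERSION_CHECK"] = "1"

try:
    from cuda.bindings import driver  # noqa: F401
    bindings_available = True
except ImportError:
    # cuda-bindings is not installed in this environment.
    bindings_available = False

print("cuda.bindings importable:", bindings_available)
```

Setting the variable in the shell that launches the process should have the same effect and avoids any dependence on import order.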
Bugfixes
Fixed an issue where the CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL attribute was retrieved as an unsigned int, rather than a signed int. (PR #1336)
Fixed ABI incompatibility bugs in cuFILE bindings introduced in v13.1.0. (PR #1468)
Fixed a use-after-free in _HelperInputVoidPtr properties when backed by Python buffer objects. (PR #1629)
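To illustrate why the signedness in the CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL fix matters, the sketch below reinterprets the same 32-bit pattern as signed and unsigned using only the standard library. It does not call the bindings and is not their implementation. The CUDA driver headers define negative CUdevice sentinels (CU_DEVICE_CPU is -1, CU_DEVICE_INVALID is -2); whether this attribute returns them in a particular situation is an assumption made for the example.

```python
import ctypes

# Negative CUdevice sentinels from the CUDA driver headers.
CU_DEVICE_CPU = -1
CU_DEVICE_INVALID = -2


def as_unsigned(value: int) -> int:
    """Reinterpret a 32-bit value as unsigned (the pre-fix reading)."""
    return ctypes.c_uint32(value).value


def as_signed(value: int) -> int:
    """Reinterpret a 32-bit value as signed (the post-fix reading)."""
    return ctypes.c_int32(value).value


for ordinal in (0, 1, CU_DEVICE_CPU, CU_DEVICE_INVALID):
    print(f"{ordinal:>3} read as unsigned: {as_unsigned(ordinal)}")
```

Non-negative ordinals are unaffected; only negative values turn into large positive numbers when read as unsigned, which is why such a bug can go unnoticed on ordinary device pointers.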
Miscellaneous
Faster void * conversion using stack-allocated buffers instead of heap allocation. (PR #1616)
Faster returning of results from driver, runtime, and NVRTC bindings. (PR #1647, PR #1656)
Faster conversion of enum sequences to vectors by eliminating temporary Python objects. (PR #1667)
Stack-allocated small numeric arrays in driver bindings, reducing heap allocation overhead. (PR #1545)
Wheel and installed package sizes significantly reduced (e.g., on a typical Linux x86_64 build, wheel from ~16.6 MB to ~5.7 MB and installed from ~152 MB to ~23 MB) by excluding Cython source files, generated C++ files, and template files from distribution packages.
NVML bindings now use cuda_pathfinder for library discovery, consistent with other CUDA libraries. (PR #1661)
Added get_c_compiler() function to report the C compiler used to build cuda.bindings. (PR #1591)
cuda-bindings now builds cleanly with clang. (PR #1658)
CUDA_HOME is no longer required at metadata resolution time (e.g. pip install --dry-run, uv lock); it is only needed at actual build time. (PR #1652)
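For the package size figures quoted above (wheel from ~16.6 MB to ~5.7 MB, installed from ~152 MB to ~23 MB), the relative reduction can be worked out as follows. This is plain arithmetic on the approximate numbers from the notes; the helper name reduction_percent is illustrative.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100


# Approximate sizes quoted in the release notes (Linux x86_64 build).
wheel_reduction = reduction_percent(16.6, 5.7)
installed_reduction = reduction_percent(152, 23)

print(f"wheel size reduced by about {wheel_reduction:.1f}%")
print(f"installed size reduced by about {installed_reduction:.1f}%")
```

With these inputs the wheel shrinks by roughly two thirds and the installed footprint by roughly 85%.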
Known issues
Updating from older versions (v12.6.2.post1 and below) via pip install -U cuda-python might not work. Please do a clean re-installation: uninstall with pip uninstall -y cuda-python, then install with pip install cuda-python.