cuda-bindings 13.2.0 Release notes

Released on Mar 10, 2026

Highlights

  • Support for new APIs introduced in CUDA 13.2, including new driver functions (cuKernelGetParamCount, cuMemcpyWithAttributesAsync, cuStreamBeginCaptureToCig, cuLaunchHostFunc_v2, cuGraphNodeGetParams, coredump callback registration, and more) and their runtime counterparts.

  • cuda.bindings.nvml has graduated from experimental (cuda.bindings._nvml) to a fully supported public module with extensive handwritten Pythonic API coverage spanning ~170 functions across system queries, device discovery, memory, power, clocks, utilization, thermals, NVLink, and device configuration. (PR #1524, PR #1548)

  • Added nvFatbin bindings. (PR #1467)

  • Performance improvement: cuda.bindings now uses a faster enum implementation instead of the standard library’s enum.IntEnum. This makes imports much faster and attribute access slightly faster. (PR #1581)

  • Multiple performance improvements that cumulatively reduce Python-to-C call overhead: faster void * conversion, faster result returning, optimized enum-to-vector conversion, and stack-allocated small arrays.

  • Added a CUDA version compatibility check that warns when the installed driver does not support the CUDA major version that cuda-bindings was built for. It can be disabled by setting CUDA_PYTHON_DISABLE_VERSION_CHECK=1. (PR #1412)
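
As a minimal sketch of opting out of the version compatibility check from inside a Python program, the environment variable can be set before cuda.bindings is imported. The variable name comes from the note above. Setting it ahead of the import is an assumption about when the check reads the environment, chosen because it is the safest ordering. The import is guarded so the snippet also runs on machines without cuda-bindings installed.

```python
import os

# Set the opt-out flag first. Assumption: the check reads the environment
# when it runs, so setting the variable before any cuda.bindings import
# covers every case.
os.environ["CUDA_PYTHON_DISABLE_VERSION_CHECK"] = "1"

try:
    # cuda.bindings.driver is the driver API module shipped by cuda-bindings.
    from cuda.bindings import driver
except ImportError:
    # cuda-bindings is not installed in this environment.
    driver = None

print("version check disabled:", os.environ["CUDA_PYTHON_DISABLE_VERSION_CHECK"] == "1")
```

Exporting the variable in the shell before launching Python has the same effect and avoids any ordering concerns within the program.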

Bugfixes

  • Fixed an issue where the CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL attribute was retrieved as an unsigned int, rather than a signed int. (PR #1336)

  • Fixed ABI incompatibility bugs in cuFILE bindings introduced in v13.1.0. (PR #1468)

  • Fixed a use-after-free in _HelperInputVoidPtr properties when backed by Python buffer objects. (PR #1629)
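
To illustrate why the signedness fix for CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL matters, the sketch below reinterprets the same four bytes as an unsigned and as a signed 32-bit integer. It does not call CUDA at all. The value -1 is chosen purely for illustration, and the notes above do not say which negative ordinals the driver actually returns.

```python
import struct

# Four bytes holding a signed 32-bit -1 (illustrative value, not taken
# from the driver).
raw = struct.pack("<i", -1)

# Reading the bytes as unsigned turns the negative ordinal into a huge
# positive number.
as_unsigned = struct.unpack("<I", raw)[0]

# Reading them as signed recovers the intended value.
as_signed = struct.unpack("<i", raw)[0]

print(as_unsigned)  # 4294967295
print(as_signed)    # -1
```

Any code that compares the returned ordinal against a negative sentinel would silently fail under the unsigned interpretation, which is the kind of error the signed retrieval avoids.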

Miscellaneous

  • Faster void * conversion using stack-allocated buffers instead of heap allocation. (PR #1616)

  • Faster returning of results from driver, runtime, and NVRTC bindings. (PR #1647, PR #1656)

  • Faster conversion of enum sequences to vectors by eliminating temporary Python objects. (PR #1667)

  • Stack-allocated small numeric arrays in driver bindings, reducing heap allocation overhead. (PR #1545)

  • Wheel and installed package sizes are significantly reduced by excluding Cython source files, generated C++ files, and template files from distribution packages. For example, on a typical Linux x86_64 build, the wheel shrinks from ~16.6 MB to ~5.7 MB and the installed package from ~152 MB to ~23 MB.

  • NVML bindings now use cuda_pathfinder for library discovery, consistent with other CUDA libraries. (PR #1661)

  • Added get_c_compiler() function to report the C compiler used to build cuda.bindings. (PR #1591)

  • cuda-bindings now builds cleanly with clang. (PR #1658)

  • CUDA_HOME is no longer required at metadata resolution time (e.g. pip install --dry-run, uv lock); it is only needed at actual build time. (PR #1652)

Known issues

  • Updating from older versions (v12.6.2.post1 and below) via pip install -U cuda-python might not work. Instead, do a clean reinstallation: run pip uninstall -y cuda-python, then pip install cuda-python.