`cuda-bindings` 12.9.6 Release notes

Released on Mar 11, 2026

Highlights

`cuda.bindings.nvml` has graduated from experimental (`cuda.bindings._nvml`) to a fully supported public module. It provides extensive handwritten Pythonic API coverage, spanning ~170 functions across system queries, device discovery, memory, power, clocks, utilization, thermals, NVLink, and device configuration. (PR #1524, PR #1548)
Added nvFatbin bindings. (PR #1467)
Performance improvement: `cuda.bindings` now uses a faster enum implementation instead of the standard library’s `enum.IntEnum`. This makes imports much faster and attribute access slightly faster. (PR #1581)
Multiple performance improvements that together reduce Python-to-C call overhead: faster `void *` conversion, faster result returning, optimized enum-to-vector conversion, and stack-allocated small arrays.
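These notes do not describe how the new enum implementation works. The sketch below therefore only illustrates the general idea and is not `cuda.bindings`’ code. Each member is an instance of a plain `int` subclass stored as a class attribute, so class creation skips the per-member bookkeeping that the `enum` metaclass performs for `enum.IntEnum`. The names `FastEnumMember` and `make_fast_enum` are hypothetical.

```python
import enum
import timeit


class FastEnumMember(int):
    """Hypothetical sketch: an int subclass that also records its symbolic name."""

    def __new__(cls, value, name):
        obj = super().__new__(cls, value)
        obj.name = name  # stored in the instance __dict__
        return obj

    @property
    def value(self):
        # Mirror the IntEnum `.value` attribute by returning the plain int.
        return int(self)

    def __repr__(self):
        return f"<{type(self).__name__}.{self.name}: {int(self)}>"


def make_fast_enum(cls_name, members):
    """Hypothetical helper: build an enum-like class with members as class attributes."""
    enum_cls = type(cls_name, (FastEnumMember,), {})
    for name, value in members.items():
        setattr(enum_cls, name, enum_cls(value, name))
    return enum_cls


if __name__ == "__main__":
    members = {f"MEMBER_{i}": i for i in range(200)}
    # Time repeated class creation, which dominates import cost for large enum modules.
    std = timeit.timeit(lambda: enum.IntEnum("Std", members), number=20)
    fast = timeit.timeit(lambda: make_fast_enum("Fast", members), number=20)
    print(f"enum.IntEnum: {std:.4f} s, sketch: {fast:.4f} s (20 class builds)")
```

Because members are still `int` instances, comparisons and arithmetic with plain integers keep working. The sketch omits iteration, lookup by value, and the other features a complete enum replacement would need.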

Bug fixes

Fixed an issue where the `CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL` attribute was retrieved as an `unsigned int` rather than a signed `int`. (PR #1336)
Fixed a use-after-free in `_HelperInputVoidPtr` properties when backed by Python buffer objects. (PR #1629)
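The sketch below shows why the signedness of the retrieved attribute matters. It uses only the standard `struct` module, not `cuda.bindings`, and reinterprets the same four bytes both ways. It assumes a 32-bit `int` and little-endian byte order. The value -1 only demonstrates the effect and makes no claim about what the driver actually returns.

```python
import struct


def as_signed_int(raw: bytes) -> int:
    """Interpret 4 little-endian bytes as a C `int` (signed 32-bit)."""
    return struct.unpack("<i", raw)[0]


def as_unsigned_int(raw: bytes) -> int:
    """Interpret the same 4 bytes as a C `unsigned int` (unsigned 32-bit)."""
    return struct.unpack("<I", raw)[0]


# The bytes a C API would write for the signed value -1.
raw = struct.pack("<i", -1)
print(as_signed_int(raw))    # -1
print(as_unsigned_int(raw))  # 4294967295
```

Non-negative values read the same either way. Only a negative value is silently turned into a large positive number when read as unsigned.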
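A common way to avoid this class of use-after-free is to keep the Python object that owns the memory alive for as long as any raw address derived from it may be used. The sketch below shows that pattern with the standard `ctypes` module. `BufferPointer` is a hypothetical class, not `_HelperInputVoidPtr`, and it is not the actual fix in PR #1629. It only accepts writable, C-contiguous buffers, because `memoryview.cast` and `ctypes`’ `from_buffer` require them.

```python
import ctypes


class BufferPointer:
    """Hypothetical helper: exposes a raw address for a writable buffer.

    It holds a reference to the exported view, so the underlying memory
    cannot be freed while this object (and its address) is still in use.
    """

    def __init__(self, obj):
        # Flat byte view of the buffer; cast() requires a C-contiguous buffer.
        view = memoryview(obj).cast("B")
        # from_buffer requires a writable buffer and also keeps `view` alive.
        self._array = (ctypes.c_char * view.nbytes).from_buffer(view)
        # Keep the view explicitly as well, making the ownership visible.
        self._view = view

    @property
    def ptr(self) -> int:
        return ctypes.addressof(self._array)

    @property
    def nbytes(self) -> int:
        return ctypes.sizeof(self._array)
```

Even after the caller drops its own reference to the original `bytearray`, reading through `ptr` stays valid, because `BufferPointer` still owns a reference to the memory.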

Miscellaneous

Faster `void *` conversion using stack-allocated buffers instead of heap allocation. (PR #1616)
Faster returning of results from driver, runtime, and NVRTC bindings. (PR #1647, PR #1656)
Faster conversion of enum sequences to vectors by eliminating temporary Python objects. (PR #1667)
Stack-allocated small numeric arrays in driver bindings, reducing heap allocation overhead. (PR #1545)
NVML bindings now use `cuda_pathfinder` for library discovery, consistent with other CUDA libraries. (PR #1661)
`CUDA_HOME` is no longer required at metadata resolution time (e.g., `pip install --dry-run`, `uv lock`); it is only needed at actual build time. (PR #1652)

Known issues

Updating from older versions (v12.6.2.post1 and below) via `pip install -U cuda-python` might not work. Instead, do a clean reinstallation: first run `pip uninstall -y cuda-python`, then run `pip install cuda-python`.
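A small pre-upgrade check can decide which path applies. The sketch below reads the installed `cuda-python` version with `importlib.metadata` and compares it against the v12.6.2.post1 threshold from the note above. The helper name `needs_clean_reinstall` is hypothetical. Its simplified parser only understands `X.Y.Z` and `X.Y.Z.postN` versions and ignores pre-release or local-version suffixes.

```python
import re
from importlib import metadata

# Matches "X.Y.Z" with an optional ".postN"; anything after that is ignored.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?")

# Last version affected by the known issue: 12.6.2.post1.
_THRESHOLD = (12, 6, 2, 1)


def needs_clean_reinstall(version: str) -> bool:
    """Return True if `version` is v12.6.2.post1 or below (hypothetical helper)."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, post = m.groups()
    # A release without ".postN" sorts before any of its post-releases.
    key = (int(major), int(minor), int(patch), int(post) if post else -1)
    return key <= _THRESHOLD


def main() -> None:
    try:
        installed = metadata.version("cuda-python")
    except metadata.PackageNotFoundError:
        print("cuda-python is not installed; run `pip install cuda-python`.")
        return
    if needs_clean_reinstall(installed):
        print(f"cuda-python {installed}: run `pip uninstall -y cuda-python`, "
              "then `pip install cuda-python`.")
    else:
        print(f"cuda-python {installed}: this known issue does not apply.")


if __name__ == "__main__":
    main()
```

A production tool would more likely compare versions with `packaging.version.Version`, which implements the full PEP 440 ordering.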