.. SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

.. module:: cuda.bindings

``cuda-bindings`` 13.2.0 Release notes
======================================

Highlights
----------

* Support for new APIs introduced in CUDA 13.2, including new driver functions
  (``cuKernelGetParamCount``, ``cuMemcpyWithAttributesAsync``,
  ``cuStreamBeginCaptureToCig``, ``cuLaunchHostFunc_v2``,
  ``cuGraphNodeGetParams``, coredump callback registration, and more) and
  their runtime counterparts.
* ``cuda.bindings.nvml`` has graduated from experimental
  (``cuda.bindings._nvml``) to a fully supported public module covering
  378 functions.
  (`PR #1524 `_, `PR #1548 `_)
* Added ``nvFatbin`` bindings.
  (`PR #1467 `_)
* Performance improvement: ``cuda.bindings`` now uses a faster ``enum``
  implementation instead of the standard library's ``enum.IntEnum``. This
  leads to much faster import times and slightly faster attribute access.
  (`PR #1581 `_)
* Multiple performance improvements cumulatively reduce Python-to-C call
  overhead through faster ``void *`` conversion, faster result returning,
  optimized enum-to-vector conversion, and stack-allocated small arrays.
* Added a CUDA version compatibility check that warns when the installed
  driver does not support the CUDA major version that ``cuda-bindings`` was
  built for. The check can be disabled by setting
  ``CUDA_PYTHON_DISABLE_VERSION_CHECK=1``.
  (`PR #1412 `_)

Bugfixes
--------

* Fixed an issue where the ``CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL`` attribute
  was retrieved as an unsigned int rather than a signed int.
  (`PR #1336 `_)
* Fixed ABI incompatibility bugs in the cuFILE bindings that were introduced
  in v13.1.0.
  (`PR #1468 `_)
* Fixed a use-after-free in ``_HelperInputVoidPtr`` properties when backed by
  Python buffer objects.
  (`PR #1629 `_)

Miscellaneous
-------------

* Faster ``void *`` conversion using stack-allocated buffers instead of heap
  allocation.
  (`PR #1616 `_)
* Faster returning of results from driver, runtime, and NVRTC bindings.
  (`PR #1647 `_, `PR #1656 `_)
* Faster conversion of enum sequences to vectors by eliminating temporary
  Python objects.
  (`PR #1667 `_)
* Stack-allocated small numeric arrays in driver bindings, reducing heap
  allocation overhead.
  (`PR #1545 `_)
* Wheel and installed package sizes are significantly reduced (e.g., on a
  typical Linux x86_64 build, the wheel shrinks from ~16.6 MB to ~5.7 MB and
  the installed package from ~152 MB to ~23 MB) by excluding Cython source
  files, generated C++ files, and template files from distribution packages.
* NVML bindings now use ``cuda_pathfinder`` for library discovery, consistent
  with other CUDA libraries.
  (`PR #1661 `_)
* Added a ``get_c_compiler()`` function that reports the C compiler used to
  build ``cuda.bindings``.
  (`PR #1591 `_)
* ``cuda-bindings`` now builds cleanly with ``clang``.
  (`PR #1658 `_)
* ``CUDA_HOME`` is no longer required at metadata resolution time (e.g.
  ``pip install --dry-run``, ``uv lock``); it is only needed at actual build
  time.
  (`PR #1652 `_)

Known issues
------------

* Updating from older versions (v12.6.2.post1 and below) via
  ``pip install -U cuda-python`` might not work. Please do a clean
  reinstallation instead: run ``pip uninstall -y cuda-python``, then
  ``pip install cuda-python``.
* ``nvml.system_get_process_name`` on WSL can return incorrect values. To
  work around this, set the locale to "C" before calling
  ``nvml.device_get_compute_running_processes_v3`` (which sets the process
  names) and before calling ``nvml.system_get_process_name``. ``cuda_core``
  does this automatically, but users of the raw NVML API need to do this
  manually.
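
The locale workaround for the WSL issue above could be applied along the following lines. This is a minimal sketch that uses only the standard library ``locale`` module. The helper names ``c_locale`` and ``call_in_c_locale`` are hypothetical and are not part of ``cuda.bindings``, and the argument lists of the NVML calls shown in the comments are assumptions rather than documented signatures.

```python
import contextlib
import locale


@contextlib.contextmanager
def c_locale(category=locale.LC_ALL):
    """Temporarily switch the process locale to "C", then restore it.

    Hypothetical helper for illustration; not part of cuda.bindings.
    """
    # Calling setlocale() without a second argument only queries the
    # current setting, so it can be restored afterwards.
    saved = locale.setlocale(category)
    locale.setlocale(category, "C")
    try:
        yield
    finally:
        locale.setlocale(category, saved)


def call_in_c_locale(func, *args, **kwargs):
    """Call func(*args, **kwargs) while the "C" locale is active."""
    with c_locale():
        return func(*args, **kwargs)


# Assumed usage with the raw NVML bindings (argument lists are assumptions;
# consult the NVML binding documentation for the actual signatures):
#
#   from cuda.bindings import nvml
#   procs = call_in_c_locale(nvml.device_get_compute_running_processes_v3, device)
#   name = call_in_c_locale(nvml.system_get_process_name, pid)
```

Note that ``locale.setlocale`` changes the locale for the whole process and is not thread-safe, so a helper like this is best used from a single thread or early in program startup.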