.. SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
.. SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

.. module:: cuda.bindings

``cuda-bindings`` 13.2.0 Release notes
======================================

Highlights
----------

* Support for new APIs introduced in CUDA 13.2, including new driver functions
  (``cuKernelGetParamCount``, ``cuMemcpyWithAttributesAsync``,
  ``cuStreamBeginCaptureToCig``, ``cuLaunchHostFunc_v2``,
  ``cuGraphNodeGetParams``, coredump callback registration, and more) and their
  runtime counterparts.
* ``cuda.bindings.nvml`` has graduated from experimental
  (``cuda.bindings._nvml``) to a fully supported public module with coverage
  spanning 378 functions.
  (`PR #1524 <https://github.com/NVIDIA/cuda-python/pull/1524>`_,
  `PR #1548 <https://github.com/NVIDIA/cuda-python/pull/1548>`_)
* Added ``nvFatbin`` bindings.
  (`PR #1467 <https://github.com/NVIDIA/cuda-python/pull/1467>`_)
* Performance improvement: ``cuda.bindings`` now uses a faster ``enum``
  implementation instead of the standard library's ``enum.IntEnum``.
  This substantially reduces import time and slightly speeds up attribute
  access.
  (`PR #1581 <https://github.com/NVIDIA/cuda-python/pull/1581>`_)
* Several performance improvements that together reduce Python-to-C call
  overhead: faster ``void *`` conversion, faster returning of results,
  optimized enum-to-vector conversion, and stack-allocated small arrays.
* Added a CUDA version compatibility check that warns when the installed
  driver does not support the CUDA major version that ``cuda-bindings`` was
  built for. The check can be disabled by setting the environment variable
  ``CUDA_PYTHON_DISABLE_VERSION_CHECK=1``.
  (`PR #1412 <https://github.com/NVIDIA/cuda-python/pull/1412>`_)
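
The version check can be pictured as a comparison of major versions. The
sketch below is a hypothetical, stdlib-only illustration, not the code that
``cuda-bindings`` ships: the function name ``check_driver_supports_major`` and
the warning text are invented, and it assumes the driver version uses the CUDA
integer encoding ``1000 * major + 10 * minor`` (for example, ``13020`` for
CUDA 13.2).

.. code-block:: python

    import os
    import warnings

    # Hypothetical sketch; NOT the actual cuda-bindings implementation.
    ENV_OPT_OUT = "CUDA_PYTHON_DISABLE_VERSION_CHECK"


    def check_driver_supports_major(driver_version: int, built_major: int) -> bool:
        """Warn if the driver cannot run code built for CUDA ``built_major``.

        ``driver_version`` is assumed to be encoded as 1000 * major + 10 * minor,
        e.g. 12080 for CUDA 12.8 or 13020 for CUDA 13.2.
        Returns True when the check passes or has been disabled.
        """
        # Assumption: only the exact value "1" disables the check here.
        if os.environ.get(ENV_OPT_OUT) == "1":
            return True
        driver_major = driver_version // 1000
        # A driver supports every CUDA major version up to its own.
        if driver_major < built_major:
            warnings.warn(
                f"Installed CUDA driver supports CUDA {driver_major}.x, but "
                f"cuda-bindings was built for CUDA {built_major}.x",
                RuntimeWarning,
                stacklevel=2,
            )
            return False
        return True

In real code, the driver version could come from
``cuda.bindings.driver.cuDriverGetVersion()``, which returns an
``(error, version)`` tuple.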

Bugfixes
--------

* Fixed an issue where the ``CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL`` attribute was
  retrieved as an unsigned int, rather than a signed int.
  (`PR #1336 <https://github.com/NVIDIA/cuda-python/pull/1336>`_)
* Fixed ABI incompatibility bugs in the cuFile bindings that were introduced
  in v13.1.0.
  (`PR #1468 <https://github.com/NVIDIA/cuda-python/pull/1468>`_)
* Fixed a use-after-free in ``_HelperInputVoidPtr`` properties when backed by
  Python buffer objects.
  (`PR #1629 <https://github.com/NVIDIA/cuda-python/pull/1629>`_)
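
Why the signedness of ``CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL`` matters: reading
the 32 bits of a signed ``int`` as an ``unsigned int`` turns any negative value
into a large positive number. The stdlib-only illustration below uses
:mod:`struct` and is unrelated to the ``cuda.bindings`` code itself; it only
shows the reinterpretation that the fix avoids.

.. code-block:: python

    import struct

    # Pack -1 as a signed 32-bit int (native byte order, standard size).
    raw = struct.pack("=i", -1)

    # Read the same four bytes back with two different C types.
    as_unsigned = struct.unpack("=I", raw)[0]  # old behavior: unsigned int
    as_signed = struct.unpack("=i", raw)[0]    # fixed behavior: signed int

    print(as_unsigned)  # 4294967295
    print(as_signed)    # -1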

Miscellaneous
-------------

* Faster ``void *`` conversion using stack-allocated buffers instead of heap
  allocation.
  (`PR #1616 <https://github.com/NVIDIA/cuda-python/pull/1616>`_)
* Faster returning of results from driver, runtime, and NVRTC bindings.
  (`PR #1647 <https://github.com/NVIDIA/cuda-python/pull/1647>`_,
  `PR #1656 <https://github.com/NVIDIA/cuda-python/pull/1656>`_)
* Faster conversion of enum sequences to vectors by eliminating temporary
  Python objects.
  (`PR #1667 <https://github.com/NVIDIA/cuda-python/pull/1667>`_)
* Stack-allocated small numeric arrays in driver bindings, reducing heap
  allocation overhead.
  (`PR #1545 <https://github.com/NVIDIA/cuda-python/pull/1545>`_)
* Wheel and installed package sizes are significantly reduced by excluding
  Cython source files, generated C++ files, and template files from
  distribution packages. For example, on a typical Linux x86_64 build the wheel
  shrinks from ~16.6 MB to ~5.7 MB and the installed package from ~152 MB
  to ~23 MB.
* NVML bindings now use ``cuda_pathfinder`` for library discovery, consistent
  with other CUDA libraries.
  (`PR #1661 <https://github.com/NVIDIA/cuda-python/pull/1661>`_)
* Added a ``get_c_compiler()`` function that reports the C compiler used to
  build ``cuda.bindings``.
  (`PR #1591 <https://github.com/NVIDIA/cuda-python/pull/1591>`_)
* ``cuda-bindings`` now builds cleanly with ``clang``.
  (`PR #1658 <https://github.com/NVIDIA/cuda-python/pull/1658>`_)
* ``CUDA_HOME`` is no longer required at metadata-resolution time (e.g.,
  ``pip install --dry-run`` or ``uv lock``); it is needed only at actual build
  time.
  (`PR #1652 <https://github.com/NVIDIA/cuda-python/pull/1652>`_)

Known issues
------------

* Updating from older versions (v12.6.2.post1 and below) via
  ``pip install -U cuda-python`` might not work. Please do a clean
  reinstallation instead: first run ``pip uninstall -y cuda-python``, then
  ``pip install cuda-python``.
* ``nvml.system_get_process_name`` can return incorrect values on WSL. To work
  around this, set the locale to ``"C"`` before calling
  ``nvml.device_get_compute_running_processes_v3`` (which sets the process
  names) and before calling ``nvml.system_get_process_name``. ``cuda_core``
  does this automatically, but users of the raw NVML API must do it manually.
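
For users of the raw NVML API, the locale switch can be wrapped in a context
manager that restores the previous setting afterwards. This is a minimal
sketch: the helper name ``c_locale`` is hypothetical, and it assumes that
switching ``locale.LC_ALL`` is sufficient, since the note above only says to
set the locale to ``"C"``.

.. code-block:: python

    import contextlib
    import locale


    @contextlib.contextmanager
    def c_locale(category=locale.LC_ALL):
        """Temporarily switch the process locale to "C", then restore it.

        Hypothetical helper; assumes LC_ALL is the category that matters.
        """
        # Calling setlocale() without a locale argument only queries it.
        previous = locale.setlocale(category)
        try:
            locale.setlocale(category, "C")
            yield
        finally:
            # Restore the original locale even if the body raised.
            locale.setlocale(category, previous)

Both the ``nvml.device_get_compute_running_processes_v3`` call and the
subsequent ``nvml.system_get_process_name`` calls would go inside a single
``with c_locale():`` block. Because the locale is process-wide state, this is
not safe while other threads depend on the locale.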