.. currentmodule:: cuda.core.experimental

``cuda.core`` 0.2.0 Release Notes
=================================

Released on March 17, 2025

Highlights
----------

- Add :class:`~ProgramOptions` to facilitate the passing of runtime compile options to :obj:`~Program`.
- Add Pythonic access to :class:`Device` and :class:`~_module.Kernel` attributes.

Breaking Changes
----------------

- The ``stream`` attribute is removed from :class:`~LaunchConfig`. Instead, the :class:`Stream` object should now be directly passed to :func:`~launch` as an argument.
- The signature of :func:`~launch` is changed by swapping positional arguments; the new signature is ``(stream, config, kernel, *kernel_args)``.
- Change ``__cuda_stream__`` from an attribute to a method.
- The :meth:`Program.compile` method no longer accepts the ``options`` argument. Instead, you can optionally pass an instance of :class:`ProgramOptions` to the constructor of :class:`Program`.
- :meth:`Device.properties` now provides attribute getters instead of a dictionary interface.
- The ``.handle`` attribute of various ``cuda.core`` objects now returns the underlying Python object instead of a (type-erased) Python integer.
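
For code that consumes ``__cuda_stream__`` from third-party objects, the attribute-to-method change can be bridged with a small shim. The following is only a sketch: ``StreamLike``, ``LegacyStreamLike``, and ``read_cuda_stream`` are hypothetical names, and the ``(protocol version, stream address)`` tuple layout is an assumption rather than something stated in these notes.

.. code-block:: python

   class LegacyStreamLike:
       """Hypothetical old-style object: ``__cuda_stream__`` is a plain attribute."""

       def __init__(self, address):
           # Assumed layout: (protocol version, stream address as an int).
           self.__cuda_stream__ = (0, address)


   class StreamLike:
       """Hypothetical new-style object: ``__cuda_stream__`` is a method."""

       def __init__(self, address):
           self._address = address

       def __cuda_stream__(self):
           return (0, self._address)


   def read_cuda_stream(obj):
       """Return the protocol tuple from either style of object."""
       proto = obj.__cuda_stream__
       # New-style objects expose a method; old-style ones expose the tuple itself.
       return proto() if callable(proto) else proto


   new_style = read_cuda_stream(StreamLike(0x1234))
   old_style = read_cuda_stream(LegacyStreamLike(0x1234))

Both calls yield the same tuple, so downstream code does not need to know which style a given object uses.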

New features
------------

- Expose :class:`ObjectCode` as a public API, which allows loading cubins from memory or disk. For loading other code types, please continue using :class:`Program`.
- A C++ helper function ``get_cuda_native_handle()`` is provided in the new ``include/utility.cuh`` header to retrieve the underlying CUDA C objects (e.g., ``CUstream``) from a Python object returned by the ``.handle`` attribute (e.g., :attr:`Stream.handle`).
- For objects such as :class:`Program` and :class:`Linker` that could dispatch to different backends, a new ``.backend`` attribute is provided to query this information.
- Support CUDA :class:`Event` timing. (#481, #498, #508)
- An :class:`Event` may now be created without recording it to a :class:`~_stream.Stream` using the :meth:`Device.create_event` method.
- :class:`Program` now supports the additional ``PTX`` code type. (#317)
- :meth:`Linker.link` exceptions now include the original error log. (#423)
- In a systematic sweep through the ``cuda.core`` implementations, many exception messages were made more consistent and informative. (#458)

New examples
------------
- ``jit_lto_fractal.py`` — Demonstrates just-in-time link-time optimization for fractal generation. (:class:`Device`, :class:`LaunchConfig`, :class:`Linker`, :class:`LinkerOptions`, :class:`Program`, :class:`ProgramOptions`) (#475)
- ``simple_multi_gpu_example.py`` — Example of using multiple GPUs. (:class:`Device`, :class:`Program`, :class:`LaunchConfig`) (#304)
- ``show_device_properties.py`` — Displays detailed device properties. (:class:`Device`) (#474)

Minor fixes and enhancements
----------------------------
- A dangling pointer problem in ``_linker.py`` was fixed. (#516)
- Add ``@functools.lru_cache`` decorator for :func:`get_binding_version`. (#512)
- Selected ``.decode()`` calls were changed to ``.decode("utf-8", errors="backslashreplace")`` to ensure that decoding error messages does not abort the process. (#510)
- The performance of :meth:`Device.compute_capability` was improved. (#459)
- The :class:`Program` constructor now issues a warning when falling back to :func:`cuLink`. (#315)
- To avoid deprecation warnings, the ``cuda.bindings`` imports in the ``cuda.core`` implementations were cleaned up. (#404)
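
The effect of the ``.decode()`` change can be seen with plain Python byte strings. This is a minimal sketch: the log bytes below are made up for illustration and are not taken from any real compiler or linker output.

.. code-block:: python

   # Made-up error log containing a byte (0xff) that is not valid UTF-8.
   raw_log = b"error: unexpected byte \xff in log"

   # A strict decode raises, which would hide the original error message.
   try:
       raw_log.decode()
       strict_failed = False
   except UnicodeDecodeError:
       strict_failed = True

   # With "backslashreplace", invalid bytes are rendered as escape sequences
   # and the rest of the message is preserved.
   safe_log = raw_log.decode("utf-8", errors="backslashreplace")

Here the strict decode fails, while ``safe_log`` keeps the readable part of the message and shows the offending byte as a literal ``\xff`` escape.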

Test fixes
----------
- Clean up device initialization in some tests. (#507)