.. currentmodule:: cuda.core.experimental ``cuda.core`` 0.2.0 Release Notes ================================= Released on March 17, 2025 Highlights ---------- - Add :class:`~ProgramOptions` to facilitate the passing of runtime compile options to :obj:`~Program`. - Add pythonic access to :class:`Device` and :class:`~_module.Kernel` attributes. Breaking Changes ---------------- - The ``stream`` attribute is removed from :class:`~LaunchConfig`. Instead, the :class:`Stream` object should now be directly passed to :func:`~launch` as an argument. - The signature for :func:`~launch` is changed by swapping positional arguments, the new signature is now ``(stream, config, kernel, *kernel_args)`` - Change ``__cuda_stream__`` from attribute to method. - The :meth:`Program.compile` method no longer accepts the ``options`` argument. Instead, you can optionally pass an instance of :class:`ProgramOptions` to the constructor of :class:`Program`. - :meth:`Device.properties` now provides attribute getters instead of a dictionary interface. - The ``.handle`` attribute of various ``cuda.core`` objects now returns the underlying Python object instead of a (type-erased) Python integer. New features ------------ - Expose :class:`ObjectCode` as a public API, which allows loading cubins from memory or disk. For loading other kinds of code types, please continue using :class:`Program`. - A C++ helper function ``get_cuda_native_handle()`` is provided in the new ``include/utility.cuh`` header to retrive the underlying CUDA C objects (ex: ``CUstream``) from a Python object returned by the ``.handle`` attribute (ex: :attr:`Stream.handle`). - For objects such as :class:`Program` and :class:`Linker` that could dispatch to different backends, a new ``.backend`` attribute is provided to query this information. - Support CUDA :class:`Event` timing. (#481, #498, #508) - An :class:`Event` may now be created without recording it to a :class:`~_stream.Stream` using the :meth:`Device.create_event` method. - :class:`Program` now supports the additional ``PTX`` code type. (#317) - :meth:`Linker.link` exceptions now include the original error log. (#423) - In a systematic sweep through the cuda.core implementations, many exceptions messages were made more consistent and informative. (#458) New examples ------------ - ``jit_lto_fractal.py`` — Demonstrates just-in-time link-time optimization for fractal generation. (:class:`Device`, :class:`LaunchConfig`, :class:`Linker`, :class:`LinkerOptions`, :class:`Program`, :class:`ProgramOptions`) (#475) - ``simple_multi_gpu_example.py`` — Example of using multiple GPUs. (:class:`Device`, :class:`Program`, :class:`LaunchConfig`) (#304) - ``show_device_properties.py`` — Displays detailed device properties. (:class:`Device`) (#474) Minor fixes and enhancements ---------------------------- - A dangling pointer problem in ``_linker.py`` was fixed. (#516) - Add ``@functools.lru_cache`` decorator for :func:`get_binding_version`. (#512) - Selected ``.decode()`` were changed to ``.decode("utf-8", errors="backslashreplace")`` to ensure that decoding error messages does not abort the process. (#510) - The performance of :meth:`Device.compute_capability` was improved. (#459) - The :class:`Program` constructor now issues a warning when falling back to :func:`cuLink`. (#315) - To avoid deprecation warnings, the cuda.bindings imports in the cuda.core implementations were cleaned up. (#404) Test fixes ---------- - Clean up device initialization in some tests. (#507)