.. SPDX-License-Identifier: Apache-2.0

.. currentmodule:: cuda.core.experimental

``cuda.core`` 0.3.0 Release Notes
=================================

Released on MM DD, 2025


Highlights
----------

- Starting with this release, ``cuda.core`` is licensed under Apache 2.0. The biggest implication of this change is that we are now open to external contributions! Please follow the :ref:`Contributor Guide ` for detailed instructions.


Breaking Changes
----------------

- The :class:`Buffer` object's ``__init__()`` method is removed, see below.
- The :class:`Buffer` object's :meth:`~Buffer.close` method and destructor now always defer to the underlying memory resource implementation to decide the behavior if a stream is not explicitly passed. Previously, in this case they always used the default stream, which could interfere with the memory resource's assumptions.


New features
------------

- :class:`~_module.Kernel` adds :attr:`~_module.Kernel.num_arguments` and :attr:`~_module.Kernel.arguments_info` for introspection of kernel arguments. (#612)
- Add Pythonic access to kernel occupancy calculation functions via :attr:`Kernel.occupancy`. (#648)
- Support launching cooperative kernels by setting :attr:`LaunchConfig.cooperative_launch` to ``True``.
- A name can be assigned to :class:`ObjectCode` instances generated by both :class:`Program` and :class:`Linker` through their respective options.
- Expose :class:`Buffer`, :class:`DeviceMemoryResource`, :class:`LegacyPinnedMemoryResource`, and :class:`MemoryResource` to the top namespace.
- Before this release, the internal :class:`Buffer` class had an ``__init__()`` constructor. To align with the design of ``cuda.core`` objects, this constructor is removed starting with this release. Users who still need the old behavior should use the :meth:`~Buffer.from_handle` alternative constructor.
- Add a typing annotation for the :attr:`~_stream.IsStreamT.__cuda_stream__` protocol.


New examples
------------

- Add a PyTorch-based example.
- Split the :class:`StridedMemoryView` example into two (CPU/GPU).


Fixes and enhancements
----------------------

- ``cuda.core`` now raises clearer and more actionable error messages whenever possible.
- :class:`ObjectCode` can now be pickled.
- :attr:`Event.device` and :attr:`Event.context` (the device and CUDA context on which an event was created) can now be looked up.
- :class:`Event`-based timing is made more robust (also with better error messages).
- The :func:`launch` function handled fp16 scalars incorrectly; this is fixed.
- :attr:`ProgramOptions.ptxas_options` can now accept more than one argument.
- The :class:`Device` constructor is made faster.
- The CFFI-based example no longer leaves intermediate files on disk after it finishes.
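
As a rough illustration of the ``__cuda_stream__`` protocol listed under New features, the sketch below shows how a third-party stream wrapper might be typed and consumed. It is not the library's implementation: ``SupportsCudaStream``, ``ForeignStream``, and ``stream_handle`` are hypothetical names introduced here. It also assumes that ``__cuda_stream__`` is a method returning a ``(version, handle)`` tuple with version ``0``. The code is pure Python and does not import ``cuda.core``.

.. code-block:: python

   from typing import Protocol, runtime_checkable


   @runtime_checkable
   class SupportsCudaStream(Protocol):
       """Hypothetical stand-in for the protocol annotation (not the library's class)."""

       def __cuda_stream__(self) -> tuple[int, int]:
           """Return (protocol version, stream handle as an integer address)."""
           ...


   class ForeignStream:
       """Toy wrapper around a stream handle owned by some other library."""

       def __init__(self, handle: int) -> None:
           self._handle = handle

       def __cuda_stream__(self) -> tuple[int, int]:
           # Assumption: protocol version 0, followed by the raw handle.
           return (0, self._handle)


   def stream_handle(obj: SupportsCudaStream) -> int:
       """Extract the raw handle from any object implementing the protocol."""
       version, handle = obj.__cuda_stream__()
       if version != 0:
           raise ValueError(f"unsupported __cuda_stream__ version: {version}")
       return handle

Because the protocol is structural, ``ForeignStream`` does not need to inherit from anything; a static type checker accepts it wherever ``SupportsCudaStream`` is expected, and ``isinstance`` only verifies that the method is present.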