`cuda.core` 0.X.Y Release Notes

Released on TBD

Highlights

Fix for LaunchConfig grid parameter unit conversion when thread block clusters are used.

Breaking Changes

CUDA 11 support dropped: CUDA 11 is no longer tested, and cuda.core may or may not work with cuda.bindings and CTK 11.x. Users are encouraged to migrate to CUDA 12.x or 13.x.
LaunchConfig grid parameter interpretation: When LaunchConfig.cluster is specified, the LaunchConfig.grid parameter now correctly represents the number of clusters instead of blocks. Previously, the grid parameter was incorrectly interpreted as blocks, causing a mismatch with the expected C++ behavior. This change ensures that LaunchConfig(grid=4, cluster=2, block=32) correctly produces 4 clusters × 2 blocks/cluster = 8 total blocks, matching the C++ equivalent cudax::make_hierarchy(cudax::grid_dims(4), cudax::cluster_dims(2), cudax::block_dims(32)).
When Buffer is closed, Buffer.handle is now set to None. It was previously set to 0 by accident.
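
The grid arithmetic described above can be sketched in plain Python. This is a minimal illustration, not the cuda.core implementation: `as_dim3` and `blocks_launched` are hypothetical helpers introduced here only to show how the number of blocks follows from `grid` and `cluster` under the new interpretation, and the padding of missing dimensions with 1 is an assumption of the sketch.

```python
from math import prod


def as_dim3(value):
    """Normalize an int or a 1- to 3-element tuple to a 3-tuple, padding with 1."""
    dims = (value,) if isinstance(value, int) else tuple(value)
    if not 1 <= len(dims) <= 3:
        raise ValueError("expected between 1 and 3 dimensions")
    return dims + (1,) * (3 - len(dims))


def blocks_launched(grid, cluster=None):
    """Hypothetical helper: per-dimension block counts under the new semantics.

    With a cluster, `grid` counts clusters, so each dimension is multiplied
    by the cluster size. Without a cluster, `grid` already counts blocks.
    """
    grid3 = as_dim3(grid)
    if cluster is None:
        return grid3
    return tuple(g * c for g, c in zip(grid3, as_dim3(cluster)))


# Mirrors LaunchConfig(grid=4, cluster=2, block=32) from the note above:
# 4 clusters x 2 blocks/cluster = 8 blocks.
blocks = blocks_launched(grid=4, cluster=2)
print(blocks, prod(blocks))  # (8, 1, 1) 8
```

Code that previously passed a block count as `grid` together with `cluster` would, under this reading, need to divide each grid dimension by the corresponding cluster dimension to launch the same number of blocks.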

New features

Added Device.arch property that returns the compute capability as a string (e.g., '75' for CC 7.5), providing a convenient alternative to manually concatenating the compute capability tuple.
Added CUDA 13.x testing support through a new test-cu13 dependency group.
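
To illustrate the string that Device.arch is described as returning, the manual concatenation it replaces can be sketched as follows. The helper `arch_from_compute_capability` is hypothetical and takes a plain `(major, minor)` tuple so that the snippet runs without a GPU; with cuda.core installed, the value would instead be read from the `arch` property of a `Device` instance.

```python
def arch_from_compute_capability(cc):
    """Hypothetical helper: join (major, minor) into a string such as '75'."""
    major, minor = cc
    return f"{major}{minor}"


print(arch_from_compute_capability((7, 5)))  # 75
print(arch_from_compute_capability((9, 0)))  # 90
```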

New examples

None.

Fixes and enhancements

Improved DeviceMemoryResource allocation performance when there are no active allocations by setting a higher release threshold (addresses issue #771).
Improved StridedMemoryView creation time performance by optimizing shape and strides tuple creation using Python/C API (addresses issue #449).
Fixed LaunchConfig grid unit conversion when cluster is set (addresses issue #867).
Fixed a bug in GraphBuilder.add_child where dependencies extracted from capturing stream were passed inconsistently with num_dependencies parameter (addresses issue #843).
Improved Buffer creation performance.