cuda.core 0.X.Y Release Notes
Released on TBD
Highlights
Fix for LaunchConfig grid parameter unit conversion when thread block clusters are used.
Breaking Changes
CUDA 11 support dropped: CUDA 11 support is no longer tested and it may or may not work with cuda.bindings and CTK 11.x. Users are encouraged to migrate to CUDA 12.x or 13.x.
LaunchConfig grid parameter interpretation: When LaunchConfig.cluster is specified, the LaunchConfig.grid parameter now correctly represents the number of clusters instead of blocks. Previously, the grid parameter was incorrectly interpreted as blocks, causing a mismatch with the expected C++ behavior. This change ensures that LaunchConfig(grid=4, cluster=2, block=32) correctly produces 4 clusters × 2 blocks/cluster = 8 total blocks, matching the C++ equivalent cudax::make_hierarchy(cudax::grid_dims(4), cudax::cluster_dims(2), cudax::block_dims(32)).
When Buffer is closed, Buffer.handle is now set to None. It was previously set to 0 by accident.
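The grid arithmetic described above can be sketched in plain Python. This is only an illustration of the new interpretation, not the code cuda.core uses internally: the helper names `as_3tuple` and `blocks_per_grid` are hypothetical, and the per-dimension multiplication for 2D/3D shapes is an assumption based on how CUDA cluster dimensions usually compose with grid dimensions.

```python
from math import prod


def as_3tuple(dim):
    """Normalize an int or a 1-3 element sequence to (x, y, z), padding with 1."""
    if isinstance(dim, int):
        dim = (dim,)
    dim = tuple(dim)
    if not 1 <= len(dim) <= 3:
        raise ValueError("dimension must have between 1 and 3 components")
    return dim + (1,) * (3 - len(dim))


def blocks_per_grid(grid, cluster=None):
    """Hypothetical helper: return the grid size in blocks.

    Without a cluster, `grid` already counts blocks. With a cluster,
    `grid` counts clusters, so each dimension is scaled by the number
    of blocks per cluster in that dimension (assumed behavior).
    """
    grid3 = as_3tuple(grid)
    if cluster is None:
        return grid3
    cluster3 = as_3tuple(cluster)
    return tuple(g * c for g, c in zip(grid3, cluster3))


# The example from the note: 4 clusters x 2 blocks/cluster = 8 blocks.
blocks = blocks_per_grid(grid=4, cluster=2)
print(blocks, prod(blocks))
```

Code that previously passed a block count as `grid` together with `cluster` would, under this interpretation, need to divide that count by the cluster size to launch the same number of blocks.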
New features
Added Device.arch property that returns the compute capability as a string (e.g., '75' for CC 7.5), providing a convenient alternative to manually concatenating the compute capability tuple.
CUDA 13.x testing support through new test-cu13 dependency group.
Stream-ordered memory allocation can now be shared on Linux via DeviceMemoryResource.
Added NVVM IR support to Program. NVVM IR is now understood with code_type="nvvm".
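For context on Device.arch, the manual concatenation it replaces might look like the sketch below. The function name `arch_from_compute_capability` is hypothetical and exists only for illustration; the snippet takes a plain (major, minor) tuple so that it runs without a GPU or cuda.core installed.

```python
def arch_from_compute_capability(cc):
    """Hypothetical helper: join a (major, minor) pair into an arch string."""
    major, minor = cc
    return f"{major}{minor}"


# Compute capability 7.5 maps to the string "75", as in the note above.
print(arch_from_compute_capability((7, 5)))
```

With the new property, the same string should be available directly from a Device object, so callers no longer need a helper of this kind.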
New examples
None.
Fixes and enhancements
Improved DeviceMemoryResource allocation performance when there are no active allocations by setting a higher release threshold (addresses issue #771).
Improved StridedMemoryView creation time by optimizing shape and strides tuple creation using the Python/C API (addresses issue #449).
Fixed LaunchConfig grid unit conversion when cluster is set (addresses issue #867).
Fixed a bug in GraphBuilder.add_child where dependencies extracted from the capturing stream were passed inconsistently with the num_dependencies parameter (addresses issue #843).
Made Buffer creation more performant.
Enabled MemoryResource subclasses to accept Device objects, in addition to previously supported device ordinals.
Fixed a bug in Stream and other classes where object cleanup would error during interpreter shutdown.
StridedMemoryView of an underlying array using the DLPack protocol will no longer leak memory.