cuda.core 0.X.Y Release Notes
Released on TBD
Highlights
Fix for LaunchConfig grid parameter unit conversion when thread block clusters are used.
Breaking Changes
LaunchConfig grid parameter interpretation: When LaunchConfig.cluster is specified, the LaunchConfig.grid parameter now correctly represents the number of clusters instead of the number of blocks. Previously, the grid parameter was incorrectly interpreted as blocks, causing a mismatch with the expected C++ behavior. This change ensures that LaunchConfig(grid=4, cluster=2, block=32) correctly produces 4 clusters × 2 blocks/cluster = 8 total blocks, matching the C++ equivalent cudax::make_hierarchy(cudax::grid_dims(4), cudax::cluster_dims(2), cudax::block_dims(32)).
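The arithmetic behind this change can be sketched in plain Python. The helper names below (`_as_tuple3`, `blocks_per_dim`) are hypothetical and are not part of cuda.core; the sketch assumes dimensions are given as an int or a tuple of up to three ints, and it only mirrors the unit conversion described above rather than the library's actual implementation.

```python
# Hypothetical helpers, not part of cuda.core: they only mirror the
# clusters-to-blocks arithmetic described in the release note.
def _as_tuple3(dims):
    """Normalize an int or a 1-3 element sequence to a 3-tuple, padding with 1."""
    if isinstance(dims, int):
        dims = (dims,)
    dims = tuple(dims)
    if not 1 <= len(dims) <= 3:
        raise ValueError("expected 1 to 3 dimensions")
    return dims + (1,) * (3 - len(dims))


def blocks_per_dim(grid, cluster=None):
    """Return the number of blocks launched along each dimension.

    With a cluster size, `grid` counts clusters, so the block count is
    grid * cluster per dimension. Without one, `grid` already counts blocks.
    """
    g = _as_tuple3(grid)
    if cluster is None:
        return g
    c = _as_tuple3(cluster)
    return tuple(gi * ci for gi, ci in zip(g, c))


# grid=4 clusters, cluster=2 blocks per cluster -> 8 blocks in total
print(blocks_per_dim(grid=4, cluster=2))  # (8, 1, 1)
```

Code that previously passed a block count as grid together with a cluster size would, under the new interpretation, need to divide that count by the cluster size to launch the same number of blocks.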
New features
Added Device.arch property that returns the compute capability as a string (e.g., ‘75’ for CC 7.5), providing a convenient alternative to manually concatenating the compute capability tuple.
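For illustration, the manual concatenation that the new property replaces might look like the sketch below. The function name `arch_string` is hypothetical, and the input is assumed to be a (major, minor) pair of ints; this is not the library's implementation.

```python
# Hypothetical stand-in for the manual concatenation that Device.arch replaces.
def arch_string(compute_capability):
    """Join a (major, minor) compute capability pair into a string like '75'."""
    major, minor = compute_capability
    return f"{major}{minor}"


print(arch_string((7, 5)))  # 75
```

A string in this form is convenient when building architecture names such as "sm_75" for compiler options.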
New examples
None.
Fixes and enhancements
Improved DeviceMemoryResource allocation performance when there are no active allocations by setting a higher release threshold (addresses issue #771).
Fixed LaunchConfig grid unit conversion when cluster is set (addresses issue #867).