cuda.core 0.X.Y Release Notes

Released on TBD

Highlights

  • Fixed the unit conversion of the LaunchConfig grid parameter when thread block clusters are used.

Breaking Changes

  • LaunchConfig grid parameter interpretation: When LaunchConfig.cluster is specified, the LaunchConfig.grid parameter now represents the number of clusters rather than the number of blocks. Previously, grid was incorrectly interpreted as a block count in this case, which did not match the expected C++ behavior. With this change, LaunchConfig(grid=4, cluster=2, block=32) produces 4 clusters × 2 blocks/cluster = 8 total blocks, matching the C++ equivalent cudax::make_hierarchy(cudax::grid_dims(4), cudax::cluster_dims(2), cudax::block_dims(32)). Code that sets cluster and previously passed grid as a number of blocks must now pass the number of clusters instead.
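
The arithmetic behind the new interpretation can be sketched in plain Python. This is only an illustration of the unit conversion described above, not the cuda.core implementation: the helper names as_dim3 and grid_in_blocks are hypothetical, and the sketch assumes dimensions are given as an int or a tuple of up to three ints.

```python
def as_dim3(value):
    """Hypothetical helper: pad an int or a short tuple to three dimensions."""
    if isinstance(value, int):
        value = (value,)
    value = tuple(value)
    return value + (1,) * (3 - len(value))


def grid_in_blocks(grid, cluster=None):
    """Hypothetical helper: return the grid size measured in blocks.

    Without a cluster, grid is already a block count. With a cluster,
    grid counts clusters, so each dimension is multiplied by the number
    of blocks per cluster in that dimension.
    """
    grid_dims = as_dim3(grid)
    if cluster is None:
        return grid_dims
    cluster_dims = as_dim3(cluster)
    return tuple(g * c for g, c in zip(grid_dims, cluster_dims))


# 4 clusters x 2 blocks/cluster = 8 blocks along the first dimension.
print(grid_in_blocks(4, cluster=2))  # (8, 1, 1)
```

Under the old interpretation, the same arguments would have described 4 blocks in total, so callers migrating existing code would divide their previous grid value by the cluster size in each dimension.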

New features

  • Added the Device.arch property, which returns the compute capability as a string (e.g., '75' for compute capability 7.5). This is a convenient alternative to manually concatenating the compute capability tuple.
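
For comparison, the manual concatenation that this property replaces might look like the following sketch. The function name arch_string is hypothetical, and the sketch assumes the compute capability is available as a (major, minor) tuple of ints; it does not call cuda.core itself.

```python
def arch_string(compute_capability):
    """Hypothetical helper: join a (major, minor) tuple into a string like '75'."""
    major, minor = compute_capability
    return f"{major}{minor}"


# Compute capability 7.5 becomes the string '75'.
print(arch_string((7, 5)))  # 75
```

With the new property, the same string is read directly from Device.arch, with no formatting step in user code.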

New examples

None.

Fixes and enhancements

  • Improved DeviceMemoryResource allocation performance when there are no active allocations by setting a higher release threshold (addresses issue #771).

  • Fixed the LaunchConfig grid unit conversion when cluster is set (addresses issue #867).