cuda.core 0.3.0 Release Notes

Released on MM DD, 2025

Highlights

  • Starting with this release, cuda.core is licensed under Apache 2.0. The biggest implication of this change is that we are now open to external contributions! Please follow the Contributor Guide for detailed instructions.

Breaking Changes

  • The Buffer object’s __init__() method is removed; see below.

  • The Buffer object’s close() method and destructor now always defer to the underlying memory resource implementation to decide the behavior when a stream is not explicitly passed. Previously, they always used the default stream in this case, which could interfere with the memory resource’s assumptions.
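
The close() change can be pictured with a toy model. The sketch below is plain Python and does not import cuda.core; ToyResource, ToyBuffer, and the string stream names are hypothetical stand-ins invented for illustration, and the real Buffer and memory resource classes may differ in signature and behavior. It only shows the idea that when close() receives no stream, the resource, not the buffer, picks what to do.

```python
# Toy model only: none of these classes are part of cuda.core.


class ToyResource:
    """Hypothetical memory resource that owns its deallocation policy."""

    def __init__(self, preferred_stream):
        self.preferred_stream = preferred_stream
        self.freed_on = []  # records which stream each deallocation used

    def deallocate(self, ptr, size, stream=None):
        # With no explicit stream, the resource applies its own policy
        # instead of having a default stream imposed by the caller.
        if stream is None:
            stream = self.preferred_stream
        self.freed_on.append(stream)


class ToyBuffer:
    """Hypothetical buffer that forwards close() to its resource."""

    def __init__(self, ptr, size, resource):
        self.ptr, self.size, self.resource = ptr, size, resource

    def close(self, stream=None):
        if self.resource is None:
            return  # already closed; nothing to release
        # Modeled new behavior: pass the stream through unchanged,
        # even when it is None. The modeled old behavior would have
        # substituted a default stream at this point.
        self.resource.deallocate(self.ptr, self.size, stream)
        self.resource = None


resource = ToyResource(preferred_stream="pool-stream")
ToyBuffer(0x1000, 256, resource).close()               # resource decides
ToyBuffer(0x2000, 256, resource).close("user-stream")  # caller decides
print(resource.freed_on)
```

In this model the first deallocation lands on the resource's own stream and the second on the stream the caller passed, which is the split of responsibility the release note describes.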

New features

New examples

  • Add a PyTorch-based example.

  • Split the StridedMemoryView example into two (CPU/GPU).

Fixes and enhancements

  • cuda.core now raises clearer and more actionable error messages whenever possible.

  • ObjectCode can be pickled now.

  • Event.device and Event.context (the device and CUDA context on which an event was created) can now be looked up.

  • Event-based timing is now more robust and reports better error messages.

  • The launch() function’s handling of fp16 scalars was incorrect and has been fixed.

  • ProgramOptions.ptxas_options can now accept more than one argument.

  • The Device constructor is now faster.

  • The CFFI-based example no longer leaves intermediate files on disk after it finishes.