cuda.core 0.2.0 Release Notes

Released on March 17, 2025

Highlights

Breaking Changes

  • The stream attribute is removed from LaunchConfig. Instead, the Stream object should now be directly passed to launch() as an argument.

  • The signature for launch() is changed by reordering its positional arguments; the new signature is (stream, config, kernel, *kernel_args).

  • __cuda_stream__ is changed from an attribute to a method.

  • The Program.compile() method no longer accepts the options argument. Instead, you can optionally pass an instance of ProgramOptions to the constructor of Program.

  • Device.properties() now provides attribute getters instead of a dictionary interface.

  • The .handle attribute of various cuda.core objects now returns the underlying Python object instead of a (type-erased) Python integer.
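The __cuda_stream__ change affects any third-party object that advertises a stream through this protocol. A minimal sketch of the method form follows. The class ForeignStream, the helper stream_handle_of, and the placeholder address are hypothetical, and the (version, address) return layout is an assumption based on our reading of the stream protocol, not something these notes state.

```python
class ForeignStream:
    """Hypothetical wrapper around a stream owned by another library."""

    def __init__(self, address):
        # Address of the underlying stream object, stored as a Python int.
        self._address = address

    def __cuda_stream__(self):
        # Method form: callers must now call this instead of reading an attribute.
        # Assumed return layout: (protocol version, stream address).
        return (0, self._address)


def stream_handle_of(obj):
    """Hypothetical consumer: call the protocol method and unpack the result."""
    version, address = obj.__cuda_stream__()
    if version != 0:
        raise ValueError(f"unsupported __cuda_stream__ version: {version}")
    return address


# Placeholder address for illustration only; it does not refer to a real stream.
handle = stream_handle_of(ForeignStream(0xDEADBEEF))
```

Code that previously read obj.__cuda_stream__ as a plain attribute would, under this assumption, need the added call parentheses shown in stream_handle_of.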

New features

  • Expose ObjectCode as a public API, which allows loading cubins from memory or disk. For loading other kinds of code types, please continue using Program.

  • A C++ helper function get_cuda_native_handle() is provided in the new include/utility.cuh header to retrieve the underlying CUDA C objects (e.g., CUstream) from a Python object returned by the .handle attribute (e.g., Stream.handle).

  • For objects such as Program and Linker that could dispatch to different backends, a new .backend attribute is provided to query this information.

  • Support CUDA Event timing. (#481, #498, #508)

  • An Event may now be created without recording it to a Stream using the Device.create_event() method.

  • Program now supports the additional PTX code type. (#317)

  • Linker.link() exceptions now include the original error log. (#423)

  • In a systematic sweep through the cuda.core implementations, many exception messages were made more consistent and informative. (#458)

New examples

Minor fixes and enhancements

  • A dangling pointer problem in _linker.py was fixed. (#516)

  • Add @functools.lru_cache decorator for get_binding_version(). (#512)

  • Selected .decode() calls were changed to .decode("utf-8", errors="backslashreplace") to ensure that a failure to decode an error message does not abort the process. (#510)

  • The performance of Device.compute_capability() was improved. (#459)

  • The Program constructor now issues a warning when falling back to cuLink(). (#315)

  • To avoid deprecation warnings, the cuda.bindings imports in the cuda.core implementations were cleaned up. (#404)
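Two of the fixes above rely on standard-library idioms that can be tried without a GPU. The sketch below is illustrative only: decode_log, fake_binding_version, the sample bytes, and the returned version tuple are hypothetical stand-ins, not the code used inside cuda.core.

```python
import functools


def decode_log(raw: bytes) -> str:
    # Undecodable bytes become \xNN escapes instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="backslashreplace")


call_count = 0


@functools.lru_cache
def fake_binding_version():
    # Stand-in for a version lookup; with the cache, the body runs only once.
    global call_count
    call_count += 1
    return (12, 8)  # made-up value for illustration


# The byte 0xff is not valid UTF-8, so a strict decode would raise here.
message = decode_log(b"ptxas error \xff near line 3")

first = fake_binding_version()
second = fake_binding_version()  # served from the cache
```

With the default strict error handler, the same decode call would raise UnicodeDecodeError, which is the failure mode the change is meant to avoid when the text being decoded is itself an error message.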

Test fixes

  • Clean up device initialization in some tests. (#507)