cuda.core v0.1.1 Release notes

Released on Dec 20, 2024

Highlights

  • Add StridedMemoryView and @args_viewable_as_strided_memory, which provide a concrete implementation of DLPack and CUDA Array Interface support.

  • Add Linker, which can link one or more ObjectCode instances generated by Program. Under the hood, it uses either the nvJitLink or the driver (cuLink*) APIs, depending on the CUDA version detected in the current environment.

  • Support pip install cuda-core. Please see the Installation Guide for further details.

New features

  • Add a cuda.core.experimental.system module for querying system- or process-wide information.

  • Add LaunchConfig.cluster to support thread block clusters on Hopper GPUs.
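
As a rough illustration of the new system module, the sketch below reads two values from it. The attribute names driver_version and num_devices are assumptions based on our reading of the API reference and may differ in your installed version. The helper query_system is hypothetical and falls back to None when cuda.core or a CUDA driver is not available.

```python
def query_system():
    """Return basic system info via cuda.core, or None if unavailable.

    query_system is a hypothetical helper for illustration only.
    """
    try:
        # Import lazily so the sketch also runs on machines without cuda.core.
        from cuda.core.experimental import system

        # Assumed attribute names; check the API reference for your version.
        return {
            "driver_version": system.driver_version,
            "num_devices": system.num_devices,
        }
    except Exception:
        # Covers a missing package as well as a missing or broken driver.
        return None


info = query_system()
if info is None:
    print("cuda.core (or a CUDA driver) is not available")
else:
    print(info)
```

The broad except clause is a deliberate choice for a sketch that should degrade gracefully; production code would likely catch narrower exception types.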

Enhancements

  • The internal handle held by ObjectCode is now lazily initialized upon first touch.

  • Support TCC devices with a default synchronous memory resource to avoid the use of memory pools.

  • Ensure "ltoir" is a valid code type for ObjectCode.

  • Document the __cuda_stream__ protocol.

  • Improve test coverage & documentation cross-references.

  • Enforce code formatting.
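
To make the __cuda_stream__ protocol mentioned above more concrete, here is a minimal sketch of a producer and a consumer. It assumes the protocol is exposed as an attribute that yields a 2-tuple of (version, stream address) with version 0. Both ForeignStream and read_stream_handle are hypothetical names used only for illustration, and the consumer also accepts a callable in case a given release defines the protocol as a method.

```python
class ForeignStream:
    """Hypothetical wrapper around a raw CUDA stream address."""

    def __init__(self, handle: int):
        self._handle = handle

    @property
    def __cuda_stream__(self):
        # Assumed layout: (protocol version, address of the stream).
        return (0, self._handle)


def read_stream_handle(obj) -> int:
    """Extract the stream address from any object exposing the protocol."""
    value = obj.__cuda_stream__
    if callable(value):
        # Tolerate a method-style protocol as well as an attribute.
        value = value()
    version, handle = value
    if version != 0:
        raise ValueError(f"unsupported __cuda_stream__ version: {version}")
    return int(handle)


# Address 0 denotes the default (null) stream, so no real stream is needed here.
stream = ForeignStream(0)
print(read_stream_handle(stream))
```

Keeping the consumer tolerant of both forms is only a defensive choice for this sketch; the documented protocol is the authority on the exact shape.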

Bug fixes

  • Eliminate potential class destruction issues.

  • Fix a circular import when handling a foreign CUDA stream.

Limitations

  • All APIs are currently experimental and subject to change without deprecation notice. Please share your feedback with us so that we can make cuda.core better!

  • Using cuda.core with NVRTC or nvJitLink installed from PyPI via pip install is currently not supported. This will be fixed in a future release.

  • Some LinkerOptions are only available with recent versions of CUDA. With CUDA <12, the backend is the cuLink API, which supports only a subset of the options that nvJitLink does. Further, some options aren’t available on CUDA versions <12.6.

  • Using cuda.core with Python 3.13 currently requires building cuda-python from source before running pip install. This extra step will be removed in a future release.
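
The version thresholds in the LinkerOptions limitation above can be summarized as a small decision function. This is only a restatement of the notes, not the detection logic that cuda.core itself uses. The helper describe_linker_backend and its return strings are hypothetical.

```python
def describe_linker_backend(cuda_version):
    """Map a (major, minor) CUDA version to the backend described in the notes.

    Hypothetical helper: it mirrors the thresholds stated in the
    limitations list and does not query the real environment.
    """
    major, minor = cuda_version
    if major < 12:
        # Older toolkits fall back to the driver's cuLink* APIs.
        return "cuLink: subset of options"
    if (major, minor) < (12, 6):
        # nvJitLink is used, but some options are not yet available.
        return "nvJitLink: some options unavailable"
    return "nvJitLink: no version restriction noted"


for version in [(11, 8), (12, 2), (12, 6)]:
    print(version, "->", describe_linker_backend(version))
```

A check like this can help decide up front which LinkerOptions fields are safe to set, rather than discovering an unsupported option at link time.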