# cuda.core 0.1.1 Release Notes
Released on Dec 20, 2024
## Highlights
- Add `StridedMemoryView` and `args_viewable_as_strided_memory()`, which provide a concrete implementation of DLPack & CUDA Array Interface support.
- Add `Linker`, which can link one or multiple `ObjectCode` instances generated by `Program`. Under the hood, it uses either the nvJitLink or driver (`cuLink*`) APIs depending on the CUDA version detected in the current environment. (A usage sketch follows this list.)
- Support `pip install cuda-core`. Please see the Installation Guide for further details.
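The following sketch shows how these pieces might fit together, compiling a small kernel with `Program` and linking the result with `Linker`. It is illustrative only: the signatures shown here (`Program(code, code_type=...)`, `compile("ptx")`, `LinkerOptions(arch=...)`, and `linker.link("cubin")`) are assumptions based on this release and should be checked against the API reference.

```python
# Illustrative sketch only; check the cuda.core API reference for exact
# signatures (Program, Linker, and LinkerOptions options may differ).
from cuda.core.experimental import Device, Linker, LinkerOptions, Program

source = r"""
extern "C" __global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}
"""

device = Device()
device.set_current()

# Compile the C++ source to an ObjectCode instance holding PTX.
program = Program(source, code_type="c++")
ptx = program.compile("ptx")

# Link one or more ObjectCode instances into a cubin. Depending on the CUDA
# version detected, either nvJitLink or the driver cuLink* APIs are used.
arch = "sm_" + "".join(str(x) for x in device.compute_capability)
linker = Linker(ptx, options=LinkerOptions(arch=arch))
cubin = linker.link("cubin")
```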
## New features
- Add a `cuda.core.experimental.system` module for querying system- or process-wide information. (See the sketch after this list.)
- Add `cluster` to `LaunchConfig` to support thread block clusters on Hopper GPUs.
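As a rough illustration, the new `system` module can be queried directly, and `cluster` is passed alongside `grid` and `block` when building a `LaunchConfig`. The attribute and parameter names below are assumptions based on this release's documentation and should be verified against the API reference.

```python
# Illustrative sketch only; attribute and parameter names follow this
# release's documentation and may differ in later versions.
from cuda.core.experimental import LaunchConfig, system

# Process-/system-wide queries.
print("Driver version:", system.driver_version)
print("Device count:", system.num_devices)
for device in system.devices:
    print("Device", device.device_id, "compute capability", device.compute_capability)

# Thread block clusters (Hopper and newer) are requested via the cluster
# parameter of the launch configuration.
config = LaunchConfig(grid=16, cluster=2, block=256)
```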
## Enhancements
- The internal handle held by `ObjectCode` is now lazily initialized upon first touch.
- Support TCC devices with a default synchronous memory resource to avoid the use of memory pools.
- Ensure `"ltoir"` is a valid code type for `ObjectCode`.
- Document the `__cuda_stream__` protocol. (See the sketch after this list.)
- Improve test coverage & documentation cross-references.
- Enforce code formatting.
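The `__cuda_stream__` protocol lets `cuda.core` borrow streams created by other libraries. Below is a minimal sketch, assuming the protocol is exposed as an attribute returning a `(version, handle)` 2-tuple and that `Device.create_stream(obj=...)` accepts any object providing it; the exact form should be checked against the documented protocol for the installed release.

```python
# Minimal sketch of the __cuda_stream__ protocol; the exact form (attribute
# vs. method) and the accepted protocol version should be checked against
# the documentation for the installed release.
from cuda.core.experimental import Device


class ForeignStream:
    """Pretends to be a stream owned by a third-party library."""

    def __init__(self, handle):
        self._handle = handle  # raw CUstream / cudaStream_t address

    @property
    def __cuda_stream__(self):
        return (0, self._handle)  # (protocol version, stream handle)


device = Device()
device.set_current()

# For a self-contained demo, borrow the handle of a cuda.core-owned stream
# and wrap it as if it came from another library.
owned = device.create_stream()
borrowed = device.create_stream(obj=ForeignStream(int(owned.handle)))
```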
## Bug fixes
- Eliminate potential class destruction issues.
- Fix a circular import when handling a foreign CUDA stream.
## Limitations
- All APIs are currently experimental and subject to change without deprecation notice. Please share your feedback with us so that we can make `cuda.core` better!
- Using `cuda.core` with NVRTC or nvJitLink installed from PyPI via `pip install` is currently not supported. This will be fixed in a future release.
- Some `LinkerOptions` are only available with newer CUDA versions. With CUDA <12, the backend is the `cuLink` API, which supports only a subset of the options that nvJitLink does. Further, some options aren't available on CUDA versions <12.6.
- To use `cuda.core` with Python 3.13, `cuda-python` currently needs to be built from source prior to `pip install`. This extra step will be fixed soon.