# cuda.core 0.1.1 Release Notes
Released on Dec 20, 2024
## Highlights
- Add `StridedMemoryView` and `args_viewable_as_strided_memory()`, which provide a concrete implementation of DLPack & CUDA Array Interface support.
- Add `Linker`, which can link one or multiple `ObjectCode` instances generated by `Program`. Under the hood, it uses either the nvJitLink or driver (`cuLink*`) APIs depending on the CUDA version detected in the current environment. (A usage sketch follows this list.)
- Support `pip install cuda-core`. Please see the Installation Guide for further details.
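The following sketch shows how these pieces might fit together, compiling a small kernel with `Program` and linking the result with `Linker`. It is illustrative only: the signatures shown here (`Program(code, code_type=...)`, `compile("ptx")`, `LinkerOptions(arch=...)`, and `linker.link("cubin")`) are assumptions based on this release and should be checked against the API reference.

```python
# Illustrative sketch only; check the cuda.core API reference for exact
# signatures (Program, Linker, and LinkerOptions options may differ).
from cuda.core.experimental import Device, Linker, LinkerOptions, Program

source = r"""
extern "C" __global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}
"""

device = Device()
device.set_current()

# Compile the C++ source to an ObjectCode instance holding PTX.
program = Program(source, code_type="c++")
ptx = program.compile("ptx")

# Link one or more ObjectCode instances into a cubin. Depending on the CUDA
# version detected, either nvJitLink or the driver cuLink* APIs are used.
arch = "sm_" + "".join(str(x) for x in device.compute_capability)
linker = Linker(ptx, options=LinkerOptions(arch=arch))
cubin = linker.link("cubin")
```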
## New features
- Add a `cuda.core.experimental.system` module for querying system- or process-wide information. (See the sketch after this list.)
- Add `cluster` to `LaunchConfig` to support thread block clusters on Hopper GPUs.
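As a rough illustration, the new `system` module can be queried directly, and `cluster` is passed alongside `grid` and `block` when building a `LaunchConfig`. The attribute and parameter names below are assumptions based on this release's documentation and should be verified against the API reference.

```python
# Illustrative sketch only; attribute and parameter names follow this
# release's documentation and may differ in later versions.
from cuda.core.experimental import LaunchConfig, system

# Process-/system-wide queries.
print("Driver version:", system.driver_version)
print("Device count:", system.num_devices)
for device in system.devices:
    print("Device", device.device_id, "compute capability", device.compute_capability)

# Thread block clusters (Hopper and newer) are requested via the cluster
# parameter of the launch configuration.
config = LaunchConfig(grid=16, cluster=2, block=256)
```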
## Enhancements
- The internal handle held by `ObjectCode` is now lazily initialized upon first touch.
- Support TCC devices with a default synchronous memory resource to avoid the use of memory pools.
- Ensure `"ltoir"` is a valid code type for `ObjectCode`.
- Document the `__cuda_stream__` protocol. (See the sketch after this list.)
- Improve test coverage & documentation cross-references.
- Enforce code formatting.
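The `__cuda_stream__` protocol lets `cuda.core` borrow streams created by other libraries. Below is a minimal sketch, assuming the protocol is exposed as an attribute returning a `(version, handle)` 2-tuple and that `Device.create_stream(obj=...)` accepts any object providing it; the exact form should be checked against the documented protocol for the installed release.

```python
# Minimal sketch of the __cuda_stream__ protocol; the exact form (attribute
# vs. method) and the accepted protocol version should be checked against
# the documentation for the installed release.
from cuda.core.experimental import Device


class ForeignStream:
    """Pretends to be a stream owned by a third-party library."""

    def __init__(self, handle):
        self._handle = handle  # raw CUstream / cudaStream_t address

    @property
    def __cuda_stream__(self):
        return (0, self._handle)  # (protocol version, stream handle)


device = Device()
device.set_current()

# For a self-contained demo, borrow the handle of a cuda.core-owned stream
# and wrap it as if it came from another library.
owned = device.create_stream()
borrowed = device.create_stream(obj=ForeignStream(int(owned.handle)))
```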
## Bug fixes
- Eliminate potential class destruction issues.
- Fix a circular import when handling a foreign CUDA stream.
## Limitations
- All APIs are currently experimental and subject to change without deprecation notice. Please share your feedback with us so that we can make `cuda.core` better!
- Using `cuda.core` with NVRTC or nvJitLink installed from PyPI via `pip install` is currently not supported. This will be fixed in a future release.
- Some `LinkerOptions` are only available with newer CUDA versions. With CUDA <12, the backend is the `cuLink` API, which supports only a subset of the options that nvJitLink does. Further, some options aren't available on CUDA versions <12.6.
- To use `cuda.core` with Python 3.13, `cuda-python` currently needs to be built from source prior to `pip install`. This extra step will be fixed soon.