cuda.core v0.1.1 Release notes
Released on Dec 20, 2024
Highlights
Add StridedMemoryView and @args_viewable_as_strided_memory, which provide a concrete implementation of DLPack and CUDA Array Interface support.
Add Linker, which can link one or more ObjectCode instances generated by Program. Under the hood, it uses either the nvJitLink or driver (cuLink*) APIs, depending on the CUDA version detected in the current environment.
Support pip install cuda-core. Please see the Installation Guide for further details.
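The backend choice described for Linker can be pictured with a minimal sketch. This is not the library's code: the function name pick_linker_backend and the returned labels are hypothetical, and the only rule encoded is the one stated in these notes (CUDA 12 and later use nvJitLink, earlier versions use the driver cuLink* APIs).

```python
def pick_linker_backend(cuda_version: str) -> str:
    """Return a label for the linking backend implied by a CUDA version string.

    Hypothetical helper for illustration only; cuda.core performs its own
    detection internally and does not expose this function.
    """
    # Only the major version matters for the rule stated in the release notes.
    major = int(cuda_version.split(".")[0])
    if major >= 12:
        return "nvJitLink"
    # CUDA < 12: fall back to the driver API (cuLink*), which supports
    # only a subset of the options nvJitLink does.
    return "driver"


if __name__ == "__main__":
    for version in ("11.8", "12.0", "12.6"):
        print(version, "->", pick_linker_backend(version))
```

A check like this can be handy in user code that wants to avoid passing LinkerOptions the older backend may not accept.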
New features
Add a cuda.core.experimental.system module for querying system- or process-wide information.
Add LaunchConfig.cluster to support thread block clusters on Hopper GPUs.
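As background for thread block clusters, the sketch below checks a grid/cluster pair against two general CUDA rules: the grid must be a whole multiple of the cluster shape along each axis, and a cluster of up to 8 blocks is the portable limit. The helper check_cluster is hypothetical and not part of cuda.core, and it assumes the grid is counted in thread blocks, as in the CUDA C++ launch API. How LaunchConfig itself interprets its fields is not covered here.

```python
PORTABLE_MAX_CLUSTER_SIZE = 8  # portable limit on blocks per cluster in CUDA


def check_cluster(grid, cluster):
    """Validate 3-tuples of grid and cluster dimensions (both in blocks).

    Hypothetical helper for illustration. Returns the number of blocks
    per cluster, or raises ValueError if the shapes are incompatible.
    """
    size = 1
    for axis, g, c in zip("xyz", grid, cluster):
        if c < 1:
            raise ValueError(f"cluster.{axis} must be at least 1, got {c}")
        if g % c != 0:
            raise ValueError(f"grid.{axis}={g} is not a multiple of cluster.{axis}={c}")
        size *= c
    if size > PORTABLE_MAX_CLUSTER_SIZE:
        raise ValueError(f"cluster of {size} blocks exceeds the portable limit")
    return size


if __name__ == "__main__":
    print(check_cluster((8, 4, 1), (2, 2, 1)))
```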
Enhancements
The internal handle held by ObjectCode is now lazily initialized upon first touch.
Support TCC devices with a default synchronous memory resource to avoid the use of memory pools.
Ensure "ltoir" is a valid code type for ObjectCode.
Document the __cuda_stream__ protocol.
Improve test coverage & documentation cross-references.
Enforce code formatting.
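To make the __cuda_stream__ protocol concrete, here is a minimal sketch that needs no GPU. It assumes the protocol yields a 2-tuple of Python ints (protocol version, currently 0, and the address of the underlying stream), exposed as an attribute in this release. The class ForeignStream and the function unpack_cuda_stream are hypothetical names for illustration, not cuda.core APIs.

```python
class ForeignStream:
    """Hypothetical stand-in for a stream object owned by another library."""

    def __init__(self, handle: int):
        self._handle = handle  # address of the underlying CUDA stream

    @property
    def __cuda_stream__(self):
        # Assumed protocol shape: (protocol version, stream address).
        return (0, self._handle)


def unpack_cuda_stream(obj) -> int:
    """Return the raw stream address from an object exposing __cuda_stream__."""
    try:
        info = obj.__cuda_stream__
    except AttributeError:
        raise TypeError(
            f"{type(obj).__name__} does not implement __cuda_stream__"
        ) from None
    # Tolerate a method form as well as the attribute form assumed above.
    if callable(info):
        info = info()
    version, handle = info
    if version != 0:
        raise ValueError(f"unsupported __cuda_stream__ version: {version}")
    return handle


if __name__ == "__main__":
    print(hex(unpack_cuda_stream(ForeignStream(0x7F00DEAD))))
```

A consumer written this way accepts any stream-like object that follows the protocol, without importing the library that created it.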
Bug fixes
Eliminate potential class destruction issues.
Fix a circular import when handling a foreign CUDA stream.
Limitations
All APIs are currently experimental and subject to change without deprecation notice. Please share your feedback with us so that we can make cuda.core better!
Using cuda.core with NVRTC or nvJitLink installed from PyPI via pip install is currently not supported. This will be fixed in a future release.
Some LinkerOptions are only available when using a modern version of CUDA. When using CUDA <12, the backend is the cuLink API, which supports only a subset of the options that nvJitLink does. Further, some options aren’t available on CUDA versions <12.6.
Using cuda.core with Python 3.13 currently requires building cuda-python from source prior to pip install. This extra step will be fixed soon.