cuda.core 0.6.0 Release Notes
Highlights
Added the cuda.core.system module for NVML-based system and device queries.
Several StridedMemoryView improvements, including bfloat16 DLPack support and NumPy array interoperability.
Improved support for Python object protocols across core API classes.
Performance improvements through Cythonization and reduced Python overhead.
Breaking Changes
Building cuda.core from source now requires cuda-bindings >= 12.9.0, due to Cython-level dependencies on the NVVM bindings (cynvvm). Pre-built wheels are unaffected. The previous minimum was 12.8.0.
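A source-build environment can be checked against the new minimum before starting a build. The following is a minimal sketch, not part of cuda.core: the helper names parse_version, can_build_from_source, and installed_bindings_version are hypothetical, and the parser ignores pre-release suffixes (a real project would more likely use the packaging library).

```python
from importlib import metadata

MIN_BINDINGS = (12, 9, 0)  # minimum cuda-bindings version stated in the note above


def parse_version(text):
    """Turn '12.9.1' or '12.9.0.post1' into a tuple of at least three integers."""
    fields = []
    for part in text.split("."):
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric field such as 'post1'
        fields.append(int(digits))
    while len(fields) < 3:
        fields.append(0)  # treat '12.9' the same as '12.9.0'
    return tuple(fields)


def can_build_from_source(installed):
    """Return True if the given cuda-bindings version string meets the minimum."""
    return parse_version(installed) >= MIN_BINDINGS


def installed_bindings_version():
    """Look up the installed cuda-bindings version, or None if it is absent."""
    try:
        return metadata.version("cuda-bindings")
    except metadata.PackageNotFoundError:
        return None
```

Calling can_build_from_source(installed_bindings_version() or "0") gives a quick yes/no answer; environments that only install pre-built wheels do not need this check.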
New features
Added the cuda.core.system module for NVML-based system and device queries, including device attributes, clocks, temperatures, fans, events, and PCI information.
StridedMemoryView improvements:
  Added from_array_interface constructor for creating views from NumPy arrays.
  Improved structured dtype array support.
  Added bfloat16 DLPack support when the optional ml_dtypes package is installed.
Added public access to default CUDA streams via module-level constants LEGACY_DEFAULT_STREAM and PER_THREAD_DEFAULT_STREAM, replacing the previous workaround of using Stream.from_handle(0).
Added Kernel.from_handle() for wrapping an existing CUfunction handle into a Kernel object, enabling interoperability with foreign CUDA handles.
Added __eq__, __hash__, __weakref__, and __repr__ support for core API classes including Buffer, LaunchConfig, Kernel, ObjectCode, Stream, and Event.
Added NVVM extra_sources and use_libdevice options to ProgramOptions for multi-module NVVM compilation and automatic libdevice loading.
Added CUDA version compatibility check at import time to detect mismatches between cuda.core and the installed cuda-bindings version.
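For readers unfamiliar with the protocol behind the from_array_interface constructor, the sketch below shows the kind of information NumPy's __array_interface__ (version 3) exposes: a data pointer, a shape, a type string, and optional byte strides. It deliberately avoids cuda.core and NumPy so it runs anywhere. HostArray and describe are hypothetical names, and whether StridedMemoryView reads the fields in exactly this way is an assumption.

```python
import ctypes


class HostArray:
    """Hypothetical stand-in for a NumPy array: a C-contiguous float32 buffer."""

    def __init__(self, *shape):
        count = 1
        for extent in shape:
            count *= extent
        self.shape = tuple(shape)
        self._buf = (ctypes.c_float * count)()  # owns the memory

    @property
    def __array_interface__(self):
        # Fields follow version 3 of the array interface protocol.
        return {
            "version": 3,
            "shape": self.shape,
            "typestr": "<f4",  # 4-byte float, little-endian host assumed
            "data": (ctypes.addressof(self._buf), False),  # (pointer, read_only)
            "strides": None,  # None means C-contiguous
        }


def describe(obj):
    """Extract pointer, shape, item size, and byte strides from the protocol."""
    iface = obj.__array_interface__
    shape = tuple(iface["shape"])
    itemsize = int(iface["typestr"][2:])
    strides = iface.get("strides")
    if strides is None:
        # Derive C-contiguous byte strides: the last axis varies fastest.
        computed = []
        step = itemsize
        for extent in reversed(shape):
            computed.append(step)
            step *= extent
        strides = tuple(reversed(computed))
    pointer, read_only = iface["data"]
    return {
        "pointer": pointer,
        "shape": shape,
        "itemsize": itemsize,
        "strides": tuple(strides),
        "read_only": read_only,
    }


info = describe(HostArray(2, 3))
```

For a 2 x 3 float32 buffer, the derived byte strides are (12, 4): moving one row skips three 4-byte elements.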
Fixes and enhancements
Eliminated spurious CUDA driver errors during interpreter shutdown by ensuring resources are destroyed in the correct order.
Fixed a bug preventing weak references to core API objects.
Fixed zero-sized allocations in legacy memory resources, which previously failed on certain platforms.
Improved performance by Cythonizing Program and ObjectCode internals.
Reduced StridedMemoryView construction overhead.
__hash__ and __eq__ on core API classes no longer require a CUDA context.
Device attribute queries now gracefully handle unsupported attributes on older CUDA drivers, returning sensible defaults instead of raising errors.
Added a warning when ManagedMemoryResource is created on platforms without concurrent managed access support.
Reduced wheel and installed package sizes by excluding Cython source files and build artifacts from distribution packages.
Slightly improved typing support.
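The weak-reference fix and the context-free __hash__ and __eq__ behavior can be illustrated with a pure-Python analogue. This is a sketch only: Handle is a hypothetical class, not a cuda.core type, and the actual classes are compiled extension types whose implementation is not shown in these notes. The point is that comparing and hashing by a stored handle value needs no driver call, and that a class restricting its attributes must explicitly opt in to weak references.

```python
import weakref


class Handle:
    """Hypothetical wrapper around an integer handle; not a cuda.core class."""

    # With __slots__, weak references only work if "__weakref__" is listed.
    __slots__ = ("_value", "__weakref__")

    def __init__(self, value):
        self._value = int(value)

    def __eq__(self, other):
        if not isinstance(other, Handle):
            return NotImplemented
        # Pure value comparison: no external state is consulted.
        return self._value == other._value

    def __hash__(self):
        # Consistent with __eq__: equal handles hash equally.
        return hash((type(self), self._value))

    def __repr__(self):
        return f"Handle({self._value:#x})"


a = Handle(0x10)
b = Handle(0x10)
ref = weakref.ref(a)  # would raise TypeError without the "__weakref__" slot
cache = {a: "first"}
cache[b] = "second"   # same key, because a == b and hash(a) == hash(b)
```

With equality and hashing defined this way, wrapper objects can serve as dictionary keys or live in a weakref.WeakValueDictionary without keeping the underlying resource alive.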