cuda.core 1.0.0 Release Notes
Highlights
TBD
New features
Added the cuda.core.checkpoint module for CUDA process checkpointing, including string process state queries, lock/checkpoint/restore/unlock operations, and GPU UUID remapping support for restore. (#1343)
Fixes and enhancements
StridedMemoryView now provides a fast path for torch.Tensor objects via PyTorch’s AOT Inductor (AOTI) stable C ABI. When a torch.Tensor is passed to any from_* classmethod (from_dlpack, from_cuda_array_interface, from_array_interface, or from_any_interface), tensor metadata is read directly from the underlying C struct, bypassing the DLPack and CUDA Array Interface protocol overhead. This yields ~7-20x faster StridedMemoryView construction for PyTorch tensors (depending on whether stream ordering is required). Proper CUDA stream ordering is established between PyTorch’s current stream and the consumer stream, matching the DLPack synchronization contract. Requires PyTorch >= 2.3. (#749)
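As a rough illustration of where the fast path would apply, the sketch below wraps an object through from_any_interface and then reads back its layout metadata. The import location (cuda.core.utils, with a fallback to the older cuda.core.experimental.utils), the stream_ptr keyword, and the helper names make_view and summarize are assumptions made for this example, not something these notes specify. The demo at the bottom uses a plain stand-in object so it runs without a GPU, PyTorch, or cuda.core installed.

```python
from types import SimpleNamespace


def make_view(obj, stream_ptr=-1):
    """Wrap `obj` in a StridedMemoryView. Needs cuda.core when called."""
    # Assumption: the class lives in cuda.core.utils; older releases
    # exposed it under cuda.core.experimental.utils instead.
    try:
        from cuda.core.utils import StridedMemoryView
    except ImportError:
        from cuda.core.experimental.utils import StridedMemoryView
    # Assumption: stream_ptr is the consumer stream handle as a Python int,
    # and -1 requests no stream ordering.
    return StridedMemoryView.from_any_interface(obj, stream_ptr=stream_ptr)


def summarize(view):
    """Collect the layout metadata a consumer would typically inspect."""
    return {
        "shape": tuple(view.shape),
        "strides": view.strides,
        "dtype": view.dtype,
    }


# Stand-in with the same attribute names, used only to exercise summarize()
# without a GPU. A real call would be summarize(make_view(tensor, stream_ptr)).
fake_view = SimpleNamespace(shape=[2, 3], strides=(3, 1), dtype="float32")
print(summarize(fake_view))
```

With PyTorch >= 2.3 and a CUDA tensor, the same make_view call would presumably take the AOTI-based path described above; no change to the calling code should be needed, since the dispatch happens inside the from_* classmethods.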