cuda.core 1.0.0 Release Notes#
Highlights#
TBD
New features#
TBD
Fixes and enhancements#
StridedMemoryViewnow provides a fast path fortorch.Tensorobjects via PyTorch’s AOT Inductor (AOTI) stable C ABI. When atorch.Tensoris passed to anyfrom_*classmethod (from_dlpack,from_cuda_array_interface,from_array_interface, orfrom_any_interface), tensor metadata is read directly from the underlying C struct, bypassing the DLPack and CUDA Array Interface protocol overhead. This yields ~7-20x fasterStridedMemoryViewconstruction for PyTorch tensors (depending on whether stream ordering is required). Proper CUDA stream ordering is established between PyTorch’s current stream and the consumer stream, matching the DLPack synchronization contract. Requires PyTorch >= 2.3. (#749)