cuda.core 1.0.0 Release Notes#
Highlights#
TBD
New features#
TBD
Breaking changes#
Renamed GraphDef to GraphDefinition for consistency with the rest of the API, which spells words out (e.g. TensorMapDescriptor, not TensorMapDesc). (#1950)
Renamed cuda.core.graph.Condition to GraphCondition to follow the Graph* prefix convention used by GraphBuilder, GraphDefinition, GraphNode. (#1945)
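The two renames above are mechanical, so existing code can usually be updated with a text substitution. The following is a minimal sketch of such a helper, not an official migration tool. The function name migrate_source is hypothetical, and the second pattern assumes that GraphCondition remains in the cuda.core.graph module, which the note above does not state explicitly.

```python
import re

# Old spelling -> new spelling, taken from the breaking changes listed above.
# Word boundaries keep the first pattern from touching "GraphDefinition".
# ASSUMPTION: GraphCondition still lives in cuda.core.graph after the rename.
_RENAMES = [
    (re.compile(r"\bGraphDef\b"), "GraphDefinition"),
    (re.compile(r"\bgraph\.Condition\b"), "graph.GraphCondition"),
]


def migrate_source(source: str) -> str:
    """Return `source` with the pre-1.0.0 names replaced by the new ones."""
    for pattern, replacement in _RENAMES:
        source = pattern.sub(replacement, source)
    return source


if __name__ == "__main__":
    old_code = "g = GraphDef()\nc = cuda.core.graph.Condition(x)\n"
    print(migrate_source(old_code))
```

The helper is idempotent, so running it twice leaves already-migrated code unchanged. It only rewrites attribute-style references such as graph.Condition; a bare name brought in with "from cuda.core.graph import Condition" is too generic to rewrite safely with a regular expression and needs a manual check.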
Fixes and enhancements#
StridedMemoryView now provides a fast path for torch.Tensor objects via PyTorch’s AOT Inductor (AOTI) stable C ABI. When a torch.Tensor is passed to any from_* classmethod (from_dlpack, from_cuda_array_interface, from_array_interface, or from_any_interface), tensor metadata is read directly from the underlying C struct, bypassing the DLPack and CUDA Array Interface protocol overhead. This yields ~7-20x faster StridedMemoryView construction for PyTorch tensors (depending on whether stream ordering is required). Proper CUDA stream ordering is established between PyTorch’s current stream and the consumer stream, matching the DLPack synchronization contract. Requires PyTorch >= 2.3. (#749)
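The speedup quoted above will vary with hardware, PyTorch version, and whether stream ordering is requested, so it can be worth measuring on your own setup. The following is a minimal timing sketch using only the standard library. The helper names median_call_time and speedup are hypothetical, and the callables you pass in (for example, one that builds a view through a from_* classmethod on the old version and one on the new version) are left to the caller. It measures host-side wall-clock time only.

```python
import statistics
import time
from typing import Callable


def median_call_time(make_view: Callable[[], object],
                     repeats: int = 1000,
                     warmup: int = 10) -> float:
    """Return the median wall-clock seconds taken by one call to make_view."""
    # Warm-up calls absorb one-time costs such as lazy imports and caches.
    for _ in range(warmup):
        make_view()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        make_view()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline: Callable[[], object],
            candidate: Callable[[], object],
            repeats: int = 1000) -> float:
    """Return how many times faster `candidate` is than `baseline`."""
    base = median_call_time(baseline, repeats)
    cand = median_call_time(candidate, repeats)
    # Guard against a zero reading on coarse timers.
    return float("inf") if cand == 0.0 else base / cand
```

The median is used rather than the mean so that occasional scheduler or garbage-collection pauses do not dominate the result.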