cuda.core.TensorMapDescriptor
- class cuda.core.TensorMapDescriptor
Describes a TMA (Tensor Memory Accelerator) tensor map for Hopper+ GPUs.
A TensorMapDescriptor wraps the opaque 128-byte CUtensorMap struct used by the hardware TMA unit for efficient bulk data movement between global and shared memory.
Public tiled descriptors are created via cuda.core.StridedMemoryView.as_tensor_map(). Specialized _from_* helpers remain private while this API surface settles. Descriptors can be passed directly to launch() as a kernel argument.
Methods
- __init__(*args, **kwargs)
- replace_address(self, tensor)
Replace the global memory address in this tensor map descriptor.
This is useful when the tensor data has been reallocated but the shape, strides, and other parameters remain the same.
- Parameters:
tensor (object) – Any object supporting DLPack or __cuda_array_interface__, or a StridedMemoryView. Must refer to device-accessible memory with a 16-byte-aligned pointer.
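The 16-byte alignment requirement on the replacement tensor can be checked before calling replace_address(). The following is a minimal sketch, not part of cuda.core: the helper names device_pointer and checked_replace_address are hypothetical, and the sketch only handles objects that expose __cuda_array_interface__ (DLPack-only objects and StridedMemoryView are not covered). It relies on the CUDA Array Interface convention that the "data" entry is a (pointer, read_only) tuple. The library may perform its own validation; this only illustrates the stated requirement.

```python
TMA_ADDRESS_ALIGNMENT = 16  # bytes, taken from the parameter description above


def device_pointer(obj) -> int:
    """Return the raw pointer of an object exposing __cuda_array_interface__."""
    # In the CUDA Array Interface, "data" is a (pointer, read_only) tuple.
    return int(obj.__cuda_array_interface__["data"][0])


def checked_replace_address(descriptor, tensor) -> None:
    """Hypothetical wrapper: verify alignment, then call replace_address()."""
    ptr = device_pointer(tensor)
    if ptr % TMA_ADDRESS_ALIGNMENT != 0:
        raise ValueError(
            f"pointer {ptr:#x} is not {TMA_ADDRESS_ALIGNMENT}-byte aligned"
        )
    # Only the address changes; shape, strides, and other parameters are kept.
    descriptor.replace_address(tensor)
```

Here descriptor is assumed to be an existing TensorMapDescriptor and tensor a newly allocated device array with the same shape and strides as the one it replaces.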
Attributes