cuda.core.TensorMapDescriptor

class cuda.core.TensorMapDescriptor

Describes a TMA (Tensor Memory Accelerator) tensor map for GPUs of the Hopper architecture (compute capability 9.0) and newer.

A TensorMapDescriptor wraps the opaque 128-byte CUtensorMap struct used by the hardware TMA unit for efficient bulk data movement between global and shared memory.
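To make the "opaque 128-byte" size concrete, here is a minimal ctypes sketch of the struct's layout. It assumes the definition in cuda.h, where CUtensorMap holds 16 unsigned 64-bit words; the class name OpaqueTensorMap is made up for this illustration and is not part of cuda.core. The 64-byte alignment that cuda.h also requests is not reproduced here.

```python
import ctypes

# Number of 64-bit words in CUtensorMap, per cuda.h (CU_TENSOR_MAP_NUM_QWORDS).
CU_TENSOR_MAP_NUM_QWORDS = 16


class OpaqueTensorMap(ctypes.Structure):
    """Illustrative stand-in for CUtensorMap; not a cuda.core type."""

    # The contents are opaque: only the driver knows how to interpret them.
    _fields_ = [("opaque", ctypes.c_uint64 * CU_TENSOR_MAP_NUM_QWORDS)]


TENSOR_MAP_SIZE = ctypes.sizeof(OpaqueTensorMap)
print(TENSOR_MAP_SIZE)  # 16 words * 8 bytes = 128
```

In normal use there is no need to build this struct by hand; TensorMapDescriptor owns it on your behalf.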

Public tiled descriptors are created via cuda.core.StridedMemoryView.as_tensor_map(); the specialized _from_* helpers remain private while this API surface settles. A descriptor can be passed directly to launch() as a kernel argument.

Methods

__init__(*args, **kwargs)

replace_address(self, tensor)

Replace the global memory address in this tensor map descriptor.

This is useful when the tensor data has been reallocated but the shape, strides, and other parameters remain the same.

Parameters:

tensor (object) – Any object supporting DLPack or __cuda_array_interface__, or a StridedMemoryView. Must refer to device-accessible memory with a 16-byte-aligned pointer.
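The 16-byte alignment requirement can be checked before calling replace_address(). The sketch below is one possible pre-check, not part of cuda.core: the helper names device_pointer, is_tma_aligned, and replace_address_checked are hypothetical, and the code only handles objects that expose __cuda_array_interface__ (whose "data" entry is a (pointer, read_only) pair). Objects that support only DLPack, or StridedMemoryView instances, would need a different way to obtain the pointer.

```python
TMA_ALIGNMENT = 16  # bytes; requirement stated for replace_address()


def device_pointer(tensor):
    """Return the raw pointer from a __cuda_array_interface__ provider."""
    pointer, _read_only = tensor.__cuda_array_interface__["data"]
    return pointer


def is_tma_aligned(pointer, alignment=TMA_ALIGNMENT):
    """True if the integer address is a multiple of `alignment`."""
    return pointer % alignment == 0


def replace_address_checked(descriptor, tensor):
    """Hypothetical wrapper: validate alignment, then swap the address.

    `descriptor` is expected to be a TensorMapDescriptor; only its
    documented replace_address(tensor) method is used.
    """
    pointer = device_pointer(tensor)
    if not is_tma_aligned(pointer):
        raise ValueError(
            f"pointer {pointer:#x} is not {TMA_ALIGNMENT}-byte aligned"
        )
    descriptor.replace_address(tensor)
```

Checking up front gives a readable Python error that names the offending address. The shape, strides, and other parameters of the descriptor are left untouched, so the new tensor is expected to match the layout of the one it replaces.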

Attributes

device

Return the Device associated with this descriptor.