3.5. Global Tensor
A global tensor (i.e., GlobalTensor) is a tensor stored in the global memory of the GPU. It has the following attributes:
dtype: the data type of the tensor elements, which can be any scalar type.
shape: the shape of the tensor, which is a tuple of integers representing the size of each dimension. The dimension sizes can be constants or any grid-invariant expressions, such as kernel parameters.
layout: the layout of the tensor, which defines how the tensor elements are stored in the linear global memory.
3.5.1. Defining a Global Tensor
We usually pass pointers as kernel parameters and use global_view() to define a global tensor from a global pointer to the first element of the tensor. Alternatively, we can use global_tensor() to allocate a global tensor shared by all thread blocks in the kernel. Such a global tensor is managed by the runtime system and has a lifetime that spans the entire kernel execution.
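As an illustration, the sketch below defines global tensors both ways inside a kernel body. Only the instruction names global_view() and global_tensor() come from this section; the kernel scaffolding, the method-call style on self, the parameter names (dtype, shape), and the float32 dtype spelling are assumptions made for this sketch, not the documented signatures.

    # A minimal sketch: the kernel scaffolding, the self.* call style, the
    # parameter names (dtype, shape), and the float32 spelling are assumptions
    # for illustration, not the documented signatures.
    def demo_kernel(self, a_ptr, m: int, n: int):
        # View an existing buffer through a pointer passed as a kernel parameter.
        # The shape may use grid-invariant expressions such as the parameters m and n.
        a = self.global_view(a_ptr, dtype=float32, shape=[m, n])

        # Allocate a global tensor shared by all thread blocks in the kernel;
        # it is managed by the runtime and lives for the entire kernel execution.
        workspace = self.global_tensor(dtype=float32, shape=[m, n])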
3.5.2. Loading and Storing Global Tensors
We can use the following instructions to load and store global tensors:
Load and Store
load_global(): Load a slice of a global tensor into a register tensor.
store_global(): Store a register tensor into a slice of a global tensor.
As with shared tensors, we do not provide arithmetic instructions for global tensors. To perform computation on
global tensors, we must first load the data into register tensors using the load_global()
instruction, perform the computation on the register tensors, and then store the results back to global memory using
the store_global()
instruction.
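The following sketch illustrates this load, compute, store pattern. Only the instruction names come from this section; the parameter names (offsets, shape), the blockIdx expression, and the elementwise arithmetic on register tensors are assumptions made for illustration.

    # A minimal sketch of the load/compute/store pattern; parameter names
    # (offsets, shape), the blockIdx expression, and register-tensor arithmetic
    # are assumptions, not the documented signatures.
    def scale_kernel(self, x_ptr, y_ptr, n: int):
        x = self.global_view(x_ptr, dtype=float32, shape=[n])
        y = self.global_view(y_ptr, dtype=float32, shape=[n])

        # Load the slice processed by this thread block into a register tensor.
        rx = self.load_global(x, offsets=[self.blockIdx.x * 128], shape=[128])

        # Arithmetic is performed on the register tensor, not on the global tensor.
        ry = rx * 2.0

        # Store the register tensor back into the corresponding slice of y.
        self.store_global(y, ry, offsets=[self.blockIdx.x * 128])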
3.5.3. Global Layout
Each global tensor has a layout that defines how the tensor elements are stored in the linear global memory. The layout can be an arbitrary mapping from the multi-dimensional indices to a linear memory address.
The global tensor creation instructions (global_view() and global_tensor()) have an optional layout parameter that can be specified to define the layout of the tensor. There are three ways to specify the layout of a global tensor, as illustrated in the sketch after this list:
Not provided: if neither layout nor strides is specified, we assume the compact row-major layout is used.
strides provided: if the strides parameter is provided, it defines the strides of the tensor in each dimension. The strides are a tuple of integers representing the number of elements to skip in each dimension to get to the next element in that dimension.
layout provided: if the layout parameter is provided, it defines the layout of the tensor. The layout can be any custom mapping from the multi-dimensional indices to the linear memory address.
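A minimal sketch of the three options follows. The keyword names strides and layout come from this section, while everything else (the exact signatures, the float32 spelling, the column-major stride values, and the custom_layout object) is an assumption for illustration.

    # A minimal sketch of the three layout options; the keyword names strides
    # and layout come from this section, while the signatures, float32 spelling,
    # and custom_layout object are assumptions for illustration.
    def layout_examples(self, p, m: int, n: int):
        # 1. Neither layout nor strides: the compact row-major layout is assumed,
        #    i.e., element (i, j) lives at linear offset i * n + j.
        a = self.global_view(p, dtype=float32, shape=[m, n])

        # 2. strides provided: a column-major view of the same buffer, where
        #    element (i, j) lives at linear offset i * 1 + j * m.
        b = self.global_view(p, dtype=float32, shape=[m, n], strides=[1, m])

        # 3. layout provided: any custom mapping from indices to a linear
        #    address (custom_layout is a hypothetical layout object).
        c = self.global_view(p, dtype=float32, shape=[m, n], layout=custom_layout)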