tilus.Script.copy_async

Script.copy_async(src, dst, offsets, dims=None, evict=None, check_bounds=True)

Copy a tile from a global tensor to a shared tensor asynchronously.

This instruction issues an asynchronous copy of a tile from a global tensor to a shared tensor. The src parameter specifies the global tensor to copy from, while the dst parameter specifies the shared tensor to copy to.

The offsets parameter specifies the starting offsets for each dimension of the global tensor where the tile will be copied from. The length of this sequence must match the rank of the global tensor.
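To make the offsets semantics concrete, here is a minimal plain-Python reference model of which global elements land in a rank-2 shared tile. It is not tilus code, and the helper copy_tile_2d is hypothetical. It assumes that offsets gives the global coordinates of the tile's first element, which is how the description above reads.

```python
# Reference model (plain Python, not the tilus API): for a rank-2 global tensor,
# the copied tile satisfies dst[i][j] == src[offsets[0] + i][offsets[1] + j].
def copy_tile_2d(src, tile_shape, offsets):
    # One offset per dimension of the global tensor (rank 2 here).
    assert len(offsets) == 2
    rows, cols = tile_shape
    r0, c0 = offsets
    return [[src[r0 + i][c0 + j] for j in range(cols)] for i in range(rows)]


# A 4x6 "global tensor" whose element at (r, c) is 10 * r + c.
global_a = [[10 * r + c for c in range(6)] for r in range(4)]

# Copy a 2x3 tile whose top-left corner is at row 1, column 2.
tile = copy_tile_2d(global_a, tile_shape=(2, 3), offsets=[1, 2])
print(tile)  # [[12, 13, 14], [22, 23, 24]]

# In a tiled kernel, each block would typically derive its offsets from its block index.
tiles = [copy_tile_2d(global_a, (2, 3), [2 * bi, 3 * bj]) for bi in range(2) for bj in range(2)]
print(tiles[3])  # [[23, 24, 25], [33, 34, 35]]
```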

The dims parameter specifies which dimensions of the global tensor are sliced. It is needed when the shared tensor has a lower rank than the global tensor. If it is not provided, all dimensions of the global tensor are assumed to be sliced, in the same order as the dimensions of the shared tensor. The length of this sequence must match the rank of the shared tensor.
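The sketch below models how dims could map shared-tensor axes onto global-tensor axes. It is a plain-Python reference model built from hypothetical helpers (get, build, copy_tile). It assumes that shared axis k walks global axis dims[k], and that every global axis not listed in dims stays fixed at its entry in offsets. This page does not state that rule explicitly, so treat it as an interpretation.

```python
def get(tensor, index):
    """Read a nested-list tensor at a multi-dimensional index."""
    for i in index:
        tensor = tensor[i]
    return tensor


def build(shape, fn, prefix=()):
    """Build a nested list of the given shape whose element at index is fn(index)."""
    if not shape:
        return fn(prefix)
    return [build(shape[1:], fn, prefix + (i,)) for i in range(shape[0])]


def copy_tile(src, tile_shape, offsets, dims=None):
    """Assumed semantics: shared axis k walks global axis dims[k];
    global axes not listed in dims stay fixed at their offset."""
    if dims is None:
        # Default: slice the global dimensions in the same order as the shared ones.
        dims = list(range(len(tile_shape)))
    assert len(dims) == len(tile_shape)  # one global dim per shared dim

    def element(tile_index):
        global_index = list(offsets)
        for k, d in enumerate(dims):
            global_index[d] += tile_index[k]
        return get(src, global_index)

    return build(tuple(tile_shape), element)


# A 2x3x4 global tensor whose element at (b, r, c) is 100 * b + 10 * r + c.
global_t = [[[100 * b + 10 * r + c for c in range(4)] for r in range(3)] for b in range(2)]

# Rank-2 tile taken from batch 1: dims=[1, 2] slices rows and columns.
print(copy_tile(global_t, (2, 2), offsets=[1, 1, 2], dims=[1, 2]))  # [[112, 113], [122, 123]]

# dims=[0, 2] slices batch and columns instead; the row stays fixed at offsets[1] = 2.
print(copy_tile(global_t, (2, 2), offsets=[0, 2, 1], dims=[0, 2]))  # [[21, 22], [121, 122]]
```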

The evict parameter specifies the cache eviction policy. Data that this instruction loads from global memory is also cached, and evict controls how that cached data is prioritized when cache space must be freed.

It is valid to request elements that lie outside the bounds of the global tensor. In that case, bounds checking is performed and the out-of-bounds elements are filled with zero in the shared tensor. Bounds checking may add some overhead, particularly when the user knows that all accessed global elements are in bounds but the compiler cannot prove it. In that case, set check_bounds to False to skip bounds checking. When check_bounds is False, it is the user’s responsibility to ensure that every accessed global element is in bounds.
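Here is a small reference model of the bounds behavior described above, assuming a rank-2 global tensor stored as nested lists. When check_bounds is True, the hypothetical helper copy_tile_2d_checked zero-fills out-of-bounds elements. When check_bounds is False, the model raises IndexError only to make the violated precondition visible. The real instruction simply skips the check.

```python
def copy_tile_2d_checked(src, tile_shape, offsets, check_bounds=True):
    """Model of the documented bounds behavior for a rank-2 global tensor."""
    rows, cols = len(src), len(src[0])
    r0, c0 = offsets

    def load(r, c):
        in_bounds = 0 <= r < rows and 0 <= c < cols
        if check_bounds:
            # Out-of-bounds elements are filled with zero in the shared tile.
            return src[r][c] if in_bounds else 0
        # check_bounds=False: the caller guarantees every access is in bounds.
        # This model raises to expose a broken guarantee; the real instruction does not check.
        if not in_bounds:
            raise IndexError(f"out-of-bounds access at ({r}, {c}) with check_bounds=False")
        return src[r][c]

    tm, tn = tile_shape
    return [[load(r0 + i, c0 + j) for j in range(tn)] for i in range(tm)]


# A 3x5 global tensor; its shape is not a multiple of the 2x4 tile shape.
global_b = [[10 * r + c for c in range(5)] for r in range(3)]

# The last tile along both dimensions overhangs the tensor and is zero-filled.
print(copy_tile_2d_checked(global_b, (2, 4), offsets=[2, 4]))  # [[24, 0, 0, 0], [0, 0, 0, 0]]

# An interior tile is fully in bounds, so skipping the check is safe here.
print(copy_tile_2d_checked(global_b, (2, 4), offsets=[0, 0], check_bounds=False))  # [[0, 1, 2, 3], [10, 11, 12, 13]]
```

Out-of-bounds accesses typically come from the last tile along a dimension whose size is not a multiple of the tile size, as in the first call. Interior tiles, as in the second call, are where check_bounds=False can save work.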

Parameters:
  • src (GlobalTensor) – The global tensor to copy from.

  • dst (SharedTensor) – The shared tensor to copy to.

  • offsets (Sequence[Expr | int]) – The offsets for each dimension of the global tensor where the tile will be copied from. The length of this sequence must match the number of dimensions of the global tensor.

  • dims (Sequence[int], optional) – The dimensions of the global tensor that are sliced when the rank of the shared tensor is less than the rank of the global tensor. If not provided, all dimensions are assumed to be sliced in the same order as the shared tensor. The length of this sequence must match the number of dimensions of the shared tensor.

  • evict (str, optional) –

    The eviction policy for the data cached by this instruction. If not provided, the default policy 'evict_normal' is used.

    The candidates are:

    • 'evict_normal': Use the normal eviction priority; the cached data is evicted when the cache runs out of space.

    • 'evict_first': Evict the cached data of this instruction first when an eviction is needed. This policy suits streaming data that is read only once and not reused.

  • check_bounds (bool, optional) – Whether to check that the accessed global elements are within bounds. When True, each accessed global element is checked, and any element that is out of bounds is filled with zero in the shared tensor. When False, bounds checking is skipped, and the user must ensure that all accessed global elements are in bounds. Defaults to True.

Return type:

None