Script.copy_async

Script.copy_async(src, dst, offsets, dims=None, evict=None, check_bounds=True)

Asynchronously copy a tile from global memory to shared memory.

Issues a cp.async transfer from a region of src (global memory) to dst (shared memory). The copy is asynchronous, so dst must not be read until the transfer has completed: use copy_async_commit_group() and copy_async_wait_group() to synchronize.

Out-of-bounds accesses are zero-filled by default. Set check_bounds=False to skip bounds checking when you can guarantee all accesses are in-bounds.
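The zero-fill behavior can be illustrated with a small pure-Python reference model. This is only a sketch of the semantics described above, not the library's implementation: copy_tile_zero_fill is a hypothetical helper, plain nested lists stand in for the global and shared tensors, and the copy runs synchronously.

```python
def copy_tile_zero_fill(src, dst, offsets, check_bounds=True):
    """Hypothetical reference model of a 2-D tile copy (not the library API).

    Copies the tile of src that starts at offsets into dst, in place.
    """
    rows, cols = len(src), len(src[0])
    row0, col0 = offsets
    for i in range(len(dst)):
        for j in range(len(dst[0])):
            r, c = row0 + i, col0 + j
            in_bounds = 0 <= r < rows and 0 <= c < cols
            if check_bounds and not in_bounds:
                # Out-of-bounds element: the destination receives zero.
                dst[i][j] = 0
            else:
                # Either the element is in bounds, or check_bounds=False and
                # the caller has promised that it is, so no check is made.
                dst[i][j] = src[r][c]


src = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

# A 2x2 tile that hangs over the bottom-right corner of src.
dst = [[-1, -1], [-1, -1]]
copy_tile_zero_fill(src, dst, offsets=(2, 2))
print(dst)  # [[9, 0], [0, 0]]
```

Only the element at (2, 2) lies inside src; the other three positions of the tile fall outside and are written as zero.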

Parameters:
  • src (GlobalTensor) – The global tensor to copy from.

  • dst (SharedTensor) – The shared tensor to copy to.

  • offsets (Sequence[Expr | int]) – Starting offsets for each dimension of the global tensor. Length must match the rank of the global tensor.

  • dims (Sequence[int], optional) – Which dimensions of the global tensor are being sliced. If not provided, defaults to all dimensions in order.

  • evict (str, optional) –

    Cache eviction policy. Accepted values:

    • 'evict_normal' (default): normal eviction priority.

    • 'evict_first': evict this data first; suitable for streaming access patterns.

  • check_bounds (bool, optional) – If True (default), out-of-bounds accesses are zero-filled. If False, bounds checking is skipped (caller must guarantee in-bounds access).
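How offsets and dims interact can be sketched with a pure-Python model. The interpretation here is an assumption, not something the parameter descriptions state outright: axis k of the shared tile is taken to walk along global dimension dims[k], starting from offsets, while the remaining global dimensions stay fixed at their offset. The helpers shape_of and copy_tile_with_dims are hypothetical and exist only for illustration.

```python
from itertools import product


def shape_of(tensor):
    """Shape of a rectangular nested list (hypothetical helper)."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return tuple(shape)


def copy_tile_with_dims(src, dst, offsets, dims=None):
    """Assumed model: dst axis k steps along src dimension dims[k]."""
    src_shape, dst_shape = shape_of(src), shape_of(dst)
    if len(offsets) != len(src_shape):
        raise ValueError("offsets must have one entry per src dimension")
    if dims is None:
        dims = tuple(range(len(src_shape)))  # all dimensions, in order
    if len(dims) != len(dst_shape):
        raise ValueError("dims must have one entry per dst dimension")

    for tile_idx in product(*(range(n) for n in dst_shape)):
        # Start at the offsets, then step along the sliced dimensions.
        src_idx = list(offsets)
        for step, dim in zip(tile_idx, dims):
            src_idx[dim] += step
        value = 0  # zero-fill for out-of-bounds elements
        if all(0 <= k < n for k, n in zip(src_idx, src_shape)):
            value = src
            for k in src_idx:
                value = value[k]
        # Store into the nested dst list at tile_idx.
        cell = dst
        for k in tile_idx[:-1]:
            cell = cell[k]
        cell[tile_idx[-1]] = value


# A 2x3x4 "global" tensor whose entries encode their own index as b*100 + r*10 + c.
src3 = [[[100 * b + 10 * r + c for c in range(4)] for r in range(3)]
        for b in range(2)]

# Copy a 2x2 tile out of the plane b=1, starting at row 1, column 2.
tile = [[None, None], [None, None]]
copy_tile_with_dims(src3, tile, offsets=(1, 1, 2), dims=(1, 2))
print(tile)  # [[112, 113], [122, 123]]
```

Under this reading, offsets still has one entry per global dimension (here three), while dims has one entry per dimension of the shared tile (here two).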

Return type:

None

Notes

  • Thread group: Can be executed by a thread group of any size.

  • Hardware: Requires compute capability 8.0+ (sm_80).

  • PTX: cp.async