4. Instructions

Tilus provides a set of instructions for working with tensors maintained by the thread block. These instructions let you create tensors in global, shared, and register memory, copy data between them, and compute on them.

4.1. Tensor Creation and Free

Hint

Please submit a feature request if your kernel requires additional instructions.

global_view(ptr, *, dtype, shape[, strides, ...])

Create a global tensor view.

register_tensor(*, dtype, shape[, layout, init])

Create a register tensor.

shared_tensor(*, dtype[, shape, layout])

Allocate a shared tensor.

global_tensor(dtype, shape, *[, layout])

Allocate a global tensor.

free_shared(tensor)

Free a shared tensor.

4.2. Load and Store Instructions

load_global(src, /, *, offsets[, shape, ...])

Load a slice of a global tensor into a register tensor.

store_global(dst, src, *, offsets[, dims])

Store a register tensor into a slice of a global tensor.

load_shared(src, *[, layout, out])

Load a shared tensor into a register tensor.

store_shared(dst, src, *[, offsets, dims])

Store a register tensor into a shared tensor.
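The offsets and shape arguments of load_global and store_global can be pictured as selecting a rectangular tile of a larger tensor. The sketch below is plain Python, not Tilus code: load_tile and store_tile are hypothetical helpers over nested lists, and the assumption that offsets names the first element of the tile is an illustration, not a statement about the Tilus implementation.

```python
# Plain-Python model of tile load/store; this is not Tilus code.
def load_tile(src, offsets, shape):
    """Copy a shape[0] x shape[1] tile of `src` starting at `offsets`."""
    r0, c0 = offsets
    rows, cols = shape
    return [[src[r0 + i][c0 + j] for j in range(cols)] for i in range(rows)]


def store_tile(dst, tile, offsets):
    """Write `tile` into `dst` with its first element at `offsets`."""
    r0, c0 = offsets
    for i, row in enumerate(tile):
        for j, value in enumerate(row):
            dst[r0 + i][c0 + j] = value


# A 4 x 4 "global" matrix holding 0..15 in row-major order.
matrix = [[4 * r + c for c in range(4)] for r in range(4)]

# "Load" the 2 x 2 tile whose first element sits at row 1, column 2.
tile = load_tile(matrix, offsets=(1, 2), shape=(2, 2))

# "Store" that tile into the top-left corner of a zero-filled matrix.
result = [[0] * 4 for _ in range(4)]
store_tile(result, tile, offsets=(0, 0))
```

In this model the tile plays the role of the register tensor, and the two nested-list matrices play the role of global tensors.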

4.3. Asynchronous Copy Instructions

copy_async(src, dst, offsets[, dims, evict, ...])

Copy from a global tensor to a shared tensor asynchronously.

copy_async_commit_group()

Commit async copies into a group.

copy_async_wait_group(n)

Wait for the completion of asynchronous copy groups.

copy_async_wait_all()

Wait for all copy_async instructions to complete.
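The commit and wait instructions are easiest to follow as bookkeeping over groups of copies. The toy model below is plain Python, not Tilus code. It assumes the semantics mirror CUDA's cp.async groups, where waiting with n returns once at most n of the most recently committed groups are still pending; the class AsyncCopyModel and its method names are hypothetical.

```python
from collections import deque


class AsyncCopyModel:
    """Toy bookkeeping model; copies "finish" only when a wait retires them."""

    def __init__(self):
        self.open_copies = []   # issued but not yet committed to a group
        self.pending = deque()  # committed groups, oldest first
        self.completed = []     # copies known to have finished

    def copy_async(self, name):
        self.open_copies.append(name)

    def commit_group(self):
        # Seal every copy issued since the previous commit into one group.
        self.pending.append(self.open_copies)
        self.open_copies = []

    def wait_group(self, n):
        # Retire the oldest groups until at most n groups remain pending.
        while len(self.pending) > n:
            self.completed.extend(self.pending.popleft())

    def wait_all(self):
        # Retire every committed group, then any copies not yet committed.
        self.wait_group(0)
        self.completed.extend(self.open_copies)
        self.open_copies = []


model = AsyncCopyModel()
model.copy_async("tile0")
model.commit_group()
model.copy_async("tile1")
model.commit_group()

# Allow one group to stay in flight: tile0 is finished, tile1 may not be.
model.wait_group(1)
after_first_wait = list(model.completed)

model.wait_all()
after_wait_all = list(model.completed)
```

Waiting with n = 1 after two commits is the usual double-buffering pattern: the kernel can compute on the first tile while the second is still being copied.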

4.4. Linear Algebra Instructions

dot(a, b[, c, acc_dtype, out])

Dot product.
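As a reading aid for the signature, the sketch below models dot as a matrix multiply with an optional accumulator, that is, a @ b + c. This interpretation is an assumption based on the a, b, and c parameters. The helper dot_model is hypothetical plain Python over nested lists and ignores acc_dtype and layouts.

```python
# Plain-Python model of a matrix multiply-accumulate; this is not Tilus code.
def dot_model(a, b, c=None):
    """Return a @ b, plus c when an accumulator is given."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Start from the accumulator entry, or from zero without one.
            acc = c[i][j] if c is not None else 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]

product = dot_model(a, b)   # a @ b
fused = dot_model(a, b, c)  # a @ b + c
```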

4.5. Elementwise Arithmetic Instructions

add(lhs, rhs[, out])

Add two register tensors element-wise.

exp(x, *[, out])

Compute the exponential of each element.

exp2(x, *[, out])

Compute the base-2 exponential of each element.

abs(x, *[, out])

Compute the absolute value of a register tensor.

maximum(lhs, rhs[, out])

Compute the element-wise maximum.

round(x, *[, out])

Round each element to the nearest integer.

where(condition, x, y, *[, out])

Select elements from two tensors based on a condition.
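The element-wise instructions apply one scalar operation independently at every position. The sketch below is plain Python over flat lists, not Tilus code; maximum_model and where_model are hypothetical stand-ins. It shows two equivalent ways to clamp negative values to zero.

```python
# Plain-Python models of element-wise operations; this is not Tilus code.
def maximum_model(lhs, rhs):
    """Element-wise maximum of two equal-length sequences."""
    return [max(l, r) for l, r in zip(lhs, rhs)]


def where_model(condition, x, y):
    """Pick x[i] where condition[i] is true, otherwise y[i]."""
    return [xv if cond else yv for cond, xv, yv in zip(condition, x, y)]


x = [-2.0, 0.5, 3.0]
zeros = [0.0, 0.0, 0.0]

# Clamp negatives to zero with an element-wise maximum ...
relu_via_maximum = maximum_model(x, zeros)

# ... or by selecting between x and zeros with a boolean mask.
mask = [value > 0 for value in x]
relu_via_where = where_model(mask, x, zeros)
```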

4.6. Transform Instructions

assign(dst, src)

Assign the value of src tensor to dst tensor.

cast(x, dtype)

Cast a register tensor to a different data type.

squeeze(x, *, dim[, out])

Remove a dimension of size 1 from a register tensor.

unsqueeze(x, *, dim[, out])

Insert a dimension of size 1 into a register tensor.

repeat(x, repeats, *[, out])

Repeat elements of a register tensor along its dimensions.

repeat_interleave(x, repeats, *[, out])

Repeat each element of a register tensor consecutively along its dimensions.

view(x, *[, layout, dtype])

View a register tensor with a different layout or data type.

transpose(x, *[, out])

Transpose a 2-D register tensor.
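The names repeat and repeat_interleave suggest two different element orders. The sketch below assumes they follow the PyTorch conventions of the same names: repeat tiles the whole tensor, while repeat_interleave repeats each element in place. It is plain Python on a 1-D list, not Tilus code, and both helper functions are hypothetical.

```python
# Plain-Python models of the two repetition orders; this is not Tilus code.
def repeat_model(x, repeats):
    """Tile the whole sequence `repeats` times."""
    return list(x) * repeats


def repeat_interleave_model(x, repeats):
    """Repeat each element `repeats` times before moving to the next."""
    return [value for value in x for _ in range(repeats)]


x = [1, 2, 3]
tiled = repeat_model(x, 2)
interleaved = repeat_interleave_model(x, 2)
```

Both results have the same length and the same elements; only the order differs.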

4.7. Reduction Instructions

max(x, *, dim[, keepdim, out])

Compute the maximum value along a dimension.

min(x, *, dim[, keepdim, out])

Compute the minimum value along a dimension.

sum(x, *, dim[, keepdim, out])

Sum the elements along a specified dimension.
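The dim and keepdim parameters control which axis is collapsed and whether it survives as a size-1 axis. The sketch below assumes the NumPy and PyTorch convention for keepdim. It is plain Python over a 2-D nested list, not Tilus code, and sum_model is a hypothetical helper.

```python
# Plain-Python model of a reduction with keepdim; this is not Tilus code.
def sum_model(x, dim, keepdim=False):
    """Sum a 2-D nested list along `dim` (0 collapses rows, 1 collapses columns)."""
    if dim == 0:
        reduced = [sum(column) for column in zip(*x)]
        # keepdim keeps the collapsed row axis with size 1: shape (1, n).
        return [reduced] if keepdim else reduced
    if dim == 1:
        reduced = [sum(row) for row in x]
        # keepdim keeps the collapsed column axis with size 1: shape (m, 1).
        return [[value] for value in reduced] if keepdim else reduced
    raise ValueError("dim must be 0 or 1 for a 2-D tensor")


x = [[1, 2, 3], [4, 5, 6]]

row_sums = sum_model(x, dim=1)                     # shape (2,)
row_sums_kept = sum_model(x, dim=1, keepdim=True)  # shape (2, 1)
col_sums = sum_model(x, dim=0)                     # shape (3,)
col_sums_kept = sum_model(x, dim=0, keepdim=True)  # shape (1, 3)
```

Keeping the reduced axis is convenient when the result is combined with the original tensor again, for example when subtracting a row maximum in a softmax.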

4.8. Atomic and Semaphore Instructions

lock_semaphore(semaphore, value)

Lock semaphore with a specified value.

release_semaphore(semaphore, value)

Release semaphore with a specified value.
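One common use of a value-based semaphore is to make thread blocks enter a critical section in a fixed order. The sketch below models that with Python threads. It assumes that locking waits until the semaphore holds the given value and that releasing stores a new value; this is an interpretation of the descriptions above, not a confirmed statement about Tilus. SemaphoreModel and run_in_order are hypothetical names.

```python
import threading


class SemaphoreModel:
    """Value-based semaphore: lock(v) waits until the stored value equals v."""

    def __init__(self, initial=0):
        self._value = initial
        self._cond = threading.Condition()

    def lock(self, value):
        with self._cond:
            self._cond.wait_for(lambda: self._value == value)

    def release(self, value):
        with self._cond:
            self._value = value
            self._cond.notify_all()


def run_in_order(num_workers):
    """Run workers that append their index strictly in index order."""
    sem = SemaphoreModel(initial=0)
    order = []

    def worker(index):
        sem.lock(index)         # wait until it is this worker's turn
        order.append(index)     # critical section
        sem.release(index + 1)  # hand over to the next worker

    # Start in reverse so the ordering can only come from the semaphore.
    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in reversed(range(num_workers))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return order


order = run_in_order(4)
```

Each Python thread stands in for one thread block, and the shared integer stands in for a semaphore in global memory.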

4.9. Synchronization Instruction

sync()

Synchronize all threads in the thread block.

4.10. Miscellaneous Instructions

assume(cond)

Compiler hint to assume a condition is true.

annotate_layout(tensor, layout)

Annotate the layout of a register tensor.

print_tensor(msg, tensor[, fmt])

Print a tensor with a message.

printf(fstring, *args)

Print a formatted string.