4. Instructions
Tilus provides a set of instructions for working with the tensors maintained by a thread block. These instructions let you create tensors in global, shared, and register memory, copy data between these memory spaces, and perform computations on the tensors.
4.1. Tensor Creation and Free
Hint
Please submit a feature request if your kernel requires additional instructions.
Create a global tensor view.
Create a register tensor.
Allocate a shared tensor.
Allocate a global tensor.
Free a shared tensor.
4.2. Load and Store Instructions
Load a slice of a global tensor into a register tensor.
Store a register tensor into a slice of a global tensor.
Load a shared tensor into a register tensor.
Store a register tensor into a shared tensor.
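The following plain-Python sketch makes the slice semantics concrete. It models loading a tile from a 2-D global tensor into a register tile and storing it back. It is a reference model, not Tilus code. The helper names `load_tile` and `store_tile` are hypothetical, and tensors are nested lists. On load, out-of-bounds elements are filled with zero; on store, they are skipped. Both behaviors are assumptions about how partial tiles at tensor edges are handled.

```python
from typing import List

Matrix = List[List[float]]


def load_tile(src: Matrix, row0: int, col0: int, rows: int, cols: int,
              fill: float = 0.0) -> Matrix:
    """Return the rows x cols slice of `src` starting at (row0, col0).

    Positions outside `src` read as `fill` (assumed out-of-bounds behavior)."""
    n_rows = len(src)
    n_cols = len(src[0]) if src else 0
    tile = []
    for i in range(rows):
        r = row0 + i
        tile.append([
            src[r][c] if 0 <= r < n_rows and 0 <= c < n_cols else fill
            for c in range(col0, col0 + cols)
        ])
    return tile


def store_tile(dst: Matrix, tile: Matrix, row0: int, col0: int) -> None:
    """Write `tile` into `dst` at (row0, col0); out-of-bounds positions are skipped."""
    n_rows = len(dst)
    n_cols = len(dst[0]) if dst else 0
    for i, line in enumerate(tile):
        for j, value in enumerate(line):
            r, c = row0 + i, col0 + j
            if 0 <= r < n_rows and 0 <= c < n_cols:
                dst[r][c] = value


g = [[4 * r + c for c in range(4)] for r in range(3)]  # 3-by-4 "global" tensor
print(load_tile(g, 2, 2, 2, 2))  # prints [[10, 11], [0.0, 0.0]]
```

Here `g` has 3 rows, so the second row of the requested 2-by-2 tile lies past the end of `g` and is filled with zeros.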
4.3. Asynchronous Copy Instructions
Copy from a global tensor to a shared tensor asynchronously.
Commit the outstanding asynchronous copies into a group.
Wait for the completion of asynchronous copy groups.
Wait for all copy_async instructions to complete.
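The commit/wait protocol is easiest to picture as bookkeeping over a queue of groups. The sketch below is a host-side Python model, not Tilus code, and the class `AsyncCopyModel` and its methods are hypothetical. It assumes the instructions follow the semantics of PTX `cp.async.commit_group`, `cp.async.wait_group N`, and `cp.async.wait_all`:

- `wait_group N` returns once at most N of the most recent groups are still pending.
- `wait_all` is equivalent to a commit followed by waiting on zero groups.

In this model, a copy only lands in shared memory once its group has been waited on. This reflects the rule a kernel must follow: reading a shared element before the corresponding wait is never safe.

```python
from collections import deque


class AsyncCopyModel:
    """Bookkeeping model of async copies grouped by commit (no real concurrency)."""

    def __init__(self, global_mem, shared_mem):
        self.global_mem = global_mem
        self.shared_mem = shared_mem
        self.uncommitted = []  # copies issued since the last commit
        self.groups = deque()  # committed groups, oldest first

    def copy_async(self, src, dst):
        # Issue a copy; it is only guaranteed to land after its group is waited on.
        self.uncommitted.append((src, dst))

    def commit_group(self):
        # Close the current batch of copies as one group (an empty group is allowed).
        self.groups.append(self.uncommitted)
        self.uncommitted = []

    def wait_group(self, n):
        # Return once at most `n` of the most recent groups are still pending.
        while len(self.groups) > n:
            for src, dst in self.groups.popleft():
                self.shared_mem[dst] = self.global_mem[src]

    def wait_all(self):
        # Like PTX cp.async.wait_all: commit pending copies, then wait for every group.
        self.commit_group()
        self.wait_group(0)


g = [10, 20, 30, 40]
s = [0, 0, 0, 0]
m = AsyncCopyModel(g, s)
m.copy_async(0, 0)
m.commit_group()  # group for stage 0
m.copy_async(1, 1)
m.commit_group()  # group for stage 1
m.wait_group(1)   # stage 0 is complete; stage 1 may still be in flight
print(s)  # prints [10, 0, 0, 0]
```

This is the pattern behind multi-stage pipelines. A kernel issues and commits the copies for tile i + 1, then waits with N = 1. That guarantees only the group for tile i is complete before the kernel computes on it.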
4.4. Linear Algebra Instructions
Dot product.
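The summary above does not spell out the operand shapes. The sketch below assumes dot behaves like the tile-level matrix multiply-accumulate found in other tile languages: an (m, k) tile times a (k, n) tile, optionally added to an (m, n) accumulator. Under that assumption, a plain-Python reference model useful for checking results looks like this. `dot_reference` is a hypothetical name, not the Tilus signature.

```python
def dot_reference(a, b, c=None):
    """Return a @ b (+ c if given) for 2-D tiles stored as nested lists."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of a and b must match")
    n = len(b[0])
    # Start from a copy of the accumulator (so `c` is not modified) or from zeros.
    out = [row[:] for row in c] if c is not None else [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
    return out


print(dot_reference([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # prints [[19, 22], [43, 50]]
```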
4.5. Elementwise Arithmetic Instructions
Add two register tensors element-wise.
Compute the exponential of each element.
Compute the base-2 exponential of each element.
Compute the absolute value of a register tensor.
Compute the element-wise maximum.
Round each element to the nearest integer.
Select elements from two tensors based on a condition.
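Each of these instructions applies a scalar function independently at every position. The sketch below models that on 1-D Python lists. The helper names are hypothetical. It assumes the operands already have the same shape; any broadcasting or layout conversion Tilus may perform is not modeled.

```python
import math


def elementwise(op, *tensors):
    """Apply `op` independently at every position of same-shape 1-D tensors."""
    length = len(tensors[0])
    if any(len(t) != length for t in tensors):
        raise ValueError("element-wise operands must have the same shape")
    return [op(*values) for values in zip(*tensors)]


def add(a, b):
    return elementwise(lambda x, y: x + y, a, b)


def exp(a):
    return elementwise(math.exp, a)


def exp2(a):
    return elementwise(lambda x: 2.0 ** x, a)


def absolute(a):
    return elementwise(abs, a)


def maximum(a, b):
    return elementwise(max, a, b)


def where(cond, a, b):
    # Take a[i] where cond[i] is true, otherwise b[i].
    return elementwise(lambda c, x, y: x if c else y, cond, a, b)


print(where([True, False, True], [1, 2, 3], [10, 20, 30]))  # prints [1, 20, 3]
```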
4.6. Transform Instructions
Assign the value of the src tensor to the dst tensor.
Cast a register tensor to a different data type.
Squeeze a size-1 dimension out of a register tensor.
Unsqueeze a register tensor by inserting a dimension of size 1.
Repeat elements of a register tensor along its dimensions.
Repeat elements of a register tensor along its dimensions.
View a register tensor with a different layout or data type.
Transpose a 2-D register tensor.
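Two distinctions in this group are worth illustrating. First, cast converts values, while view with a different data type is assumed to reinterpret the same bits. Second, squeeze, unsqueeze, and transpose change only the shape or the index order. The plain-Python sketch below uses the standard struct module for the bit reinterpretation, and the helper names are hypothetical. The truncation toward zero in the float-to-int cast mirrors C conversion rules; it is not documented Tilus behavior.

```python
import struct


def cast_f32_to_i32(values):
    # Value conversion: the number is converted, e.g. 1.5 -> 1 (truncation toward zero).
    return [int(v) for v in values]


def view_f32_as_u32(values):
    # Bit reinterpretation: pack as little-endian float32, read the same bytes as uint32.
    raw = struct.pack(f"<{len(values)}f", *values)
    return list(struct.unpack(f"<{len(values)}I", raw))


def squeeze_shape(shape, dim):
    # Remove dimension `dim` (0 <= dim < len(shape)), which must have size 1.
    if not 0 <= dim < len(shape) or shape[dim] != 1:
        raise ValueError("can only squeeze an existing dimension of size 1")
    return shape[:dim] + shape[dim + 1:]


def unsqueeze_shape(shape, dim):
    # Insert a new size-1 dimension before position `dim` (0 <= dim <= len(shape)).
    if not 0 <= dim <= len(shape):
        raise ValueError("dimension out of range")
    return shape[:dim] + (1,) + shape[dim:]


def transpose_2d(rows):
    # Element (i, j) of the input becomes element (j, i) of the output.
    return [list(col) for col in zip(*rows)]


print(cast_f32_to_i32([1.5]), view_f32_as_u32([1.0]))  # prints [1] [1065353216]
```

The value 1065353216 is 0x3F800000, the IEEE 754 single-precision bit pattern of 1.0. A value-preserving cast would instead produce 1.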
4.7. Reduction Instructions
4.8. Atomic and Semaphore Instructions
Lock a semaphore with a specified value.
Release a semaphore with a specified value.
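One way to use such a pair is to serialize thread blocks that update the same output tile. Block i waits until the semaphore holds i, does its update, and then releases the semaphore with i + 1. The sketch below models that pattern with Python threads standing in for thread blocks. It assumes that lock waits until the semaphore equals the given value and that release stores the given value. The class `FlagSemaphore` and the function `run_blocks` are hypothetical, and a condition variable replaces the busy-wait on global memory that a GPU kernel would use.

```python
import threading


class FlagSemaphore:
    """Integer flag shared by all blocks; a condition variable replaces busy-waiting."""

    def __init__(self, initial: int = 0) -> None:
        self.value = initial
        self._cond = threading.Condition()

    def lock(self, expected: int) -> None:
        # Block until the flag holds `expected` (assumed lock semantics).
        with self._cond:
            self._cond.wait_for(lambda: self.value == expected)

    def release(self, new_value: int) -> None:
        # Publish `new_value` and wake every waiting block (assumed release semantics).
        with self._cond:
            self.value = new_value
            self._cond.notify_all()


def run_blocks(num_blocks: int) -> list:
    """Let `num_blocks` threads enter a critical section strictly in block-id order."""
    sem = FlagSemaphore(0)
    order = []

    def block(block_id: int) -> None:
        sem.lock(block_id)         # wait until it is this block's turn
        order.append(block_id)     # critical section, e.g. accumulating a partial tile
        sem.release(block_id + 1)  # hand the semaphore to the next block

    # Start threads in reverse order: the semaphore, not start order, decides the sequence.
    threads = [threading.Thread(target=block, args=(i,)) for i in reversed(range(num_blocks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


print(run_blocks(4))  # prints [0, 1, 2, 3]
```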
4.9. Synchronization Instruction
Perform a synchronization.
4.10. Miscellaneous Instructions
Compiler hint to assume a condition is true.
Annotate the layout of a register tensor.
Print a tensor with a message.
Print a formatted string.