tilus.Script
- class tilus.Script
The Script class represents a tilus script, which defines a GPU kernel through a sequence of block-level instructions. See Tilus Script for an overview of the tilus script language.
Attributes and Variables
Get the block index of the current thread block.
Get the grid dimension of the kernel.
Get the number of threads in the current thread group.
Get the beginning thread index of the current thread group.
Get the ending thread index of the current thread group.
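The thread-group attributes above describe a contiguous range of threads inside a thread block. The following is a minimal sketch in plain Python, not the tilus API: it assumes a block is split into equal-sized groups and that the ending index is exclusive, neither of which is stated in this listing. The function name `thread_group_bounds` is hypothetical.

```python
def thread_group_bounds(block_threads: int, num_groups: int, group_index: int):
    """Return (begin, end, size) for one thread group.

    Assumption: `end` is exclusive and all groups have the same size.
    """
    if num_groups <= 0 or block_threads % num_groups != 0:
        raise ValueError("block_threads must be a positive multiple of num_groups")
    if not 0 <= group_index < num_groups:
        raise ValueError("group_index out of range")
    size = block_threads // num_groups
    begin = group_index * size
    return begin, begin + size, size


# A block of 4 warps (128 threads) split into 4 groups of one warp each.
bounds = [thread_group_bounds(128, 4, g) for g in range(4)]
```

Under these assumptions, the number of threads in a group always equals the ending index minus the beginning index.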
Language Constructs
Compiler hint to assume a condition is true.
Create an iterator used in a for loop.
Create a thread group context with only one thread.
Create a thread group context with a single warp (32 threads).
Assert a compile-time condition.
Create a thread group context.
Create a thread group context with multiple warps.
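To illustrate how nested thread group contexts might narrow the set of active threads, here is a toy model in plain Python. It is not the tilus API: the class `ThreadGroupModel`, its method names, and their parameters are all hypothetical, and the model assumes groups are equal-sized, contiguous, and half-open `[begin, end)`.

```python
from contextlib import contextmanager

WARP_SIZE = 32  # threads per warp, as stated in the list above


class ThreadGroupModel:
    """Toy model: tracks the active [begin, end) thread range as contexts nest."""

    def __init__(self, block_threads: int):
        self._stack = [(0, block_threads)]

    @property
    def current(self):
        return self._stack[-1]

    @contextmanager
    def thread_group(self, index: int, num_groups: int):
        begin, end = self.current
        total = end - begin
        if num_groups <= 0 or total % num_groups != 0 or not 0 <= index < num_groups:
            raise ValueError("invalid partition of the current thread group")
        size = total // num_groups
        self._stack.append((begin + index * size, begin + (index + 1) * size))
        try:
            yield self.current
        finally:
            self._stack.pop()  # leaving the context restores the enclosing group

    def single_warp(self, warp: int = 0):
        begin, end = self.current
        return self.thread_group(warp, (end - begin) // WARP_SIZE)

    def single_thread(self, thread: int = 0):
        begin, end = self.current
        return self.thread_group(thread, end - begin)


model = ThreadGroupModel(block_threads=128)
with model.thread_group(index=1, num_groups=2) as outer:  # second half of the block
    with model.single_warp(warp=1) as warp:               # second warp of that half
        with model.single_thread() as lane:               # first thread of that warp
            pass
```

The stack makes the scoping explicit: each inner context can only select threads from the range of its enclosing context.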
Instructions
Compute the element-wise absolute value.
Element-wise addition with broadcasting.
Test whether all elements are non-zero along the specified dimension(s).
Annotate the layout of a register or shared tensor.
Test whether any element is non-zero along the specified dimension(s).
Assign the value of src tensor to dst tensor.
Cast a register tensor to a different data type.
Clip element values to the range [min, max].
Compute the element-wise cosine.
Asynchronously copy a tile from global memory to shared memory.
Commit async copies into a group.
Wait for all copy_async instructions to complete.
Wait for the completion of asynchronous copy groups.
Dot product.
Compute the element-wise natural exponential (e^x).
Compute the element-wise base-2 exponential (2^x).
Fast integer division and modulo using precomputed magic multiplier.
Flatten a register tensor into a 1-D tensor.
Free a shared tensor.
Allocate a global tensor.
Create a global tensor view.
Load a slice of global tensor into a register tensor.
Load a shared tensor into a register tensor.
Lock semaphore with a specified value.
Compute the element-wise natural logarithm (ln x).
Compute the maximum along the specified dimension(s).
Element-wise maximum with broadcasting.
Compute the minimum along the specified dimension(s).
Print a tensor with a message.
Print a formatted string.
Generate a block of random float32 in U(0, 1) using Philox-4x32 PRNG.
Generate a block of random int32 using Philox-4x32 PRNG.
Generate four blocks of random int32 using Philox-4x32 PRNG.
Generate a block of random float32 in N(0, 1) using Philox-4x32 PRNG.
Create a register tensor.
Release semaphore with a specified value.
Repeat elements of a register tensor along its dimensions.
Repeat elements of a register tensor along its dimensions.
Reshape a shared tensor.
Round each element to the nearest integer (round-to-nearest-even).
Compute the element-wise reciprocal square root (1/sqrt(x)).
Allocate a shared tensor.
Compute the element-wise sine.
Compute the element-wise square root.
Compute the element-wise square (x^2).
Squeeze a dimension of a register tensor with size 1.
Store a register tensor into a slice of a global tensor.
Store a register tensor into a shared tensor.
Sum elements along the specified dimension(s).
Perform a synchronization.
Transpose a 2-D register tensor.
Unsqueeze a dimension of a register tensor.
View register tensor with a different layout or data type.
Select elements from
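The list above mentions fast integer division and modulo with a precomputed magic multiplier. The sketch below shows one common form of that idea (a round-up multiplier followed by a shift) in plain Python for unsigned values. Whether tilus uses this exact scheme is not stated here; the helper names `make_magic` and `fast_divmod` are hypothetical.

```python
def make_magic(divisor: int, bits: int = 32):
    """Precompute (multiplier, shift) such that
    n // divisor == (n * multiplier) >> shift for every 0 <= n < 2**bits."""
    if not 1 <= divisor < (1 << bits):
        raise ValueError("divisor out of range")
    log2_ceil = (divisor - 1).bit_length()      # ceil(log2(divisor))
    shift = bits + log2_ceil
    multiplier = -(-(1 << shift) // divisor)    # ceil(2**shift / divisor)
    return multiplier, shift


def fast_divmod(n: int, divisor: int, magic):
    """Divide with one multiply and one shift; the remainder follows from the quotient."""
    multiplier, shift = magic
    quotient = (n * multiplier) >> shift
    return quotient, n - quotient * divisor


magic7 = make_magic(7)            # computed once, reused for every division by 7
result = fast_divmod(100, 7, magic7)
```

The multiplier can need one bit more than `bits`; Python integers absorb this, whereas a fixed-width implementation would have to handle the extra bit explicitly.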
Instruction Groups
Memory barrier instructions for synchronizing async memory transactions.
Fence instructions for memory ordering between proxies.
Tensor Memory Accelerator (TMA) async copy instructions.
Tensor Core Generation 05 (Blackwell) instructions.
Cluster Launch Control instructions.
Block cluster synchronization and shared memory access.
Warp Group Matrix Multiply-Accumulate (Hopper) instructions.
Script Attributes
Kernel launch configuration (blocks, warps, cluster).
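As a rough illustration of what a launch configuration with blocks, warps, and a cluster implies, the sketch below derives thread counts in plain Python. The `LaunchConfig` class and its fields are hypothetical stand-ins, not the tilus attribute object; the divisibility check reflects the CUDA requirement that the grid be a multiple of the cluster size in each dimension.

```python
from dataclasses import dataclass
from math import prod

WARP_SIZE = 32  # threads per warp


@dataclass(frozen=True)
class LaunchConfig:
    blocks: tuple                 # grid dimensions, counted in thread blocks
    warps: int                    # warps per thread block
    cluster: tuple = (1, 1, 1)    # thread blocks per cluster, per dimension

    @property
    def threads_per_block(self) -> int:
        return self.warps * WARP_SIZE

    @property
    def total_threads(self) -> int:
        return prod(self.blocks) * self.threads_per_block

    def validate(self) -> None:
        # Pad missing grid dimensions with 1 before comparing against the cluster.
        padded = tuple(self.blocks) + (1,) * (len(self.cluster) - len(self.blocks))
        for grid_dim, cluster_dim in zip(padded, self.cluster):
            if grid_dim % cluster_dim != 0:
                raise ValueError("each grid dimension must be a multiple of the cluster dimension")


config = LaunchConfig(blocks=(8, 4), warps=4, cluster=(2, 1, 1))
config.validate()
```

Here `config` describes 32 thread blocks of 128 threads each, grouped into clusters of two blocks along the first grid dimension.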