tilus.Script

class tilus.Script

The Script class represents a tilus script, which defines a GPU kernel through a sequence of block-level instructions. See Tilus Script for an overview of the tilus script language.

__init__()

Initializes the script. Every subclass should call this __init__ method from its own __init__. The subclass's __init__ is the place for compile-time setup, such as defining hyper-parameters or pre-computing values that will be used in the kernel code.
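A minimal sketch of the subclassing pattern described above. The `Script` class here is a stand-in stub so the snippet runs without tilus installed; the subclass name `TiledKernel` and the hyper-parameters `block_m`, `block_n`, and `tile_elems` are hypothetical.

```python
class Script:
    """Stand-in for tilus.Script (a stub for illustration, not the real class)."""

    def __init__(self):
        # The stub only records that the base initializer ran.
        self.base_initialized = True


class TiledKernel(Script):
    def __init__(self, block_m=64, block_n=64):
        super().__init__()  # call the base initializer first
        # Compile-time setup: hyper-parameters and pre-computed values.
        self.block_m = block_m
        self.block_n = block_n
        self.tile_elems = block_m * block_n


kernel = TiledKernel(block_m=32, block_n=16)
```

Values stored on `self` this way are ordinary Python attributes, so they can be read back later when the kernel body is defined.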

__call__(*args, **kwargs)

Defines the kernel code that will be executed on the GPU. This method should contain the logic of the kernel, including memory accesses, computations, and any other operations that need to be performed.

Attributes and Variables

attrs

Kernel attributes, such as the number of blocks and warps.

blockIdx

Get the block index of the current thread block.

gridDim

Get the grid dimension of the kernel.
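To illustrate how a block index and a grid dimension are typically combined, the plain-Python sketch below emulates a 2-D launch grid on the CPU and computes the tile origin each block would handle. The helpers `cdiv` and `tile_origins` are hypothetical, and the `x`/`y` naming in the comments follows CUDA convention rather than anything stated on this page. It models only the indexing arithmetic, not the tilus runtime.

```python
def cdiv(a, b):
    """Ceiling division: number of size-b tiles needed to cover a."""
    return (a + b - 1) // b


def tile_origins(m, n, block_m, block_n):
    """Map each emulated block index to the (row, col) origin of its tile."""
    grid_dim = (cdiv(m, block_m), cdiv(n, block_n))  # plays the role of gridDim
    origins = {}
    for bx in range(grid_dim[0]):          # plays the role of blockIdx.x
        for by in range(grid_dim[1]):      # plays the role of blockIdx.y
            origins[(bx, by)] = (bx * block_m, by * block_n)
    return grid_dim, origins


grid_dim, origins = tile_origins(m=100, n=70, block_m=32, block_n=32)
```

Because 100 and 70 are not multiples of 32, the last row and column of blocks cover partial tiles; a real kernel has to handle that boundary explicitly.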

Language Constructs

assume(cond)

Compiler hint to assume a condition is true.

range(start[, end, step])

Create an iterator used in a for loop.

Instructions

abs(x, *[, out])

Compute the absolute value of a register tensor.

add(lhs, rhs[, out])

Add two register tensors element-wise.

annotate_layout(tensor, layout)

Annotate the layout of a register tensor.

assign(dst, src)

Assign the value of src tensor to dst tensor.

cast(x, dtype)

Cast a register tensor to a different data type.

copy_async(src, dst, offsets[, dims, evict, ...])

Copy from a global tensor to a shared tensor asynchronously.

copy_async_commit_group()

Commit async copies into a group.

copy_async_wait_all()

Wait for all copy_async instructions to complete.

copy_async_wait_group(n)

Wait for the completion of asynchronous copy groups.
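The commit/wait instructions above can be pictured with a toy bookkeeping model. This sketch assumes the semantics mirror PTX `cp.async.commit_group` and `cp.async.wait_group`, where waiting with `n` returns once at most `n` of the most recently committed groups are still pending; this page does not confirm that. The `AsyncCopyQueue` class and the copy names are hypothetical.

```python
class AsyncCopyQueue:
    """Toy model of commit/wait bookkeeping for asynchronous copies."""

    def __init__(self):
        self.uncommitted = []  # copies issued since the last commit
        self.pending = []      # committed groups, oldest first
        self.completed = []    # copies known to have finished

    def copy_async(self, name):
        self.uncommitted.append(name)

    def commit_group(self):
        # Bundle all copies issued so far into one group.
        self.pending.append(self.uncommitted)
        self.uncommitted = []

    def wait_group(self, n):
        # Retire the oldest groups until at most n groups remain pending.
        while len(self.pending) > n:
            self.completed.extend(self.pending.pop(0))

    def wait_all(self):
        # Assumed equivalent to committing, then waiting for zero pending groups.
        self.commit_group()
        self.wait_group(0)


q = AsyncCopyQueue()
q.copy_async("tile0_a")
q.copy_async("tile0_b")
q.commit_group()
q.copy_async("tile1_a")
q.commit_group()
q.wait_group(1)  # tile0_* are done; tile1_a may still be in flight
```

Leaving one group pending is what lets a software pipeline overlap the load of the next tile with computation on the current one.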

dot(a, b[, c, acc_dtype, out])

Compute the dot product of a and b, with an optional accumulator c.
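The signature suggests a multiply-accumulate. As a reference for that reading, the sketch below computes `a @ b + c` on nested Python lists. It assumes `dot` means a 2-D matrix product with `c` added when given, which this page does not spell out; `dot_reference` is a hypothetical helper, and `acc_dtype` and `out` are not modelled.

```python
def dot_reference(a, b, c=None):
    """Reference for a @ b (+ c) on nested lists of shape [m, k] and [k, n]."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Start from the accumulator value if one was supplied.
            acc = c[i][j] if c is not None else 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = dot_reference(a, b, c)
```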

exp(x, *[, out])

Compute the exponential of each element.

exp2(x, *[, out])

Compute the base-2 exponential of each element.
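The two exponentials are related by exp(x) = 2^(x · log2 e), so a kernel can express a natural exponential through the base-2 one. The sketch below checks that identity with the standard library only; `exp_via_exp2` is a hypothetical helper and nothing here calls tilus.

```python
import math

LOG2_E = math.log2(math.e)  # conversion factor between the two bases


def exp_via_exp2(x):
    """Natural exponential expressed through a base-2 exponential."""
    return 2.0 ** (x * LOG2_E)


values = [-1.5, 0.0, 0.5, 3.0]
pairs = [(math.exp(v), exp_via_exp2(v)) for v in values]
```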

free_shared(tensor)

Free a shared tensor.

global_tensor(dtype, shape, *[, layout])

Allocate a global tensor.

global_view(ptr, *, dtype, shape[, strides, ...])

Create a global tensor view.

load_global(src, /, *, offsets[, shape, ...])

Load a slice of global tensor into a register tensor.
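A sketch of what "a slice of a global tensor" means for a 2-D tensor. It assumes `offsets` gives the origin of the slice and `shape` its extent, that the slice lies fully in bounds, and it ignores the parameters elided in the signature above. `load_slice` is a hypothetical helper operating on nested lists.

```python
def load_slice(src, offsets, shape):
    """Return the shape-sized window of src starting at offsets (2-D only)."""
    r0, c0 = offsets
    rows, cols = shape
    return [row[c0:c0 + cols] for row in src[r0:r0 + rows]]


# A 4x6 "global tensor" whose entries encode their own coordinates.
src = [[10 * r + c for c in range(6)] for r in range(4)]
tile = load_slice(src, offsets=[1, 2], shape=[2, 3])
```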

load_shared(src, *[, layout, out])

Load a shared tensor into a register tensor.

lock_semaphore(semaphore, value)

Lock semaphore with a specified value.

max(x, *, dim[, keepdim, out])

Compute the maximum value along a dimension.

maximum(lhs, rhs[, out])

Compute the element-wise maximum.

min(x, *, dim[, keepdim, out])

Compute the minimum value along a dimension.

print_tensor(msg, tensor[, fmt])

Print a tensor with a message.

printf(fstring, *args)

Print a formatted string.

register_tensor(*, dtype, shape[, layout, init])

Create a register tensor.

release_semaphore(semaphore, value)

Release semaphore with a specified value.

repeat(x, repeats, *[, out])

Repeat elements of a register tensor along its dimensions.

repeat_interleave(x, repeats, *[, out])

Repeat elements of a register tensor along its dimensions.
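The two instructions share a summary line, so the difference is easy to miss. The 1-D sketch below assumes they follow the NumPy/PyTorch naming convention, where `repeat` tiles the whole tensor and `repeat_interleave` repeats each element in place; this page does not confirm that. Both helper functions are hypothetical.

```python
def repeat_1d(x, repeats):
    """Tile the whole sequence: [a, b] -> [a, b, a, b]."""
    return list(x) * repeats


def repeat_interleave_1d(x, repeats):
    """Repeat each element consecutively: [a, b] -> [a, a, b, b]."""
    return [v for v in x for _ in range(repeats)]


x = [1, 2, 3]
tiled = repeat_1d(x, 2)
interleaved = repeat_interleave_1d(x, 2)
```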

round(x, *[, out])

Round each element to the nearest integer.

shared_tensor(*, dtype[, shape, layout])

Allocate a shared tensor.

squeeze(x, *, dim[, out])

Remove a dimension of size 1 from a register tensor.

store_global(dst, src, *, offsets[, dims])

Store a register tensor into a slice of a global tensor.

store_shared(dst, src, *[, offsets, dims])

Store a register tensor into a shared tensor.

sum(x, *, dim[, keepdim, out])

Sum the elements along a specified dimension.
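The reductions `max`, `min`, and `sum` share the `dim` and `keepdim` parameters. The sketch below shows the usual meaning of those parameters on a 2-D nested list. It assumes `keepdim` behaves as in NumPy/PyTorch (the reduced dimension is kept with size 1) and defaults to false; `reduce_2d` is a hypothetical helper.

```python
def reduce_2d(x, op, dim, keepdim=False):
    """Reduce a nested-list matrix along dim (0 collapses rows, 1 collapses columns)."""
    if dim == 0:
        reduced = [op(col) for col in zip(*x)]
        return [reduced] if keepdim else reduced
    if dim == 1:
        reduced = [op(row) for row in x]
        return [[v] for v in reduced] if keepdim else reduced
    raise ValueError("dim must be 0 or 1 for a 2-D input")


x = [[1, 5, 2],
     [4, 3, 6]]
col_sums = reduce_2d(x, sum, dim=0)
row_max = reduce_2d(x, max, dim=1, keepdim=True)
row_min = reduce_2d(x, min, dim=1)
```

With `keepdim=True` the result keeps two dimensions (shape [2, 1] here), which makes it convenient to combine with the original tensor afterwards.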

sync()

Synchronize all threads in the thread block.

transpose(x, *[, out])

Transpose a 2-D register tensor.

unsqueeze(x, *, dim[, out])

Insert a dimension of size 1 into a register tensor.
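The effect of `squeeze` and `unsqueeze` is easiest to see on shapes alone. The sketch below performs only the shape arithmetic, for non-negative `dim`; both helpers are hypothetical, and whether tilus accepts negative dimensions or several dimensions at once is not covered here.

```python
def squeeze_shape(shape, dim):
    """Drop dimension dim, which must have size 1."""
    if shape[dim] != 1:
        raise ValueError(f"dimension {dim} has size {shape[dim]}, expected 1")
    return shape[:dim] + shape[dim + 1:]


def unsqueeze_shape(shape, dim):
    """Insert a new size-1 dimension at position dim."""
    return shape[:dim] + [1] + shape[dim:]


squeezed = squeeze_shape([16, 1, 32], dim=1)
front = unsqueeze_shape([16, 32], dim=0)
back = unsqueeze_shape([16, 32], dim=2)
```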

view(x, *[, layout, dtype])

View register tensor with a different layout or data type.

where(condition, x, y, *[, out])

Select elements from two tensors based on a condition.
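Element-wise selection can be sketched in plain Python as follows. This assumes the usual convention that the result takes the element of `x` where the condition is true and the element of `y` otherwise; `where_reference` is a hypothetical helper on flat lists, and broadcasting is not modelled.

```python
def where_reference(condition, x, y):
    """Pick x[i] where condition[i] is true, otherwise y[i]."""
    return [xv if cv else yv for cv, xv, yv in zip(condition, x, y)]


scores = [0.5, -1.0, 2.0, -3.0]
mask = [s > 0 for s in scores]
clamped = where_reference(mask, scores, [0.0] * len(scores))
```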

Script Attributes

class tilus.lang.Attributes

blocks

The number of blocks.

warps

The number of warps.