1. Overview

Tilus is a programming model for developing high-performance GPU applications. It offers a high-level abstraction for writing parallel programs that run efficiently on GPUs, while retaining fine-grained control over hardware resources such as shared memory and thread mapping. This guide introduces the Tilus programming model and its key features.

1.1. Hello World

To write a kernel in Tilus Script, we define a subclass of tilus.Script and implement its __call__ method. The kernel below launches a single thread block containing one warp and prints a message from the GPU.

import torch
import tilus

# define the kernel by subclassing `tilus.Script`
class MyKernel(tilus.Script):
    def __call__(self):
        self.attrs.blocks = 1   # one thread block
        self.attrs.warps = 1    # one warp per thread block

        self.printf("Hello, World!\n")

# instantiate the kernel
kernel = MyKernel()

# launch the kernel on GPU
kernel()
# wait for the GPU to finish so that the kernel's printed output is flushed
torch.cuda.synchronize()

Output:

Hello, World!
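
For readers less familiar with the pattern used above, the following sketch shows in plain Python why an instance of a class that defines __call__ can be invoked like a function. ToyScript, ToyKernel, and the log attribute are hypothetical stand-ins invented for illustration; they are not part of Tilus, and the sketch runs on the CPU in the ordinary interpreter. It only mirrors the calling convention, not how Tilus builds or launches a GPU kernel.

```python
# Hypothetical stand-in classes for illustration only; this is NOT the tilus.Script API.
from types import SimpleNamespace


class ToyScript:
    """Toy base class holding launch attributes, loosely mirroring `self.attrs`."""

    def __init__(self):
        self.attrs = SimpleNamespace(blocks=None, warps=None)
        self.log = []  # collects messages instead of printing from a GPU

    def printf(self, message):
        # Record the message; a real GPU kernel would print from the device.
        self.log.append(message)


class ToyKernel(ToyScript):
    def __call__(self):
        # This body runs when the instance is called like a function.
        self.attrs.blocks = 1
        self.attrs.warps = 1
        self.printf("Hello, World!\n")


toy_kernel = ToyKernel()  # instantiate
toy_kernel()              # invokes ToyKernel.__call__
```

After the call, toy_kernel.log holds the recorded message and toy_kernel.attrs holds the values assigned inside __call__, which is the same shape of interaction as instantiating MyKernel and calling kernel() above.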

1.2. Dive Deeper

Detailed sections cover the different aspects of the Tilus programming model. To learn more about Tilus, explore the following sections: