Explore how Tilus distributes tensor elements across GPU threads
Type a layout expression, press Enter, and see how tensor elements are distributed across threads. Each cell shows T<thread_id> and the local index. Cells are color-coded by thread. Hover to highlight all elements owned by the same thread, or click a thread in the legend.
spatial(d0, d1, ...): distribute elements across threads
local(d0, d1, ...): store elements in each thread's local registers
column_spatial(d0, d1, ...): spatial in column-major order
column_local(d0, d1, ...): local in column-major order
local(3, 4).spatial(2, 3): equivalent to product(local(3, 4), spatial(2, 3))
product(A, B) or A * B: Kronecker-like product of two layouts
divide(A, B) or A / B: divide layout A by B
reduce(layout, [dim0, ...]): reduce over dimensions (creates replicated threads)
permute(layout, [d0, ...]): permute dimensions
reshape(layout, [s0, ...]): reshape to new shape

local(4, 4): single thread holds all 16 elements
spatial(4, 8).local(4, 4): 32 threads with 16 elements each
reduce(spatial(3, 4), [0]): replicated elements across threads
local(2, 1).spatial(8, 4).local(1, 2): MMA m16n8k8 output layout
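
To make the chained expressions more concrete, here is a small pure-Python sketch of how a chain of spatial and local factors could map an element index to a (thread, local index) pair. It does not use the Tilus API: the names `shape`, `owner`, and `elements_per_thread` are hypothetical, and it assumes 2-D layouts, row-major order inside each factor, and that the leftmost factor in a chain is the outermost one. It does not model reduce, divide, permute, reshape, or the column-major variants.

```python
from collections import Counter
from itertools import product as cartesian

# A layout is modeled as a list of (kind, (rows, cols)) factors, written in
# the same left-to-right order as the chained expression. This is an
# illustrative model only, not the Tilus data structure.

def shape(factors):
    """Shape of the composed layout: elementwise product of factor shapes."""
    rows, cols = 1, 1
    for _kind, (r, c) in factors:
        rows, cols = rows * r, cols * c
    return rows, cols

def owner(factors, index):
    """Return (thread_id, local_id) for a 2-D element index."""
    i, j = index
    thread, local = 0, 0
    t_stride, l_stride = 1, 1
    # Peel factors off from the innermost (rightmost) to the outermost.
    for kind, (r, c) in reversed(factors):
        pos = (i % r) * c + (j % c)  # row-major position inside this factor
        i, j = i // r, j // c
        if kind == "spatial":
            thread += pos * t_stride
            t_stride *= r * c
        else:  # "local"
            local += pos * l_stride
            l_stride *= r * c
    return thread, local

def elements_per_thread(factors):
    """Count how many elements each thread owns."""
    rows, cols = shape(factors)
    cells = cartesian(range(rows), range(cols))
    return Counter(owner(factors, cell)[0] for cell in cells)

# local(2, 1).spatial(8, 4).local(1, 2)
MMA_OUT = [("local", (2, 1)), ("spatial", (8, 4)), ("local", (1, 2))]

print(shape(MMA_OUT))          # (16, 8)
print(owner(MMA_OUT, (9, 5)))  # (6, 3): thread 6, local index 3
```

Under these assumptions, spatial(4, 8).local(4, 4) comes out as 32 threads with 16 elements each, matching the example list above; the explorer itself remains the reference for what Tilus actually computes.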