warp.tile_from_thread

warp.tile_from_thread(
shape: tuple[int, ...],
value: Any,
thread_idx: int32,
storage: str,
) → Tile[Any, tuple[int, ...]]
  • Kernel

Allocate a tile filled with a value from a specific thread.

This function broadcasts the value held by one thread to every thread in the block, then creates a tile in which every element equals that value. It is useful for sharing a result computed on a single thread (e.g., the return value of an atomic operation) with the whole block, and the broadcast uses only one element of shared memory.
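The broadcast-then-fill behavior described above can be modeled on the host in plain Python. This is only an illustrative sketch, not Warp's implementation: `emulate_tile_from_thread` and the `per_thread_values` list (one entry per thread in a block) are hypothetical names introduced here, and the tile is represented as a nested list.

```python
def emulate_tile_from_thread(shape, per_thread_values, thread_idx):
    """Host-side model only: return a nested list of the given shape in which
    every element is the value held by thread `thread_idx`."""
    if isinstance(shape, int):
        shape = (shape,)  # accept a bare int for a 1D tile
    broadcast = per_thread_values[thread_idx]  # the one value that gets shared

    def fill(dims):
        # Build one nesting level per tile dimension
        if not dims:
            return broadcast
        return [fill(dims[1:]) for _ in range(dims[0])]

    return fill(tuple(shape))


# One block of 8 threads in which only the last thread holds a nonzero value
block_dim = 8
per_thread_offset = [0] * block_dim
per_thread_offset[block_dim - 1] = 1 * block_dim

tile = emulate_tile_from_thread(block_dim, per_thread_offset, thread_idx=block_dim - 1)
print(tile)  # [8, 8, 8, 8, 8, 8, 8, 8]
```

Values held by the other threads never reach the tile, which is why a single shared element is enough for the broadcast step.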

Parameters:
  • shape – Shape of the output tile

  • value – Per-thread value (only the value from thread_idx is used)

  • thread_idx – Index of the thread whose value should fill the tile

  • storage – The storage location for the tile: "register" for registers (default) or "shared" for shared memory.

Returns:

A tile filled with the value from the specified thread.

Example

import warp as wp

TILE_SIZE = 8

@wp.kernel
def compute(output: wp.array(dtype=int)):
    i, j = wp.tid()

    # Compute offset on the last thread
    offset = 0
    if j == wp.block_dim() - 1:
        offset = i * wp.block_dim()

    # Broadcast the last thread's offset to all threads (uses only 1 element of shared memory)
    offset_tile = wp.tile_from_thread(shape=TILE_SIZE, value=offset, thread_idx=wp.block_dim() - 1)

    # Combine with other tiles using tile operations
    indices = wp.tile_arange(0, TILE_SIZE, dtype=int)
    result = offset_tile + indices

    wp.tile_store(output, result, offset=(i * TILE_SIZE,))

output = wp.zeros(16, dtype=int)
wp.launch_tiled(compute, dim=[2], inputs=[output], block_dim=TILE_SIZE)

print(output.numpy())
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]

warp.tile_from_thread(
shape: int32,
value: Any,
thread_idx: int32,
storage: str,
) → Tile[Any, tuple[int, ...]]
  • Kernel

Allocate a tile filled with a value from a specific thread.