warp.tile_from_thread
- warp.tile_from_thread(shape, value, thread_idx, storage="register") -> Tile[Any, tuple[int, ...]]
Kernel
Allocate a tile filled with a value from a specific thread.
This function broadcasts a value from one thread to all threads in the block, then creates a tile filled with that broadcast value. This is useful for efficiently sharing a computed result (e.g., from an atomic operation) with all threads in a block using minimal shared memory (only 1 element).
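The broadcast described above can be modeled on the host in plain Python. This is only a sketch of the semantics, not Warp code and not a description of how Warp implements the function; `tile_from_thread_model`, `per_thread_values`, and `shared_slot` are hypothetical names introduced for illustration, and the tile is assumed to be one-dimensional.

```python
# Host-side model of the broadcast semantics; plain Python, not Warp code.
def tile_from_thread_model(shape, per_thread_values, thread_idx):
    """Return a flat tile of `shape` elements filled with one thread's value."""
    shared_slot = [None]  # stands in for the single shared-memory element

    # Step 1: only the source thread writes its value to the shared slot.
    for tid, value in enumerate(per_thread_values):
        if tid == thread_idx:
            shared_slot[0] = value

    # Step 2: every thread reads the slot (on a GPU this would follow a
    # block-wide synchronization; that detail is an assumption here).
    broadcast = [shared_slot[0] for _ in per_thread_values]

    # Step 3: all threads now hold the same value, which fills the tile.
    assert all(v == broadcast[0] for v in broadcast)
    return [broadcast[0]] * shape


# Eight threads; only the last one holds a meaningful value.
values = [0, 0, 0, 0, 0, 0, 0, 24]
tile = tile_from_thread_model(shape=8, per_thread_values=values, thread_idx=7)
print(tile)  # [24, 24, 24, 24, 24, 24, 24, 24]
```

In this model the values held by the other threads never reach the tile, which matches the note that only the value from `thread_idx` is used.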
- Parameters:
shape – Shape of the output tile
value – Per-thread value (only the value from thread_idx is used)
thread_idx – Index of the thread whose value should fill the tile
storage – The storage location for the tile: "register" for registers (default) or "shared" for shared memory.
- Returns:
A tile filled with the value from the specified thread.
Example
import warp as wp

TILE_SIZE = 8

@wp.kernel
def compute(output: wp.array(dtype=int)):
    i, j = wp.tid()

    # Compute offset on the last thread
    offset = 0
    if j == wp.block_dim() - 1:
        offset = i * wp.block_dim()

    # Broadcast the last thread's offset to all threads (uses only 1 element of shared memory)
    offset_tile = wp.tile_from_thread(shape=TILE_SIZE, value=offset, thread_idx=wp.block_dim() - 1)

    # Combine with other tiles using tile operations
    indices = wp.tile_arange(0, TILE_SIZE, dtype=int)
    result = offset_tile + indices

    wp.tile_store(output, result, offset=(i * TILE_SIZE,))

output = wp.zeros(16, dtype=int)
wp.launch_tiled(compute, dim=[2], inputs=[output], block_dim=TILE_SIZE)

print(output.numpy())
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
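The arithmetic behind this output can be checked on the host without a GPU. The following is a rough plain-Python walk-through of the example, assuming two blocks of eight threads as in the launch above; `BLOCK_DIM` and `NUM_BLOCKS` are names introduced here for illustration, and the lists only stand in for Warp tiles.

```python
# Plain-Python walk-through of the example's arithmetic (not Warp code).
BLOCK_DIM = 8   # threads per block, matching block_dim=TILE_SIZE in the launch
TILE_SIZE = 8
NUM_BLOCKS = 2  # matching dim=[2] in the launch

output = [0] * (NUM_BLOCKS * TILE_SIZE)

for i in range(NUM_BLOCKS):
    # Per-thread offsets: zero everywhere except on the last thread.
    offsets = [i * BLOCK_DIM if j == BLOCK_DIM - 1 else 0 for j in range(BLOCK_DIM)]

    # Broadcast the last thread's offset and fill a tile with it.
    offset_tile = [offsets[BLOCK_DIM - 1]] * TILE_SIZE

    # Element-wise sum with the index tile 0..TILE_SIZE-1.
    indices = list(range(TILE_SIZE))
    result = [a + b for a, b in zip(offset_tile, indices)]

    # Store the block's result at its slice of the output.
    output[i * TILE_SIZE:(i + 1) * TILE_SIZE] = result

print(output)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

Block 0 broadcasts an offset of 0 and block 1 an offset of 8, so the two stored tiles cover 0 to 7 and 8 to 15.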