warp.tile\_from\_thread
=======================

.. function:: warp._src.lang.tile_from_thread(shape: tuple[int, ...], value: Any, thread_idx: int32, storage: str) -> Tile[Any,tuple[int, ...]]

   .. hlist::
      :columns: 8

      * Kernel

   Allocate a tile filled with a value from a specific thread.

   This function broadcasts a value from one thread to all threads in the block, then creates a tile
   filled with that broadcast value. This is useful for efficiently sharing a computed result
   (e.g., from an atomic operation) with all threads in a block using minimal shared memory
   (only 1 element).

   :param shape: Shape of the output tile
   :param value: Per-thread value (only the value from ``thread_idx`` is used)
   :param thread_idx: Index of the thread whose value should fill the tile
   :param storage: The storage location for the tile: ``"register"`` for registers
     (default) or ``"shared"`` for shared memory.
   :returns: A tile filled with the value from the specified thread.

   .. rubric:: Example

   .. code-block:: python

      import warp as wp

      TILE_SIZE = 8

      @wp.kernel
      def compute(output: wp.array(dtype=int)):
          i, j = wp.tid()

          # Compute offset on the last thread
          offset = 0
          if j == wp.block_dim() - 1:
              offset = i * wp.block_dim()

          # Broadcast the last thread's offset to all threads
          # (uses only 1 element of shared memory)
          offset_tile = wp.tile_from_thread(shape=TILE_SIZE, value=offset, thread_idx=wp.block_dim() - 1)

          # Combine with other tiles using tile operations
          indices = wp.tile_arange(0, TILE_SIZE, dtype=int)
          result = offset_tile + indices
          wp.tile_store(output, result, offset=(i * TILE_SIZE,))

      output = wp.zeros(16, dtype=int)
      wp.launch_tiled(compute, dim=[2], inputs=[output], block_dim=TILE_SIZE)
      print(output.numpy())

   .. code-block:: text

      [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]

.. function:: warp._src.lang.tile_from_thread(shape: int32, value: Any, thread_idx: int32, storage: str) -> Tile[Any,tuple[int, ...]]
   :noindex:

   .. hlist::
      :columns: 8

      * Kernel

   Allocate a tile filled with a value from a specific thread.
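
The broadcast-then-fill behaviour described for ``tile_from_thread`` can be illustrated with a small host-side model. This is a sketch in plain NumPy, not Warp code: ``tile_from_thread_model`` is a hypothetical helper introduced only for illustration. It takes the per-thread values as an ordinary list and does not model registers, shared memory, or block synchronization. It accepts either a tuple or a single integer as ``shape``, mirroring the two overloads.

```python
import numpy as np


def tile_from_thread_model(shape, per_thread_values, thread_idx):
    """Hypothetical host-side model of the broadcast semantics (not a Warp API).

    Every element of the returned array takes the value held by the
    thread at position ``thread_idx``; all other threads' values are ignored.
    """
    values = np.asarray(per_thread_values)
    if not 0 <= thread_idx < values.shape[0]:
        raise IndexError("thread_idx must refer to a thread in the block")
    if isinstance(shape, int):
        shape = (shape,)  # an integer shape means a 1D tile
    return np.full(shape, values[thread_idx], dtype=values.dtype)


TILE_SIZE = 8
BLOCK_DIM = 8  # assumed equal to TILE_SIZE, as in a launch with block_dim=TILE_SIZE

output = np.zeros(16, dtype=np.int64)
for i in range(2):  # one iteration per block
    # Only the last thread of the block computes a non-zero offset.
    per_thread = [i * BLOCK_DIM if j == BLOCK_DIM - 1 else 0 for j in range(BLOCK_DIM)]

    # Every element of the tile receives the last thread's offset.
    offset_tile = tile_from_thread_model(TILE_SIZE, per_thread, BLOCK_DIM - 1)

    # Combine with a per-element index, then store the block's slice.
    result = offset_tile + np.arange(TILE_SIZE)
    output[i * TILE_SIZE:(i + 1) * TILE_SIZE] = result

print(output)
```

In this model the values held by the other threads never reach the output, which is the property that lets the real function share one thread's result with the whole block.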