warp.tile_scatter_masked
- warp.tile_scatter_masked(a, i, value, has_value) -> None
Kernel
Differentiable
Write a value into a shared-memory tile from the calling thread.
All threads in the block must call this function cooperatively. Each thread whose has_value is True writes value at the specified index. A synchronization barrier is included, so the written values are visible to all threads after the call returns.

Each index should be written by at most one thread per call. If multiple threads write to the same index, the result is undefined (a data race in the forward pass, incorrect gradients in the backward pass).
Example
    import warp as wp

    @wp.kernel
    def write_kernel(out: wp.array[int]):
        tile_idx, thread_idx = wp.tid()

        # Allocate a shared-memory tile
        t = wp.tile_zeros(shape=64, dtype=int, storage="shared")

        # Each thread writes its own slot
        wp.tile_scatter_masked(t, thread_idx, thread_idx + 1, True)

        # Store the tile to the output array
        wp.tile_store(out, t)
- Parameters:
a – The tile to write into (will use shared memory).
i – Index of the element to write.
value – The value to write (must match the tile’s dtype).
has_value – Whether this thread should perform the write.
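The masking and one-writer-per-index rules described above can be illustrated with a small CPU-side sketch in plain Python. This is not Warp's implementation and does not call Warp. The helper name scatter_masked_reference and its list-based arguments are hypothetical: entry k of each list stands for the arguments that thread k would pass in a single cooperative call. Where the GPU result would be undefined, the sketch raises an error so the conflict is visible.

```python
from typing import List, Sequence


def scatter_masked_reference(
    tile: List[int],
    indices: Sequence[int],
    values: Sequence[int],
    has_value: Sequence[bool],
) -> List[int]:
    """Sequential stand-in for one cooperative call made by every thread in a block.

    Hypothetical helper for illustration only; it is not part of Warp.
    """
    result = list(tile)  # copy, so the input tile is left untouched
    written = set()
    for thread_idx, (i, v, active) in enumerate(zip(indices, values, has_value)):
        if not active:
            continue  # masked-out threads still take part in the call but write nothing
        if i in written:
            # On the GPU this would be a data race with an undefined result;
            # the sketch raises instead so the mistake is easy to spot.
            raise ValueError(
                f"thread {thread_idx}: index {i} was already written in this call"
            )
        written.add(i)
        result[i] = v
    return result


block_dim = 8
tile = [0] * block_dim
threads = range(block_dim)

# Only even-numbered threads write; each writes thread_idx + 1 into its own slot.
out = scatter_masked_reference(
    tile,
    indices=list(threads),
    values=[t + 1 for t in threads],
    has_value=[t % 2 == 0 for t in threads],
)
print(out)  # [1, 0, 3, 0, 5, 0, 7, 0]
```

Slots belonging to masked-out threads keep their previous contents (zero here), and two active threads targeting the same index are rejected rather than silently producing an arbitrary winner.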