warp.tile_scatter_add

warp.tile_scatter_add(
a: Tile[Any, tuple[int, ...]],
i: int32,
value: Any,
has_value: bool,
atomic: bool,
) → None
  • Kernel

  • Differentiable

Scatter-add a per-thread value into a shared-memory tile.

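Cooperative operation – all threads in the block must call this function, including threads that have nothing to add. Each thread whose has_value is True adds value to the tile element at index i; threads whose has_value is False take part in the call but leave the tile unchanged.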

A synchronization barrier is included so the updated values are visible to all threads after the call returns.
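
The semantics described above can be modeled in plain Python. This is a minimal sketch rather than Warp code: the helper name scatter_add_reference and its list-based arguments are hypothetical, with one list entry standing in for each thread in the block.

```python
def scatter_add_reference(tile, indices, values, has_values):
    """Sequential model of a block-wide scatter-add (hypothetical helper)."""
    # One entry per thread: every thread in the block takes part in the call.
    assert len(indices) == len(values) == len(has_values)
    for i, value, has_value in zip(indices, values, has_values):
        if has_value:  # threads with has_value == False contribute nothing
            tile[i] += value
    # Returning plays the role of the barrier: all updates are now visible.
    return tile


# Eight "threads" bin values in [0, 8) into 4 bins of width 2.
# The last two threads opt out by passing has_value=False.
data = [0.5, 1.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0]
bins = scatter_add_reference(
    [0.0] * 4,
    [int(x / 2.0) for x in data],
    [1.0] * len(data),
    [True] * 6 + [False] * 2,
)
print(bins)  # [2.0, 2.0, 2.0, 0.0]
```

The last bin stays at zero because the two threads that map to it pass has_value=False, yet they still appear in the call, mirroring the requirement that every thread in the block participates.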

Parameters:
  • a – A shared-memory tile to scatter-add into.

  • i – Index of the element to add to.

  • value – The value to add (must match the tile’s dtype).

  • has_value – Whether this thread should perform the add.

  • atomic – If True (default), use atomic add for safe concurrent writes. Set to False when indices are guaranteed unique across threads (e.g., lane-parallel writes) for better performance.
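
Why atomic matters can be illustrated with a simplified plain-Python model. This is a sketch under stated assumptions, not Warp code and not a description of how the hardware schedules threads: racy_add and atomic_add are hypothetical helpers, and the race is modeled as every thread reading the old value before any thread writes.

```python
def racy_add(tile, indices, values):
    """Model of non-atomic adds: all reads happen before any write."""
    snapshot = list(tile)              # every thread reads the old value
    for i, v in zip(indices, values):
        tile[i] = snapshot[i] + v      # colliding writes overwrite each other
    return tile


def atomic_add(tile, indices, values):
    """Model of atomic adds: each read-modify-write is indivisible."""
    for i, v in zip(indices, values):
        tile[i] += v
    return tile


colliding = [0, 0, 1, 1]   # two threads target each index
unique = [0, 1, 2, 3]      # one thread per index
ones = [1.0] * 4

lost = racy_add([0.0] * 4, colliding, ones)
safe = atomic_add([0.0] * 4, colliding, ones)
same = racy_add([0.0] * 4, unique, ones)

print(lost)  # [1.0, 1.0, 0.0, 0.0]  updates lost on collision
print(safe)  # [2.0, 2.0, 0.0, 0.0]
print(same)  # [1.0, 1.0, 1.0, 1.0]  unique indices: no atomics needed
```

In this model, colliding indices lose updates without atomics, while unique indices give the same result either way, which is the case where atomic=False is safe.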

Example

import warp as wp

@wp.kernel
def histogram(data: wp.array[float], out: wp.array[float]):
    bins = wp.tile_zeros(dtype=float, shape=4, storage="shared")
    i = wp.tid()
    # Bin values in [0, 8) into 4 bins of width 2
    b = int(data[i] / 2.0)
    wp.tile_scatter_add(bins, b, 1.0, True)
    wp.tile_store(out, bins, offset=0)

data = wp.array([0.5, 1.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0], dtype=float)
output = wp.zeros(4, dtype=float)
wp.launch_tiled(histogram, dim=[1], inputs=[data, output], block_dim=8)

print(output.numpy())
[2. 2. 2. 2.]

warp.tile_scatter_add(
a: Tile[Any, tuple[int, ...]],
i: int32,
j: int32,
value: Any,
has_value: bool,
atomic: bool,
) → None
  • Kernel

  • Differentiable

warp.tile_scatter_add(
a: Tile[Any, tuple[int, ...]],
i: int32,
j: int32,
k: int32,
value: Any,
has_value: bool,
atomic: bool,
) → None
  • Kernel

  • Differentiable

warp.tile_scatter_add(
a: Tile[Any, tuple[int, ...]],
i: int32,
j: int32,
k: int32,
l: int32,
value: Any,
has_value: bool,
atomic: bool,
) → None
  • Kernel

  • Differentiable
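
The overloads above differ only in the number of index arguments (i, j, k, l), which presumably address tiles of two, three, and four dimensions. A plain-Python sketch of multi-index scatter-add, assuming a nested-list tile and a hypothetical helper scatter_add_nd (not part of Warp), might look like this:

```python
def scatter_add_nd(tile, index, value, has_value=True):
    """Add value at a multi-dimensional index of a nested-list tile."""
    if not has_value:
        return
    *outer, last = index
    cell = tile
    for axis in outer:       # walk down to the innermost row
        cell = cell[axis]
    cell[last] += value


# Eight "threads" scatter 1.0 into a 2x2 tile using (i, j) index pairs.
tile2d = [[0.0, 0.0], [0.0, 0.0]]
for tid in range(8):
    scatter_add_nd(tile2d, (tid % 2, tid // 4), 1.0)

print(tile2d)  # [[2.0, 2.0], [2.0, 2.0]]
```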