warp.tile_atomic_add_indexed

warp.tile_atomic_add_indexed(a: Array[Any], indices: Tile[int32, tuple[int]], t: Tile[Any, tuple[int, ...]], offset: tuple[int, ...], axis: int32) → Tile[Any, tuple[int, ...]]

Kernel

Differentiable

Atomically add a tile to an array in global memory, where the destination positions along a specified axis are given by a 1D tile of indices.

Parameters:

a – The destination array in global memory.
indices – A 1D tile of integer indices selecting positions in a along axis.
t – The source tile to add. It must have the same data type and number of dimensions as the destination array, and its extent along axis must equal the number of elements in the indices tile.
offset – Offset into the destination array (optional).
axis – The axis of a that indices refer to.
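The index mapping described by these parameters can be emulated on the host with NumPy. The sketch below is illustrative only: `tile_atomic_add_indexed_reference` is a hypothetical helper, not part of Warp, and it assumes that `offset` is added to the index values along `axis` (which is consistent with the example on this page) and to the tile coordinates along every other axis. `np.add.at` is used so that repeated indices accumulate, mirroring the atomic behavior.

```python
import numpy as np


def tile_atomic_add_indexed_reference(a, indices, t, offset, axis):
    """Host-side emulation of the scatter-add (hypothetical helper, not a Warp API)."""
    coords = []
    for d in range(t.ndim):
        if d == axis:
            # Along `axis`, destination positions come from the indices tile.
            # Assumption: the offset is added to each index value.
            coords.append(offset[d] + np.asarray(indices))
        else:
            # Along other axes, the tile maps to a contiguous range.
            coords.append(offset[d] + np.arange(t.shape[d]))
    # np.add.at accumulates correctly when destination positions repeat.
    np.add.at(a, np.ix_(*coords), t)
    return a


a = np.zeros((4, 2))
t = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
indices = np.array([2, 0, 2], dtype=np.int32)  # rows 0 and 2 of t share a destination

tile_atomic_add_indexed_reference(a, indices, t, offset=(1, 0), axis=0)
print(a)
```

With these inputs, rows 0 and 2 of `t` both land in row 3 of `a` (offset 1 plus index 2) and are summed there, while row 1 of `t` lands in row 1 of `a`.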

Example

This example shows how to compute a blocked reduction over rows: each block of TILE_M consecutive rows of x is summed into a single row of y.

import numpy as np
import warp as wp

TILE_M = wp.constant(2)
TILE_N = wp.constant(2)

@wp.kernel
def tile_atomic_add_indexed(x: wp.array2d(dtype=float), y: wp.array2d(dtype=float)):
    i, j = wp.tid()

    # Load the (i, j) tile of x
    t = wp.tile_load(x, shape=(TILE_M, TILE_N), offset=(i*TILE_M, j*TILE_N), storage="register")

    # All-zero indices: every row of the tile maps to the same destination row
    zeros = wp.tile_zeros(TILE_M, dtype=int, storage="shared")

    # Accumulate both tile rows into row i of y, at columns j*TILE_N onward
    wp.tile_atomic_add_indexed(y, indices=zeros, t=t, offset=(i, j*TILE_N), axis=0)

M = TILE_M * 2
N = TILE_N * 2

# Use the default device (assumption; the original example leaves `device` undefined)
device = wp.get_device()

arr = np.arange(M * N, dtype=float).reshape(M, N)

x = wp.array(arr, dtype=float, requires_grad=True, device=device)
y = wp.zeros((2, N), dtype=float, requires_grad=True, device=device)

wp.launch_tiled(tile_atomic_add_indexed, dim=[2,2], inputs=[x], outputs=[y], block_dim=32, device=device)

print(x.numpy())
print(y.numpy())

[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]]

[[ 4.  6.  8. 10.]
 [20. 22. 24. 26.]]
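As a sanity check on the printed result, the same values can be reproduced in plain NumPy by summing each block of TILE_M consecutive rows. This is a host-side sketch for verification only and does not use Warp; the sizes are hard-coded to match the example above.

```python
import numpy as np

TILE_M = 2
M, N = 4, 4

x = np.arange(M * N, dtype=float).reshape(M, N)

# Group the rows into blocks of TILE_M and sum within each block.
expected = x.reshape(M // TILE_M, TILE_M, N).sum(axis=1)
print(expected)
```

The first row of `expected` is the sum of rows 0 and 1 of x, and the second row is the sum of rows 2 and 3, matching the contents of y.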