warp.tile\_sum
==============

.. function:: warp._src.lang.tile_sum(a: Tile[Any,tuple[int, ...]], axis: int32) -> Tile[Any,tuple[int, ...]]

   .. hlist::
      :columns: 8

      * Kernel
      * Differentiable

   Cooperatively compute the sum of the tile elements, reducing across a tile axis using all threads in the block.

   :param a: The input tile. Must reside in shared memory.
   :param axis: The tile axis to compute the sum across. Must be a compile-time constant.
   :returns: A tile with the same data type as the input tile and the same shape with the ``axis`` dimension removed.

   .. rubric:: Example

   .. code-block:: python

      import warp as wp

      @wp.kernel
      def compute():
          # 8x8 tile filled with ones
          t = wp.tile_ones(dtype=float, shape=(8, 8))
          # Sum down each column; the result drops axis 0, giving shape (8,)
          s = wp.tile_sum(t, axis=0)
          print(s)

      wp.launch_tiled(compute, dim=[1], inputs=[], block_dim=64)

   .. code-block:: text

      [8 8 8 8 8 8 8 8] = tile(shape=(8), storage=register)

.. function:: warp._src.lang.tile_sum(a: Tile[Any,tuple[int, ...]]) -> Tile[Any,tuple[Literal[1]]]
   :noindex:

   .. hlist::
      :columns: 8

      * Kernel
      * Differentiable

   Cooperatively compute the sum of the tile elements, reducing across all elements using all threads in the block.

   :param a: The tile to compute the sum of.
   :returns: A single-element tile holding the sum.

   .. rubric:: Example

   .. code-block:: python

      import warp as wp

      @wp.kernel
      def compute():
          # 16x16 tile filled with ones
          t = wp.tile_ones(dtype=float, shape=(16, 16))
          # Reduce every element into a single-element tile
          s = wp.tile_sum(t)
          print(s)

      wp.launch_tiled(compute, dim=[1], inputs=[], block_dim=64)

   .. code-block:: text

      [256] = tile(shape=(1), storage=register)
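The two ``tile_sum`` overloads differ mainly in the shape of the result: the axis form drops the reduced dimension, while the full reduction returns a one-element tile rather than a scalar. The host-side NumPy sketch below models only that arithmetic and shape behavior, which can be handy for checking kernel output. ``tile_sum_reference`` is a hypothetical helper, not part of Warp, and it does not model the cooperative, block-wide execution on the device.

.. code-block:: python

   from typing import Optional

   import numpy as np


   def tile_sum_reference(a: np.ndarray, axis: Optional[int] = None) -> np.ndarray:
       """Hypothetical host-side model of the two tile_sum overloads (not a Warp API)."""
       if axis is None:
           # Full reduction: wrap the sum in a shape-(1,) array, like Tile[..., tuple[Literal[1]]].
           return np.asarray([a.sum()], dtype=a.dtype)
       # Axis reduction: NumPy removes the reduced dimension; cast back because
       # tile_sum keeps the input data type (NumPy may promote small int types).
       return a.sum(axis=axis).astype(a.dtype)


   # Mirror the two documented examples.
   col = tile_sum_reference(np.ones((8, 8), dtype=np.float32), axis=0)
   total = tile_sum_reference(np.ones((16, 16), dtype=np.float32))
   print(col.shape, col)      # shape (8,), every element 8.0
   print(total.shape, total)  # shape (1,), single element 256.0

Keeping the full reduction as a one-element array, rather than returning a NumPy scalar, keeps the comparison against the kernel's ``tile(shape=(1))`` result shape-for-shape.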