warp.tile\_load
===============

.. function:: warp._src.lang.tile_load(a: Array[Any], shape: tuple[int, ...], offset: tuple[int, ...], storage: str, bounds_check: bool, aligned: bool) -> Tile[Any, tuple[int, ...]]

   .. hlist::
      :columns: 8

      * Kernel
      * Differentiable

   Load a tile from a global memory array.
   
   This function cooperatively loads a tile from global memory using all threads in the block.
   
   :param a: The source array in global memory
   :param shape: Shape of the tile to load; must have the same number of dimensions as ``a``
   :param offset: Offset in the source array to begin reading from (optional)
   :param storage: The storage location for the tile: ``"register"`` for registers
                   (default) or ``"shared"`` for shared memory.
   :param bounds_check: Needed for unaligned tiles, but can be disabled for memory-aligned tiles to speed up loads
   :param aligned: If True, skip runtime alignment checks for vectorized loads (shared memory,
                   2D+ tiles only). Has no effect for 1D tiles or register storage. Use when you
                   guarantee that: (1) the base address at the tile offset is 16-byte aligned,
                   (2) the array is contiguous (dense row-major strides), (3) all outer-dimension
                   strides are multiples of 16 bytes, and (4) the tile fits entirely within array
                   bounds. Address-alignment violations trap unconditionally (even in release
                   builds). Bounds and contiguity violations trigger debug-only asserts; in
                   release builds they cause silent data corruption.
   
   :returns: A tile with the specified shape and the same data type as the source array.
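   
   As a minimal sketch of how ``a``, ``shape``, and ``offset`` fit together, the
   kernel below copies a 2D array one tile at a time. The names ``copy_tiles``,
   ``TILE_M``, ``TILE_N``, ``src``, and ``dst`` are illustrative and not part of
   Warp. The sketch assumes a Warp release with tile support and uses the public
   ``wp.tile_load``, ``wp.tile_store``, and ``wp.launch_tiled`` entry points. The
   remaining parameters are left at their defaults.

   .. code-block:: python

      import numpy as np
      import warp as wp

      # Tile dimensions must be compile-time constants (illustrative values).
      TILE_M = wp.constant(8)
      TILE_N = wp.constant(4)


      @wp.kernel
      def copy_tiles(src: wp.array2d(dtype=float), dst: wp.array2d(dtype=float)):
          # With launch_tiled, each (i, j) identifies one block, i.e. one tile.
          i, j = wp.tid()

          # All threads in the block cooperate to load one TILE_M x TILE_N tile.
          # shape has the same number of dimensions as src; offset is in elements.
          t = wp.tile_load(src, shape=(TILE_M, TILE_N), offset=(i * TILE_M, j * TILE_N))

          # Write the tile back to the same position in the destination array.
          wp.tile_store(dst, t, offset=(i * TILE_M, j * TILE_N))


      # A 16 x 8 source array divides evenly into a 2 x 2 grid of 8 x 4 tiles.
      src = wp.array(np.arange(16 * 8, dtype=np.float32).reshape(16, 8), dtype=float)
      dst = wp.zeros_like(src)

      wp.launch_tiled(copy_tiles, dim=(2, 2), inputs=[src, dst], block_dim=64)

      # Every element should have been copied by exactly one tile.
      assert np.array_equal(dst.numpy(), src.numpy())

   The array dimensions here are exact multiples of the tile shape, so every
   tile lies entirely within the array bounds.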
   

.. function:: warp._src.lang.tile_load(a: Array[Any], shape: int32, offset: int32, storage: str, bounds_check: bool, aligned: bool) -> Tile[Any, tuple[int, ...]]
   :noindex:

   .. hlist::
      :columns: 8

      * Kernel
      * Differentiable

   Load a tile from a global memory array.