cub::LoadDirectWarpStriped

Defined in cub/block/block_load.cuh

template<typename T, int ITEMS_PER_THREAD, typename RandomAccessIterator>
void cub::LoadDirectWarpStriped(int linear_tid, RandomAccessIterator block_src_it, T (&dst_items)[ITEMS_PER_THREAD], int block_items_end)

Load a linear segment of items into a warp-striped arrangement across the thread block, guarded by range.

Assumes a warp-striped arrangement of elements across threads, where warp i owns the i-th range of (warp-threads * items-per-thread) contiguous items, and, within that range, the thread in lane j owns items (j), (j + warp-threads), …, (j + (warp-threads * (items-per-thread - 1))).
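The index arithmetic behind this arrangement can be sketched on the host. The helper below is hypothetical and not part of CUB; it assumes a warp width of 32 threads (hard-coded as `kWarpThreads`) and returns the offset, relative to the block's base iterator, of a given item owned by a given thread.

```cpp
#include <cassert>

// Assumption: 32 threads per warp, hard-coded for illustration only.
constexpr int kWarpThreads = 32;

// Hypothetical helper (not a CUB function): offset, relative to the block's
// base iterator, of the item-th element owned by thread linear_tid.
inline int warp_striped_offset(int linear_tid, int item, int items_per_thread)
{
    assert(linear_tid >= 0 && item >= 0 && item < items_per_thread);

    const int lane = linear_tid % kWarpThreads;  // thread's position within its warp
    const int warp = linear_tid / kWarpThreads;  // index of the warp within the block

    // Each warp owns one contiguous range of warp-threads * items-per-thread items.
    const int warp_offset = warp * kWarpThreads * items_per_thread;

    // Within that range, consecutive items of one thread are a warp width apart.
    return warp_offset + lane + item * kWarpThreads;
}
```

Under these assumptions, with 4 items per thread, thread 33 (lane 1 of warp 1) would own the items at offsets 129, 161, 193, and 225.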

Usage Considerations

The number of threads in the thread block must be a multiple of the architecture’s warp size.

Template Parameters
  • T [inferred] The data type to load.

  • ITEMS_PER_THREAD [inferred] The number of consecutive items partitioned onto each thread.

  • RandomAccessIterator [inferred] The random-access iterator type for input (may be a simple pointer type).

Parameters
  • linear_tid [in] A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks)

  • block_src_it [in] The thread block’s base iterator for loading from

  • dst_items [out] Destination to load data into

  • block_items_end [in] Number of valid items to load
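To illustrate how these parameters interact, the following host-side sketch emulates the guarded load for a single thread. It is not the CUB implementation: `emulate_load_warp_striped` is a hypothetical stand-in that mirrors the parameter list above, the warp width is assumed to be 32, and slots whose source offset falls at or beyond `block_items_end` are assumed to be left unmodified.

```cpp
#include <cassert>

// Assumption: 32 threads per warp, hard-coded for illustration only.
constexpr int kWarpThreads = 32;

// Hypothetical host-side stand-in for the device function, emulating the
// work of one thread. Names mirror the documented parameters.
template <typename T, int ITEMS_PER_THREAD, typename RandomAccessIterator>
void emulate_load_warp_striped(int linear_tid,
                               RandomAccessIterator block_src_it,
                               T (&dst_items)[ITEMS_PER_THREAD],
                               int block_items_end)
{
    assert(linear_tid >= 0);

    const int lane = linear_tid % kWarpThreads;
    // Start of the contiguous range owned by this thread's warp.
    const int warp_offset =
        (linear_tid / kWarpThreads) * kWarpThreads * ITEMS_PER_THREAD;

    for (int item = 0; item < ITEMS_PER_THREAD; ++item) {
        const int offset = warp_offset + lane + item * kWarpThreads;
        // Range guard: only read offsets below block_items_end.
        // Assumption: out-of-range slots keep whatever value they held.
        if (offset < block_items_end) {
            dst_items[item] = block_src_it[offset];
        }
    }
}
```

Because the out-of-range slots are not written in this sketch, a caller that needs defined values there would initialize `dst_items` before the call.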