StoreDirectWarpStriped#

Overloads#

StoreDirectWarpStriped(linear_tid, block_itr, T(&items)[ITEMS_PER_THREAD])#

template<typename T, int ITEMS_PER_THREAD, typename OutputIteratorT>
void cub::StoreDirectWarpStriped(
int linear_tid,
OutputIteratorT block_itr,
T (&items)[ITEMS_PER_THREAD]
)

Store a warp-striped arrangement of data across the thread block into a linear segment of items.

Assumes a warp-striped arrangement of elements across threads, where warp i owns the i-th range of (warp-threads * items-per-thread) contiguous items, and within that range the thread with lane index j owns items (j), (j + warp-threads), …, (j + (warp-threads * (items-per-thread - 1))).
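The index arithmetic implied by this arrangement can be modeled on the host. The following is a minimal sketch in plain C++, not CUB code: `warp_striped_offset` and `emulate_store_warp_striped` are hypothetical helper names, and the warp size is passed as an ordinary parameter so the mapping can be checked with small numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CUB): output offset to which thread
// `linear_tid` would write its `item`-th value under a warp-striped layout.
inline std::size_t warp_striped_offset(int linear_tid, int item,
                                       int warp_threads, int items_per_thread)
{
    const int lane = linear_tid % warp_threads;  // rank of the thread within its warp
    const int warp = linear_tid / warp_threads;  // rank of the warp within the block
    // Each warp owns a contiguous range of warp_threads * items_per_thread items.
    const std::size_t warp_base =
        static_cast<std::size_t>(warp) * warp_threads * items_per_thread;
    // Within that range, consecutive items of one thread are warp_threads apart.
    return warp_base + lane + static_cast<std::size_t>(item) * warp_threads;
}

// Hypothetical host-side emulation of the unguarded store for a whole block.
// items[t][k] is the k-th item held by thread t.
inline std::vector<int> emulate_store_warp_striped(
    const std::vector<std::vector<int>>& items, int warp_threads)
{
    const int block_threads = static_cast<int>(items.size());
    // Mirrors the usage consideration: block size is a multiple of the warp size.
    assert(block_threads % warp_threads == 0);
    const int items_per_thread =
        items.empty() ? 0 : static_cast<int>(items[0].size());

    std::vector<int> out(
        static_cast<std::size_t>(block_threads) * items_per_thread, 0);
    for (int t = 0; t < block_threads; ++t) {
        for (int k = 0; k < items_per_thread; ++k) {
            out[warp_striped_offset(t, k, warp_threads, items_per_thread)] =
                items[t][k];
        }
    }
    return out;
}
```

For example, with 4 threads per warp and 2 items per thread, thread 5 is lane 1 of warp 1, so its two items land at offsets 9 and 13 of the output segment.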

Usage Considerations#

The number of threads in the thread block must be a multiple of the architecture’s warp size.

Template Parameters:
  • T [inferred] The data type to store.

  • ITEMS_PER_THREAD [inferred] The number of consecutive items partitioned onto each thread.

  • OutputIteratorT [inferred] The random-access iterator type for output (may be a simple pointer type).

Parameters:
  • linear_tid [in] A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks).

  • block_itr [in] The thread block’s base output iterator for storing to.

  • items [in] Data to store.

StoreDirectWarpStriped(linear_tid, block_itr, T(&items)[ITEMS_PER_THREAD], valid_items)#

template<typename T, int ITEMS_PER_THREAD, typename OutputIteratorT>
void cub::StoreDirectWarpStriped(
int linear_tid,
OutputIteratorT block_itr,
T (&items)[ITEMS_PER_THREAD],
int valid_items
)

Store a warp-striped arrangement of data across the thread block into a linear segment of items, guarded by range.

Assumes a warp-striped arrangement of elements across threads, where warp i owns the i-th range of (warp-threads * items-per-thread) contiguous items, and within that range the thread with lane index j owns items (j), (j + warp-threads), …, (j + (warp-threads * (items-per-thread - 1))).

Usage Considerations#

The number of threads in the thread block must be a multiple of the architecture’s warp size.

Template Parameters:
  • T [inferred] The data type to store.

  • ITEMS_PER_THREAD [inferred] The number of consecutive items partitioned onto each thread.

  • OutputIteratorT [inferred] The random-access iterator type for output (may be a simple pointer type).

Parameters:
  • linear_tid [in] A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks).

  • block_itr [in] The thread block’s base output iterator for storing to.

  • items [in] Data to store.

  • valid_items [in] Number of valid items to write.
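The effect of the range guard can be modeled on the host as well. This is a minimal sketch in plain C++, not CUB code: `emulate_store_warp_striped_guarded` is a hypothetical name, the warp size is an ordinary parameter, and the sketch assumes the guard skips any item whose output offset is not below `valid_items`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side emulation (not part of CUB) of a guarded
// warp-striped store. items[t][k] is the k-th item held by thread t;
// only offsets strictly below valid_items are written to `out`.
inline void emulate_store_warp_striped_guarded(
    std::vector<int>& out,
    const std::vector<std::vector<int>>& items,
    int warp_threads,
    int valid_items)
{
    assert(warp_threads > 0);
    assert(valid_items >= 0 &&
           static_cast<std::size_t>(valid_items) <= out.size());

    const std::size_t w = static_cast<std::size_t>(warp_threads);
    const std::size_t limit = static_cast<std::size_t>(valid_items);

    for (std::size_t t = 0; t < items.size(); ++t) {
        const std::size_t lane = t % w;  // rank of the thread within its warp
        const std::size_t warp = t / w;  // rank of the warp within the block
        const std::size_t warp_base = warp * w * items[t].size();
        for (std::size_t k = 0; k < items[t].size(); ++k) {
            const std::size_t offset = warp_base + lane + k * w;
            if (offset < limit) {        // assumed guard: skip out-of-range items
                out[offset] = items[t][k];
            }
        }
    }
}
```

With 4 threads per warp, 2 items per thread, 8 threads, and `valid_items` set to 10, only offsets 0 through 9 are written. Offset 10, which thread 6 would otherwise fill, keeps its previous contents.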