cub::LoadDirectBlockedVectorized
Defined in cub/block/block_load.cuh
template<typename T, int ITEMS_PER_THREAD>
void cub::LoadDirectBlockedVectorized(int linear_tid, T *block_src_ptr, T (&dst_items)[ITEMS_PER_THREAD])
Load a linear segment of items into a blocked arrangement across the thread block.
Assumes a blocked arrangement of (block-threads * items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
The input offset (block_ptr + block_offset) must be quad-item aligned.
The following conditions will prevent vectorization and loading will fall back to cub::BLOCK_LOAD_DIRECT:
- ITEMS_PER_THREAD is odd
- The data type T is not a built-in primitive or CUDA vector type (e.g., short, int2, double, float2, etc.)
- Template Parameters
T – [inferred] The data type to load.
ITEMS_PER_THREAD – [inferred] The number of consecutive items partitioned onto each thread.
- Parameters
linear_tid – [in] A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks)
block_src_ptr – [in] The thread block’s base pointer for loading from
dst_items – [out] Destination to load data into
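The following is a minimal usage sketch, not taken from the library's own examples. It assumes a single 1D thread block, an even ITEMS_PER_THREAD, and an input buffer obtained from cudaMalloc (which returns generously aligned pointers, so the alignment requirement should be met). The names BLOCK_THREADS, BlockedLoadKernel, and CountBlockedLoadMismatches are hypothetical and exist only for this illustration. Each thread loads its items and writes them back to the positions a blocked arrangement implies, so the output should equal the input.

```cuda
#include <cub/block/block_load.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical tile shape chosen for this sketch.
constexpr int BLOCK_THREADS    = 128;
constexpr int ITEMS_PER_THREAD = 4;  // even, so vectorization is not ruled out
constexpr int TILE_ITEMS       = BLOCK_THREADS * ITEMS_PER_THREAD;

// Each thread loads ITEMS_PER_THREAD consecutive items and copies them to
// the output at the indices a blocked arrangement assigns to that thread.
__global__ void BlockedLoadKernel(int *d_in, int *d_out)
{
    int items[ITEMS_PER_THREAD];
    int linear_tid = threadIdx.x;  // 1D block: the thread index is already linear

    cub::LoadDirectBlockedVectorized(linear_tid, d_in, items);

    for (int i = 0; i < ITEMS_PER_THREAD; ++i)
    {
        d_out[linear_tid * ITEMS_PER_THREAD + i] = items[i];
    }
}

// Host-side check: returns the number of positions where the output differs
// from the input, or -1 if a CUDA runtime call fails.
int CountBlockedLoadMismatches()
{
    std::vector<int> h_in(TILE_ITEMS), h_out(TILE_ITEMS, -1);
    for (int i = 0; i < TILE_ITEMS; ++i) h_in[i] = i;

    const size_t bytes = TILE_ITEMS * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }

    bool ok = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok)
    {
        BlockedLoadKernel<<<1, BLOCK_THREADS>>>(d_in, d_out);
        ok = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (!ok) return -1;

    int mismatches = 0;
    for (int i = 0; i < TILE_ITEMS; ++i)
    {
        if (h_out[i] != h_in[i]) ++mismatches;
    }
    return mismatches;
}
```

Calling CountBlockedLoadMismatches() from a host main on a CUDA-capable device should return 0 if thread i indeed receives items i * ITEMS_PER_THREAD through (i + 1) * ITEMS_PER_THREAD - 1. Whether the load is actually vectorized is not observable from this check; it only confirms the blocked arrangement.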