cub::LoadDirectBlockedVectorized
template<typename T, int ItemsPerThread>
void cub::LoadDirectBlockedVectorized(
    int linear_tid,
    T *block_src_ptr,
    T (&dst_items)[ItemsPerThread])
- Load a linear segment of items into a blocked arrangement across the thread block.
- Assumes a blocked arrangement of (block-threads * items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
- The input offset (block_ptr + block_offset) must be quad-item aligned.
- The following conditions will prevent vectorization, and loading will fall back to cub::BLOCK_LOAD_DIRECT:
  - ItemsPerThread is odd
  - The data type T is not a built-in primitive or CUDA vector type (e.g., short, int2, double, float2, etc.)
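The ownership rule and two of the preconditions can be modeled on the host. The following is a minimal sketch in plain C++, not CUB code: the helper names blocked_item_indices, items_per_thread_allows_vectorization, and is_quad_item_aligned are hypothetical, and reading "quad-item aligned" as "aligned to 4 * sizeof(T) bytes" is an assumption. The data-type condition is not modeled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: source indices owned by thread `linear_tid` in a
// blocked arrangement, i.e. thread i owns the contiguous range
// [i * ItemsPerThread, (i + 1) * ItemsPerThread).
template <int ItemsPerThread>
std::vector<int> blocked_item_indices(int linear_tid) {
    std::vector<int> indices;
    indices.reserve(ItemsPerThread);
    for (int item = 0; item < ItemsPerThread; ++item) {
        indices.push_back(linear_tid * ItemsPerThread + item);
    }
    return indices;
}

// Hypothetical helper: an odd ItemsPerThread prevents vectorization.
constexpr bool items_per_thread_allows_vectorization(int items_per_thread) {
    return items_per_thread % 2 == 0;
}

// Hypothetical helper: true when `ptr` is aligned to four items of type T.
// Assumption: "quad-item aligned" means a multiple of 4 * sizeof(T) bytes.
template <typename T>
bool is_quad_item_aligned(const T* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % (4 * sizeof(T)) == 0;
}
```

With ItemsPerThread = 4, thread 2 in this model owns source indices 8 through 11.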
 - Template Parameters:
- T – [inferred] The data type to load. 
- ItemsPerThread – [inferred] The number of consecutive items partitioned onto each thread. 
 
- Parameters:
- linear_tid – [in] A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks)
- block_src_ptr – [in] The thread block’s base pointer for loading from 
- dst_items – [out] Destination array to load data into
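One way to derive a row-major linear_tid for multi-dimensional thread blocks is sketched below as host-side C++. The Dim3 struct is a hypothetical stand-in for CUDA's threadIdx and blockDim types so the arithmetic can run without a GPU, and row_major_linear_tid is an illustrative name, not part of CUB.

```cpp
// Hypothetical stand-in for CUDA's uint3/dim3 so the sketch runs on the host.
struct Dim3 {
    unsigned x;
    unsigned y;
    unsigned z;
};

// Row-major linearization: x varies fastest, then y, then z.
// For a 2D block (z == 0) this reduces to
// (thread_idx.y * block_dim.x) + thread_idx.x.
inline int row_major_linear_tid(Dim3 thread_idx, Dim3 block_dim) {
    return static_cast<int>(
        (thread_idx.z * block_dim.y + thread_idx.y) * block_dim.x + thread_idx.x);
}
```

Inside a kernel, the same expression would be evaluated on threadIdx and blockDim, and the result passed as the first argument: cub::LoadDirectBlockedVectorized(linear_tid, block_src_ptr, dst_items).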