cub::StoreDirectBlockedVectorized
Defined in cub/block/block_store.cuh
-
template<typename T, int ITEMS_PER_THREAD>
void cub::StoreDirectBlockedVectorized(int linear_tid, T *block_ptr, T (&items)[ITEMS_PER_THREAD])
Store a blocked arrangement of items across a thread block into a linear segment of items.
Assumes a blocked arrangement of (block-threads * items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
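The blocked arrangement described above can be pictured with a small host-side model. This is only a sketch in plain C++, not CUB code: `store_blocked_model` and `simulate_block_store` are hypothetical names, and a loop over `tid` stands in for the threads of a block. It shows which output range each thread's items land in.

```cpp
#include <cassert>
#include <vector>

// Host-side model of a blocked store (hypothetical, not the CUB implementation):
// "thread" linear_tid writes its items to the contiguous output range
// [linear_tid * ITEMS_PER_THREAD, (linear_tid + 1) * ITEMS_PER_THREAD).
template <typename T, int ITEMS_PER_THREAD>
void store_blocked_model(int linear_tid, T *block_ptr,
                         const T (&items)[ITEMS_PER_THREAD])
{
    assert(linear_tid >= 0);
    for (int i = 0; i < ITEMS_PER_THREAD; ++i)
        block_ptr[linear_tid * ITEMS_PER_THREAD + i] = items[i];
}

// Hypothetical driver: runs the model once per "thread" of a 1D block,
// with 4 items per thread. Item i of thread tid holds tid * 100 + i.
std::vector<int> simulate_block_store(int block_threads)
{
    constexpr int IPT = 4;
    std::vector<int> out(block_threads * IPT, -1);
    for (int tid = 0; tid < block_threads; ++tid) {
        int items[IPT];
        for (int i = 0; i < IPT; ++i)
            items[i] = tid * 100 + i;
        store_blocked_model(tid, out.data(), items);
    }
    return out;
}
```

With three threads, thread 1's second item (value 101) ends up at output index 5, directly after thread 0's four items.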
The output pointer (block_ptr) must be quad-item aligned, which is the default starting alignment of memory returned by cudaMalloc().
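One way to read the alignment requirement is sketched below. It assumes that "quad-item aligned" means the address is a multiple of 4 * sizeof(T); that reading is an assumption, and `is_quad_item_aligned` is a hypothetical helper, not part of CUB.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical check, assuming "quad-item aligned" means the address is a
// multiple of four items of type T (4 * sizeof(T) bytes).
template <typename T>
bool is_quad_item_aligned(const T *ptr)
{
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(ptr);
    return address % (4 * sizeof(T)) == 0;
}
```

Under this reading, a pointer offset by a number of items that is not a multiple of four would no longer qualify, even when the base allocation does.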
The following conditions will prevent vectorization and storing will fall back to cub::BLOCK_STORE_DIRECT:
- ITEMS_PER_THREAD is odd
- The data type T is not a built-in primitive or CUDA vector type (e.g., short, int2, double, float2, etc.)
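The two documented conditions can be expressed as a compile-time predicate. The sketch below models only what the text states; it is not how CUB decides internally. `documented_vectorizable` and `Custom` are hypothetical names, and because this is host-only C++ without the CUDA headers, it recognizes built-in arithmetic types only and does not cover CUDA vector types such as int2 or float2.

```cpp
#include <type_traits>

// Hypothetical predicate for the documented conditions:
// ITEMS_PER_THREAD must be even, and T must be a built-in primitive.
// (CUDA vector types are outside the scope of this host-only model.)
template <typename T, int ITEMS_PER_THREAD>
constexpr bool documented_vectorizable()
{
    return (ITEMS_PER_THREAD % 2 == 0) && std::is_arithmetic<T>::value;
}

// A user-defined type, which the documentation says is stored without
// vectorization.
struct Custom
{
    int a;
    float b;
};
```

For example, `documented_vectorizable<int, 4>()` is true, while `documented_vectorizable<int, 3>()` and `documented_vectorizable<Custom, 4>()` are both false.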
- Template Parameters
T – [inferred] The data type to store.
ITEMS_PER_THREAD – [inferred] The number of consecutive items partitioned onto each thread.
- Parameters
linear_tid – [in] A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks)
block_ptr – [in] Output pointer to store to
items – [in] Data to store
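A row-major linear thread identifier for a 2D block can be computed as in the following host-side sketch. `Dim3` is a hypothetical stand-in for CUDA's built-in index types, and `linear_tid_2d` is an illustrative helper, not a CUB function; in a kernel, the built-in threadIdx and blockDim variables would be used directly.

```cpp
// Hypothetical stand-in for CUDA's dim3 / uint3 index types.
struct Dim3
{
    unsigned x, y, z;
};

// Row-major flattening of a 2D thread index: threads with consecutive x
// within the same row y receive consecutive linear identifiers.
unsigned linear_tid_2d(Dim3 thread_idx, Dim3 block_dim)
{
    return thread_idx.y * block_dim.x + thread_idx.x;
}
```

In an 8 x 4 block, the thread at (x = 3, y = 2) gets linear identifier 2 * 8 + 3 = 19.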