cub::WarpLoadAlgorithm
Defined in cub/warp/warp_load.cuh
-
enum cub::WarpLoadAlgorithm
cub::WarpLoadAlgorithm enumerates alternative algorithms for cub::WarpLoad to read a linear segment of data from memory into a CUDA warp.
Values:
-
enumerator WARP_LOAD_DIRECT
Overview
A blocked arrangement of data is read directly from memory.
Performance Considerations
The utilization of memory transactions (coalescing) decreases as the access stride between threads (i.e., the number of items per thread) increases.
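The relationship between items per thread and access stride can be sketched with a small host-side model. This is not CUB code: the helper names `blocked_index` and `blocked_lane_stride` are hypothetical, and the sketch assumes the usual definition of a blocked arrangement, in which each thread owns a run of consecutive items.

```cpp
#include <cstddef>

// Host-side model of a blocked arrangement (illustrative only, not CUB code).
// Thread `lane` owns items_per_thread consecutive elements of the warp's
// segment; this returns the segment index of that thread's item `item`.
constexpr std::size_t blocked_index(std::size_t lane, std::size_t item,
                                    std::size_t items_per_thread) {
    return lane * items_per_thread + item;
}

// Distance, in elements, between the addresses that two adjacent lanes touch
// in the same load step. For a blocked arrangement this equals
// items_per_thread, so neighbouring lanes drift apart as it grows.
constexpr std::size_t blocked_lane_stride(std::size_t items_per_thread) {
    return blocked_index(1, 0, items_per_thread) -
           blocked_index(0, 0, items_per_thread);
}
```

With 4 items per thread, lane 0 reads indices 0 to 3 and lane 1 reads indices 4 to 7, so in any single load step adjacent lanes are 4 elements apart.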
-
enumerator WARP_LOAD_STRIPED
Overview
A striped arrangement of data is read directly from memory.
Performance Considerations
The utilization of memory transactions (coalescing) doesn’t depend on the number of items per thread.
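A host-side model of the striped index mapping shows why the stride between lanes stays constant. This is not CUB code: the helper names `striped_index` and `striped_lane_stride` are hypothetical, and the sketch assumes the usual definition of a striped arrangement, in which consecutive items of one thread are `warp_threads` elements apart.

```cpp
#include <cstddef>

// Host-side model of a striped arrangement (illustrative only, not CUB code).
// Item `item` of thread `lane` lives at offset item * warp_threads + lane
// within the warp's segment.
constexpr std::size_t striped_index(std::size_t lane, std::size_t item,
                                    std::size_t warp_threads) {
    return item * warp_threads + lane;
}

// Distance, in elements, between the addresses that two adjacent lanes touch
// in the same load step. It is always 1, whatever the number of items per
// thread, so each step reads one contiguous run of warp_threads elements.
constexpr std::size_t striped_lane_stride(std::size_t warp_threads) {
    return striped_index(1, 0, warp_threads) -
           striped_index(0, 0, warp_threads);
}
```

For a 32-thread warp, load step 0 covers indices 0 to 31, step 1 covers 32 to 63, and so on; the number of items per thread only changes how many such steps there are.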
-
enumerator WARP_LOAD_VECTORIZE
Overview
A blocked arrangement of data is read from memory using CUDA’s built-in vectorized loads as a coalescing optimization. For example, ld.global.v4.s32 instructions will be generated when T = int and ITEMS_PER_THREAD % 4 == 0.
Performance Considerations
The utilization of memory transactions (coalescing) remains high until the access stride between threads (i.e., the number of items per thread) exceeds the maximum vector load width (typically 4 items or 64B, whichever is lower).
The following conditions will prevent vectorization and loading will fall back to cub::WARP_LOAD_DIRECT:
ITEMS_PER_THREAD is odd
The InputIteratorT is not a simple pointer type
The block input offset is not quadword-aligned
The data type T is not a built-in primitive or CUDA vector type (e.g., short, int2, double, float2, etc.)
-
enumerator WARP_LOAD_TRANSPOSE
Overview
A striped arrangement of data is read efficiently from memory and then locally transposed into a blocked arrangement.
Performance Considerations
The utilization of memory transactions (coalescing) remains high regardless of items loaded per thread.
The local reordering incurs slightly longer latencies and lower throughput than the direct cub::WARP_LOAD_DIRECT and cub::WARP_LOAD_VECTORIZE alternatives.
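The two-phase behavior described above (striped read, then local transpose into a blocked arrangement) can be modelled on the host. This is a sketch under stated assumptions rather than CUB's implementation: the function name `load_striped_then_transpose` is hypothetical, a `std::vector` scratch buffer stands in for the warp's temporary storage, and the segment is assumed to hold exactly `warp_threads * items_per_thread` elements.

```cpp
#include <cstddef>
#include <vector>

// Host-side model (illustrative only, not CUB code). Returns regs[lane][item],
// the per-thread items after a striped load followed by a local transpose.
std::vector<std::vector<int>> load_striped_then_transpose(
    const std::vector<int>& segment, std::size_t warp_threads,
    std::size_t items_per_thread) {
    std::vector<std::vector<int>> regs(
        warp_threads, std::vector<int>(items_per_thread));

    // Phase 1: striped read. In each step, adjacent lanes touch adjacent
    // elements, which is the coalescing-friendly access pattern.
    for (std::size_t item = 0; item < items_per_thread; ++item)
        for (std::size_t lane = 0; lane < warp_threads; ++lane)
            regs[lane][item] = segment[item * warp_threads + lane];

    // Phase 2a: every lane writes its striped items to a scratch buffer
    // (a stand-in for the warp's temporary storage).
    std::vector<int> scratch(warp_threads * items_per_thread);
    for (std::size_t lane = 0; lane < warp_threads; ++lane)
        for (std::size_t item = 0; item < items_per_thread; ++item)
            scratch[item * warp_threads + lane] = regs[lane][item];

    // Phase 2b: every lane reads back a consecutive run, which yields a
    // blocked arrangement.
    for (std::size_t lane = 0; lane < warp_threads; ++lane)
        for (std::size_t item = 0; item < items_per_thread; ++item)
            regs[lane][item] = scratch[lane * items_per_thread + item];

    return regs;
}
```

For a segment holding 0 to 7 with 4 threads and 2 items per thread, thread 1 ends up with items 2 and 3, exactly what a blocked load would produce. The extra round trip through scratch storage is the local reordering cost mentioned above.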