cub::ShuffleUp
Defined in cub/util_ptx.cuh
-
template<int LOGICAL_WARP_THREADS, typename T>
T cub::ShuffleUp(T input, int src_offset, int first_thread, unsigned int member_mask)
Shuffle-up for any data type.
Each warp-lane i obtains the value input contributed by warp-lane i - src_offset. For thread lanes i < src_offset, the thread's own input is returned to the thread.
Available only for SM3.0 or newer.
- Snippet
The code snippet below illustrates each thread obtaining a double value from the predecessor of its predecessor.

    #include <cub/cub.cuh>   // or equivalently <cub/util_ptx.cuh>

    __global__ void ExampleKernel(...)
    {
        // Obtain one input item per thread
        double thread_data = ...

        // Obtain item from two ranks below
        double peer_data = cub::ShuffleUp<32>(thread_data, 2, 0, 0xffffffff);

Suppose the set of input thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}. The corresponding output peer_data will be {1.0, 2.0, 1.0, 2.0, 3.0, ..., 30.0}.
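The input/output relationship in the example above can be checked on the host with a small reference model. The sketch below is not CUB code: `shuffle_up_model` and `example_input` are hypothetical helpers introduced only for illustration, and the model assumes a single full warp with first_thread = 0 and a non-negative src_offset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of the documented shuffle-up semantics
// (not part of CUB). Lane i receives the input of lane i - src_offset;
// lanes with i < src_offset keep their own input.
// Assumes src_offset >= 0 and first_thread == 0.
template <typename T, std::size_t N>
std::array<T, N> shuffle_up_model(const std::array<T, N>& input, int src_offset)
{
    std::array<T, N> output{};
    for (std::size_t lane = 0; lane < N; ++lane) {
        const int src = static_cast<int>(lane) - src_offset;
        output[lane] = (src >= 0) ? input[static_cast<std::size_t>(src)]
                                  : input[lane];
    }
    return output;
}

// Builds the example input {1.0, 2.0, ..., 32.0}, one value per lane.
inline std::array<double, 32> example_input()
{
    std::array<double, 32> data{};
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = static_cast<double>(i + 1);
    }
    return data;
}
```

Calling `shuffle_up_model(example_input(), 2)` leaves lanes 0 and 1 with their own values (1.0 and 2.0) and gives every later lane the value from two lanes below, which matches the peer_data sequence listed above.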
- Template Parameters
LOGICAL_WARP_THREADS – The number of threads per “logical” warp. Must be a power-of-two <= 32.
T – [inferred] The input/output element type
- Parameters
input – [in] The value this lane contributes to the shuffle
src_offset – [in] The relative down-offset of the peer to read from
first_thread – [in] Index of first lane in logical warp (typically 0)
member_mask – [in] 32-bit mask of participating warp lanes
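When LOGICAL_WARP_THREADS is smaller than 32, the shuffle is expected to stay within each logical warp. The host-side sketch below models that reading; it is an assumption based on the segmented behavior of the underlying warp shuffle-up instruction, not CUB's implementation. `segmented_shuffle_up_model` is a hypothetical helper, and the model assumes first_thread = 0, a non-negative src_offset, and logical warps laid out as consecutive groups of lanes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model (not part of CUB) of shuffle-up within
// logical warps. Lanes are split into consecutive segments of
// logical_warp_threads lanes. A lane reads from lane - src_offset only if
// that source lane lies in the same segment; otherwise it keeps its own input.
// Assumes src_offset >= 0 and first_thread == 0.
std::vector<int> segmented_shuffle_up_model(const std::vector<int>& input,
                                            int logical_warp_threads,
                                            int src_offset)
{
    std::vector<int> output(input.size());
    for (int lane = 0; lane < static_cast<int>(input.size()); ++lane) {
        // First lane of the logical warp that contains this lane
        const int segment_base = lane - (lane % logical_warp_threads);
        const int src = lane - src_offset;
        output[lane] = (src >= segment_base) ? input[src] : input[lane];
    }
    return output;
}
```

Under these assumptions, with 32 lanes holding their own lane index, logical_warp_threads = 8 and src_offset = 1, lanes 0, 8, 16 and 24 keep their own values while every other lane receives the value of the lane directly below it.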