cub::ShuffleDown

Defined in cub/util_ptx.cuh

template<int LOGICAL_WARP_THREADS, typename T> T cub::ShuffleDown(T input, int src_offset, int last_thread, unsigned int member_mask)

Shuffle-down for any data type.

Each warp lane i obtains the value input contributed by warp lane i + src_offset. For lanes where i + src_offset exceeds last_thread, the thread's own input is returned to the thread.

Available only for SM3.0 or newer

Snippet

The code snippet below illustrates each thread obtaining a double value from the successor of its successor.

#include <cub/cub.cuh>   // or equivalently <cub/util_ptx.cuh>

__global__ void ExampleKernel(...)
{
    // Obtain one input item per thread
    double thread_data = ...

    // Obtain the item held by the lane two ranks higher (lane i reads from lane i + 2)
    double peer_data = cub::ShuffleDown<32>(thread_data, 2, 31, 0xffffffff);
}

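Suppose the set of input thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}. The corresponding output peer_data will be {3.0, 4.0, 5.0, 6.0, 7.0, ..., 32.0, 31.0, 32.0}: lanes 0 through 29 receive the value from two lanes higher, while lanes 30 and 31 have no peer at or below last_thread and keep their own inputs.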
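The lane mapping in this example can be checked on the host with a small reference model. The sketch below is not CUB's implementation: shuffle_down_model and snippet_example are hypothetical helpers, and the model assumes a single full 32-lane warp, a non-negative src_offset, and that any lane whose source lies past last_thread keeps its own value.

```cpp
#include <vector>

// Host-side reference model of the shuffle-down data movement for one warp.
// NOT CUB code: `shuffle_down_model` is a hypothetical helper for illustration.
// Assumes src_offset >= 0 and last_thread < input.size().
std::vector<double> shuffle_down_model(const std::vector<double>& input,
                                       int src_offset,
                                       int last_thread)
{
    std::vector<double> output(input.size());
    for (int lane = 0; lane < static_cast<int>(input.size()); ++lane)
    {
        int src_lane = lane + src_offset;
        // Assumption: a lane whose source falls past last_thread keeps its own value.
        output[lane] = (src_lane <= last_thread) ? input[src_lane] : input[lane];
    }
    return output;
}

// Mirrors the snippet: inputs 1.0 .. 32.0, src_offset = 2, last_thread = 31.
std::vector<double> snippet_example()
{
    std::vector<double> input(32);
    for (int lane = 0; lane < 32; ++lane)
    {
        input[lane] = lane + 1.0;
    }
    return shuffle_down_model(input, 2, 31);
}
```

Calling snippet_example() from a host program yields one value per lane; under the stated assumptions, the first 30 entries are the inputs shifted by two lanes and the last two entries are unchanged.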

Template Parameters

LOGICAL_WARP_THREADS – The number of threads per “logical” warp. Must be a power-of-two <= 32.
T – [inferred] The input/output element type

Parameters

input – [in] The calling thread's value, which is made available to the other lanes
src_offset – [in] The offset added to the calling lane's index to locate the peer to read from (lane i reads from lane i + src_offset)
last_thread – [in] Index of last thread in logical warp (typically 31 for a 32-thread warp)
member_mask – [in] 32-bit mask of participating warp lanes
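How LOGICAL_WARP_THREADS and last_thread interact is easiest to see with logical warps smaller than 32. The host-side sketch below is a hypothetical model, not CUB code: it assumes the clamping is applied independently within each logical warp (as the segment mask of the PTX shfl.sync.down instruction does), that last_thread is a lane index relative to the start of the logical warp, and that src_offset is non-negative. The names logical_shuffle_down_model and two_half_warps_example are invented for illustration.

```cpp
#include <vector>

// Hypothetical host-side model (NOT CUB code) of shuffle-down across one
// 32-lane physical warp split into logical warps of `logical_warp_threads`
// lanes. Assumes logical_warp_threads is a power of two <= 32, that
// `input` holds exactly 32 values, and that src_offset >= 0.
std::vector<int> logical_shuffle_down_model(const std::vector<int>& input,
                                            int logical_warp_threads,
                                            int src_offset,
                                            int last_thread)
{
    std::vector<int> output(32);
    for (int lane = 0; lane < 32; ++lane)
    {
        // First physical lane of the logical warp that contains `lane`.
        int base = lane & ~(logical_warp_threads - 1);
        // Highest physical lane this logical warp may read from.
        int max_lane = base | (last_thread & (logical_warp_threads - 1));
        int src_lane = lane + src_offset;
        // Lanes whose source lies past max_lane keep their own value.
        output[lane] = (src_lane <= max_lane) ? input[src_lane] : input[lane];
    }
    return output;
}

// Two 16-lane logical warps, each lane reading from its immediate successor.
std::vector<int> two_half_warps_example()
{
    std::vector<int> input(32);
    for (int lane = 0; lane < 32; ++lane)
    {
        input[lane] = lane;  // each lane contributes its own index
    }
    return logical_shuffle_down_model(input, 16, 1, 15);
}
```

In this model, lane 15 does not read from lane 16 even though lane 16 exists in the physical warp, because lane 15 is the last thread of its logical warp. On the device, the corresponding call under these assumptions would be cub::ShuffleDown<16>(thread_data, 1, 15, member_mask).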