cub::ShuffleIndex

Defined in cub/util_ptx.cuh

template<int LOGICAL_WARP_THREADS, typename T>
T cub::ShuffleIndex(T input, int src_lane, unsigned int member_mask)

Shuffle-broadcast for any data type.

Each warp-lane i obtains the value input contributed by warp-lane src_lane. If src_lane < 0 or src_lane >= WARP_THREADS, the thread's own input is returned to it.

  • Available only for SM3.0 or newer
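The broadcast rule above can be sketched on the host as a plain loop over lanes. This is only a CPU-side model of the documented behavior for a single 32-lane warp in which every lane participates. It is not CUB's implementation, and the names ModelShuffleIndex and kWarpThreads are hypothetical.

```cpp
#include <array>

// Assumption: one full physical warp of 32 lanes, all of them participating.
constexpr int kWarpThreads = 32;

// Hypothetical host-side model of the documented rule. Element i of the result
// is what lane i would receive from the shuffle.
template <typename T>
std::array<T, kWarpThreads> ModelShuffleIndex(const std::array<T, kWarpThreads>& input,
                                              int src_lane)
{
    std::array<T, kWarpThreads> output{};
    const bool in_range = (src_lane >= 0) && (src_lane < kWarpThreads);
    for (int lane = 0; lane < kWarpThreads; ++lane)
    {
        // In range: every lane reads the source lane's input.
        // Out of range: each lane keeps its own input.
        output[lane] = in_range ? input[src_lane] : input[lane];
    }
    return output;
}
```

With inputs 1.0 through 32.0, calling the model with src_lane = 0 yields 1.0 in every slot, while an out-of-range src_lane such as 40 returns the inputs unchanged.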

Snippet

The code snippet below illustrates each thread obtaining a double value from warp-lane 0.

#include <cub/cub.cuh>   // or equivalently <cub/util_ptx.cuh>

__global__ void ExampleKernel(...)
{
    // Obtain one input item per thread
    double thread_data = ...

    // Obtain item from warp-lane 0 (all 32 lanes participate)
    double peer_data = cub::ShuffleIndex<32>(thread_data, 0, 0xffffffff);
}

Suppose the set of input thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}. The corresponding output peer_data will be {1.0, 1.0, 1.0, 1.0, 1.0, ..., 1.0}.

Template Parameters
  • LOGICAL_WARP_THREADS – The number of threads per “logical” warp. Must be a power-of-two <= 32.

  • T – [inferred] The input/output element type

Parameters
  • input – [in] The value to broadcast

  • src_lane – [in] The warp lane whose input is broadcast

  • member_mask – [in] 32-bit mask of participating warp lanes
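When only the lanes of one logical warp take part, a member_mask has to select exactly those lanes of the 32-lane physical warp. The sketch below shows one plausible way to build such a mask on the host or at compile time. LogicalWarpMask is a hypothetical helper, not part of the API documented here, and it assumes logical warps are packed contiguously, so logical_warp_id must be less than 32 / LOGICAL_WARP_THREADS.

```cpp
#include <cstdint>

// Hypothetical helper: returns a 32-bit mask with one bit set for every lane
// of the given logical warp inside a 32-lane physical warp.
template <int LOGICAL_WARP_THREADS>
constexpr std::uint32_t LogicalWarpMask(unsigned int logical_warp_id)
{
    static_assert(LOGICAL_WARP_THREADS > 0 && LOGICAL_WARP_THREADS <= 32 &&
                      (LOGICAL_WARP_THREADS & (LOGICAL_WARP_THREADS - 1)) == 0,
                  "LOGICAL_WARP_THREADS must be a power of two <= 32");

    // Build the run of low bits in 64-bit arithmetic so that the 32-lane case
    // does not shift a 32-bit value by 32, which would be undefined.
    const std::uint64_t low_bits = (std::uint64_t{1} << LOGICAL_WARP_THREADS) - 1;

    // Move the run of bits to the position of the requested logical warp.
    return static_cast<std::uint32_t>(low_bits << (logical_warp_id * LOGICAL_WARP_THREADS));
}
```

For example, LogicalWarpMask<32>(0) gives 0xffffffff, the mask used in the snippet above, and LogicalWarpMask<8>(1) gives 0x0000ff00, which covers lanes 8 through 15.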