cub::BlockRakingLayout#

template<typename T, int BlockThreads>
struct BlockRakingLayout#

BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data.

Overview#

This type facilitates a shared memory usage pattern where a block of CUDA threads places elements into shared memory and then reduces the active parallelism to one “raking” warp of threads for serially aggregating consecutive sequences of shared items. Padding is inserted to eliminate bank conflicts (for most data types).

Template Parameters:
  • T – The data type to be exchanged.

  • BlockThreads – The thread block size in threads.

Public Static Functions

static inline T *PlacementPtr(
TempStorage &temp_storage,
unsigned int linear_tid
)#

Returns the location for the calling thread to place data into the grid.

static inline T *RakingPtr(
TempStorage &temp_storage,
unsigned int linear_tid
)#

Returns the location for the calling thread to begin sequential raking.

Public Static Attributes

static constexpr int SHARED_ELEMENTS = BlockThreads#

The total number of elements that need to be cooperatively reduced.

static constexpr int MAX_RAKING_THREADS = ::cuda::std::min(BlockThreads, detail::warp_threads)#

Maximum number of warp-synchronous raking threads.

static constexpr int SEGMENT_LENGTH = (SHARED_ELEMENTS + MAX_RAKING_THREADS - 1) / MAX_RAKING_THREADS#

Number of raking elements per warp-synchronous raking thread (rounded up)

static constexpr int RAKING_THREADS = (SHARED_ELEMENTS + SEGMENT_LENGTH - 1) / SEGMENT_LENGTH#

Never use a raking thread that will have no valid data (e.g., when BlockThreads is 62 and SEGMENT_LENGTH is 2, we should only use 31 raking threads)

static constexpr bool HAS_CONFLICTS = (detail::smem_banks % SEGMENT_LENGTH == 0)#

Whether we will have bank conflicts (technically we should find out if the GCD is > 1)

static constexpr int CONFLICT_DEGREE = (HAS_CONFLICTS) ? (MAX_RAKING_THREADS * SEGMENT_LENGTH) / detail::smem_banks : 1#

Degree of bank conflicts (e.g., 4-way)

static constexpr bool USE_SEGMENT_PADDING = ((SEGMENT_LENGTH & 1) == 0) && (SEGMENT_LENGTH > 2)#

Pad each segment length with one element if segment length is not relatively prime to warp size and can’t be optimized as a vector load.

static constexpr int GRID_ELEMENTS = RAKING_THREADS * (SEGMENT_LENGTH + USE_SEGMENT_PADDING)#

Total number of elements in the raking grid.

static constexpr int UNGUARDED = (SHARED_ELEMENTS % RAKING_THREADS == 0)#

Whether or not we need bounds checking during raking (the number of reduction elements is not a multiple of the number of raking threads)

struct _TempStorage#

Shared memory storage type.

Public Members

T buff[BlockRakingLayout::GRID_ELEMENTS]#
struct TempStorage : public Uninitialized<_TempStorage>#

Alias wrapper allowing storage to be unioned.