Block-Wide “Collective” Primitives

CUB block-level algorithms are specialized for execution by threads in the same CUDA thread block: