CUB Modules
CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model:
Parallel primitives
    Warp-wide “collective” primitives
        Cooperative warp-wide prefix scan, reduction, etc.
        Safely specialized for each underlying CUDA architecture
    Block-wide “collective” primitives
        Cooperative I/O, sort, scan, reduction, histogram, etc.
        Compatible with arbitrary thread block sizes and types
    Device-wide primitives
        Parallel sort, prefix scan, reduction, histogram, etc.
        Compatible with CUDA dynamic parallelism
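
The three layers listed above can be illustrated with one operation, a sum reduction, written once at each level. The following is a minimal sketch rather than a canonical usage pattern: the kernel names, the helper functions `DeviceSum` and `SumOnesThreeWays`, and the constants `BLOCK_THREADS` and `WARP_THREADS` are made up for this illustration, and CUDA error checking is omitted for brevity. The CUB entry points it relies on are `cub::WarpReduce`, `cub::BlockReduce`, and `cub::DeviceReduce::Sum`.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Illustrative sizes (assumptions for this sketch, not CUB requirements).
constexpr int BLOCK_THREADS = 128;
constexpr int WARP_THREADS  = 32;

// Warp-wide: every 32-thread warp cooperatively sums one value per thread.
__global__ void WarpSumKernel(const int* in, int* warp_sums)
{
    using WarpReduce = cub::WarpReduce<int>;
    // One temporary-storage slot per warp in the block.
    __shared__ typename WarpReduce::TempStorage temp[BLOCK_THREADS / WARP_THREADS];

    int warp_id = threadIdx.x / WARP_THREADS;
    int sum = WarpReduce(temp[warp_id]).Sum(in[threadIdx.x]);

    // The aggregate is only valid in lane 0 of each warp.
    if (threadIdx.x % WARP_THREADS == 0) warp_sums[warp_id] = sum;
}

// Block-wide: all threads of the block cooperatively sum one value per thread.
__global__ void BlockSumKernel(const int* in, int* block_sum)
{
    using BlockReduce = cub::BlockReduce<int, BLOCK_THREADS>;
    __shared__ typename BlockReduce::TempStorage temp;

    int sum = BlockReduce(temp).Sum(in[threadIdx.x]);

    // The aggregate is only valid in thread 0 of the block.
    if (threadIdx.x == 0) *block_sum = sum;
}

// Device-wide: called from the host; CUB launches the kernels internally.
int DeviceSum(const int* d_in, int n)
{
    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));

    // First call with a null buffer only queries the temporary storage size.
    void*  d_temp     = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);
    cudaMalloc(&d_temp, temp_bytes);

    // Second call performs the reduction.
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);

    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_out);
    return h_out;
}

// Hypothetical driver: sums BLOCK_THREADS ones at each level and returns
// {sum from warp 0, block-wide sum, device-wide sum}.
std::vector<int> SumOnesThreeWays()
{
    std::vector<int> h_in(BLOCK_THREADS, 1);
    int *d_in = nullptr, *d_warp_sums = nullptr, *d_block_sum = nullptr;
    cudaMalloc(&d_in, BLOCK_THREADS * sizeof(int));
    cudaMalloc(&d_warp_sums, (BLOCK_THREADS / WARP_THREADS) * sizeof(int));
    cudaMalloc(&d_block_sum, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), BLOCK_THREADS * sizeof(int), cudaMemcpyHostToDevice);

    WarpSumKernel<<<1, BLOCK_THREADS>>>(d_in, d_warp_sums);
    BlockSumKernel<<<1, BLOCK_THREADS>>>(d_in, d_block_sum);

    int warp0 = 0, block = 0;
    cudaMemcpy(&warp0, d_warp_sums, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&block, d_block_sum, sizeof(int), cudaMemcpyDeviceToHost);
    int device = DeviceSum(d_in, BLOCK_THREADS);

    cudaFree(d_in);
    cudaFree(d_warp_sums);
    cudaFree(d_block_sum);
    return {warp0, block, device};
}
```

Compiled with `nvcc` and called from a `main` function, `SumOnesThreeWays()` should return 32 for the first warp, 128 for the block, and 128 for the device-wide reduction. The warp- and block-wide primitives are invoked inside a kernel by the cooperating threads themselves and take caller-provided shared-memory temporary storage, whereas the device-wide primitive is invoked from the host and uses a two-call pattern: the first call reports how much temporary device memory is needed, and the second does the work.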