cub::WarpMergeSort

Defined in /home/runner/work/cccl/cccl/cub/cub/warp/warp_merge_sort.cuh

template<typename KeyT, int ITEMS_PER_THREAD, int LOGICAL_WARP_THREADS = CUB_WARP_THREADS(0), typename ValueT = NullType, int LEGACY_PTX_ARCH = 0>
class WarpMergeSort : public cub::BlockMergeSortStrategy<KeyT, NullType, CUB_WARP_THREADS(0), ITEMS_PER_THREAD, WarpMergeSort<KeyT, ITEMS_PER_THREAD, CUB_WARP_THREADS(0), NullType>>

The WarpMergeSort class provides methods for sorting items partitioned across a CUDA warp using a merge sorting method.

Overview

WarpMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types and comparison functors.

A Simple Example

The code snippet below illustrates a sort of 64 integer keys that are partitioned across 16 threads where each thread owns 4 consecutive items.

#include <cub/cub.cuh>  // or equivalently <cub/warp/warp_merge_sort.cuh>

struct CustomLess
{
  template <typename DataType>
  __device__ bool operator()(const DataType &lhs, const DataType &rhs)
  {
    return lhs < rhs;
  }
};

__global__ void ExampleKernel(...)
{
    constexpr int warp_threads = 16;
    constexpr int block_threads = 256;
    constexpr int items_per_thread = 4;
    constexpr int warps_per_block = block_threads / warp_threads;
    const int warp_id = static_cast<int>(threadIdx.x) / warp_threads;

    // Specialize WarpMergeSort for a virtual warp of 16 threads
    // owning 4 integer items each
    using WarpMergeSortT =
      cub::WarpMergeSort<int, items_per_thread, warp_threads>;

    // Allocate shared memory for WarpMergeSort
    __shared__ typename WarpMergeSortT::TempStorage temp_storage[warps_per_block];

    // Obtain a segment of consecutive items that are blocked across threads
    int thread_keys[items_per_thread];
    // ...

    WarpMergeSortT(temp_storage[warp_id]).Sort(thread_keys, CustomLess());
    // ...
}

Suppose the set of input thread_keys across a warp of threads is { [0,64,1,63], [2,62,3,61], [4,60,5,59], ..., [31,34,32,33] }. The corresponding output thread_keys in those threads will be { [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [31,32,33,34] }.

Template Parameters
  • KeyT – Key type

  • ITEMS_PER_THREAD – The number of items per thread

  • LOGICAL_WARP_THREADS[optional] The number of threads per “logical” warp (may be less than the number of hardware warp threads). Default is the warp size of the targeted CUDA compute-capability (e.g., 32 threads for SM86). Must be a power of two.

  • ValueT[optional] Value type (default: cub::NullType, which indicates a keys-only sort)

  • LEGACY_PTX_ARCH – Unused.

Public Functions

WarpMergeSort() = delete
inline WarpMergeSort(typename BlockMergeSortStrategyT::TempStorage &temp_storage)
inline unsigned int get_member_mask() const
inline unsigned int get_linear_tid() const
inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op)

Sorts items partitioned across a CUDA thread block using a merge sorting method.

Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.

Template Parameters

CompareOp – functor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.

Parameters
  • keys[inout] Keys to sort

  • compare_op[in] Comparison function object which returns true if the first argument is ordered before the second

inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)

Sorts items partitioned across a CUDA thread block using a merge sorting method.

  • Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.

  • The value of oob_default is assigned to all elements that are out of valid_items boundaries. It’s expected that oob_default is ordered after any value in the valid_items boundaries. The algorithm always sorts a fixed amount of elements, which is equal to ITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered after oob_default, it won’t be placed within valid_items boundaries.

Template Parameters

CompareOp – functor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.

Parameters
  • keys[inout] Keys to sort

  • compare_op[in] Comparison function object which returns true if the first argument is ordered before the second

  • valid_items[in] Number of valid items to sort

  • oob_default[in] Default value to assign out-of-bound items

inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], NullType (&items)[ITEMS_PER_THREAD], CompareOp compare_op)

Sorts items partitioned across a CUDA thread block using a merge sorting method.

Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.

Template Parameters

CompareOp – functor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.

Parameters
  • keys[inout] Keys to sort

  • items[inout] Values to sort

  • compare_op[in] Comparison function object which returns true if the first argument is ordered before the second

inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], NullType (&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)

Sorts items partitioned across a CUDA thread block using a merge sorting method.

  • Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.

  • The value of oob_default is assigned to all elements that are out of valid_items boundaries. It’s expected that oob_default is ordered after any value in the valid_items boundaries. The algorithm always sorts a fixed amount of elements, which is equal to ITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered after oob_default, it won’t be placed within valid_items boundaries.

Template Parameters
  • CompareOp – functor type having member bool operator()(KeyT lhs, KeyT rhs) CompareOp is a model of Strict Weak Ordering.

  • IS_LAST_TILE – True if valid_items isn’t equal to the ITEMS_PER_TILE

Parameters
  • keys[inout] Keys to sort

  • items[inout] Values to sort

  • compare_op[in] Comparison function object which returns true if the first argument is ordered before the second

  • valid_items[in] Number of valid items to sort

  • oob_default[in] Default value to assign out-of-bound items

inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op)

Sorts items partitioned across a CUDA thread block using a merge sorting method.

StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of StableSort is that x still precedes y.

Template Parameters

CompareOp – functor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.

Parameters
  • keys[inout] Keys to sort

  • compare_op[in] Comparison function object which returns true if the first argument is ordered before the second

inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], NullType (&items)[ITEMS_PER_THREAD], CompareOp compare_op)

Sorts items partitioned across a CUDA thread block using a merge sorting method.

StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of StableSort is that x still precedes y.

Template Parameters

CompareOp – functor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.

Parameters
  • keys[inout] Keys to sort

  • items[inout] Values to sort

  • compare_op[in] Comparison function object which returns true if the first argument is ordered before the second

inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)

Sorts items partitioned across a CUDA thread block using a merge sorting method.

  • StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of StableSort is that x still precedes y.

  • The value of oob_default is assigned to all elements that are out of valid_items boundaries. It’s expected that oob_default is ordered after any value in the valid_items boundaries. The algorithm always sorts a fixed amount of elements, which is equal to ITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered after oob_default, it won’t be placed within valid_items boundaries.

Template Parameters

CompareOp – functor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.

Parameters
  • keys[inout] Keys to sort

  • compare_op[in] Comparison function object which returns true if the first argument is ordered before the second

  • valid_items[in] Number of valid items to sort

  • oob_default[in] Default value to assign out-of-bound items

inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], NullType (&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)

Sorts items partitioned across a CUDA thread block using a merge sorting method.

  • StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of StableSort is that x still precedes y.

  • The value of oob_default is assigned to all elements that are out of valid_items boundaries. It’s expected that oob_default is ordered after any value in the valid_items boundaries. The algorithm always sorts a fixed amount of elements, which is equal to ITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered after oob_default, it won’t be placed within valid_items boundaries.

Template Parameters
  • CompareOp – functor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.

  • IS_LAST_TILE – True if valid_items isn’t equal to the ITEMS_PER_TILE

Parameters
  • keys[inout] Keys to sort

  • items[inout] Values to sort

  • compare_op[in] Comparison function object which returns true if the first argument is ordered before the second

  • valid_items[in] Number of valid items to sort

  • oob_default[in] Default value to assign out-of-bound items