cub::WarpMergeSort#
-
template<typename KeyT, int ITEMS_PER_THREAD, int LOGICAL_WARP_THREADS = detail::warp_threads, typename ValueT = NullType>
class WarpMergeSort : public cub::BlockMergeSortStrategy<KeyT, NullType, detail::warp_threads, ITEMS_PER_THREAD, WarpMergeSort<KeyT, ITEMS_PER_THREAD, detail::warp_threads, NullType>># The WarpMergeSort class provides methods for sorting items partitioned across a CUDA warp using a merge sorting method.
Overview#
WarpMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types and comparison functors.
A Simple Example#
The code snippet below illustrates a sort of 64 integer keys that are partitioned across 16 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/warp/warp_merge_sort.cuh> struct CustomLess { template <typename DataType> __device__ bool operator()(const DataType &lhs, const DataType &rhs) { return lhs < rhs; } }; __global__ void ExampleKernel(...) { constexpr int warp_threads = 16; constexpr int block_threads = 256; constexpr int items_per_thread = 4; constexpr int warps_per_block = block_threads / warp_threads; const int warp_id = static_cast<int>(threadIdx.x) / warp_threads; // Specialize WarpMergeSort for a virtual warp of 16 threads // owning 4 integer items each using WarpMergeSortT = cub::WarpMergeSort<int, items_per_thread, warp_threads>; // Allocate shared memory for WarpMergeSort __shared__ typename WarpMergeSortT::TempStorage temp_storage[warps_per_block]; // Obtain a segment of consecutive items that are blocked across threads int thread_keys[items_per_thread]; // ... WarpMergeSortT(temp_storage[warp_id]).Sort(thread_keys, CustomLess()); // ... }
Suppose the set of input
thread_keys
across a warp of threads is{ [0,64,1,63], [2,62,3,61], [4,60,5,59], ..., [31,34,32,33] }
. The corresponding outputthread_keys
in those threads will be{ [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [31,32,33,34] }
.- Template Parameters:
KeyT – Key type
ITEMS_PER_THREAD – The number of items per thread
LOGICAL_WARP_THREADS – [optional] The number of threads per “logical” warp (may be less than the number of hardware warp threads). Default is the warp size of the targeted CUDA compute-capability (e.g., 32 threads for SM86). Must be a power of two.
ValueT – [optional] Value type (default: cub::NullType, which indicates a keys-only sort)
Public Functions
-
WarpMergeSort() = delete#
- inline WarpMergeSort(
- typename BlockMergeSortStrategyT::TempStorage &temp_storage,
-
inline unsigned int get_member_mask() const#
-
inline unsigned int get_linear_tid() const#
-
template<typename CompareOp>
inline void Sort( - KeyT (&keys)[ITEMS_PER_THREAD],
- CompareOp compare_op,
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters:
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters:
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
template<typename CompareOp>
inline void Sort( - KeyT (&keys)[ITEMS_PER_THREAD],
- CompareOp compare_op,
- int valid_items,
- KeyT oob_default,
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It’s expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won’t be placed withinvalid_items
boundaries.
- Template Parameters:
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters:
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
template<typename CompareOp>
inline void Sort( - KeyT (&keys)[ITEMS_PER_THREAD],
- ValueT (&items)[ITEMS_PER_THREAD],
- CompareOp compare_op,
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters:
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters:
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
template<typename CompareOp, bool IS_LAST_TILE = true>
inline void Sort( - KeyT (&keys)[ITEMS_PER_THREAD],
- ValueT (&items)[ITEMS_PER_THREAD],
- CompareOp compare_op,
- int valid_items,
- KeyT oob_default,
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It’s expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won’t be placed withinvalid_items
boundaries.
- Template Parameters:
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
CompareOp
is a model of Strict Weak Ordering.IS_LAST_TILE – True if
valid_items
isn’t equal to theITEMS_PER_TILE
- Parameters:
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
template<typename CompareOp>
inline void StableSort( - KeyT (&keys)[ITEMS_PER_THREAD],
- CompareOp compare_op,
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.
- Template Parameters:
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters:
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
template<typename CompareOp>
inline void StableSort( - KeyT (&keys)[ITEMS_PER_THREAD],
- ValueT (&items)[ITEMS_PER_THREAD],
- CompareOp compare_op,
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.
- Template Parameters:
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters:
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
template<typename CompareOp>
inline void StableSort( - KeyT (&keys)[ITEMS_PER_THREAD],
- CompareOp compare_op,
- int valid_items,
- KeyT oob_default,
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It’s expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won’t be placed withinvalid_items
boundaries.
- Template Parameters:
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters:
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
template<typename CompareOp, bool IS_LAST_TILE = true>
inline void StableSort( - KeyT (&keys)[ITEMS_PER_THREAD],
- ValueT (&items)[ITEMS_PER_THREAD],
- CompareOp compare_op,
- int valid_items,
- KeyT oob_default,
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It’s expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won’t be placed withinvalid_items
boundaries.
- Template Parameters:
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.IS_LAST_TILE – True if
valid_items
isn’t equal to theITEMS_PER_TILE
- Parameters:
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items