cub::BlockMergeSort
Defined in cub/block/block_merge_sort.cuh
-
template<typename KeyT, int BLOCK_DIM_X, int ITEMS_PER_THREAD, typename ValueT = NullType, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1>
class BlockMergeSort : public cub::BlockMergeSortStrategy<KeyT, NullType, BLOCK_DIM_X * 1 * 1, ITEMS_PER_THREAD, BlockMergeSort<KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, NullType, 1, 1>> The BlockMergeSort class provides methods for sorting items partitioned across a CUDA thread block using a merge sorting method.
This example can be easily adapted to the storage required by BlockMergeSort.
- Overview
BlockMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types and comparison functors, but is slower than BlockRadixSort when sorting arithmetic types into ascending/descending order.
- A Simple Example
Every thread in the block uses the BlockMergeSort class by first specializing the BlockMergeSort type, then instantiating an instance with parameters for communication, and finally invoking one or more collective member functions.
The code snippet below illustrates a sort of 512 integer keys that are partitioned across 128 threads * where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_merge_sort.cuh> struct CustomLess { template <typename DataType> __device__ bool operator()(const DataType &lhs, const DataType &rhs) { return lhs < rhs; } }; __global__ void ExampleKernel(...) { // Specialize BlockMergeSort for a 1D block of 128 threads owning 4 integer items each using BlockMergeSort = cub::BlockMergeSort<int, 128, 4>; // Allocate shared memory for BlockMergeSort __shared__ typename BlockMergeSort::TempStorage temp_storage_shuffle; // Obtain a segment of consecutive items that are blocked across threads int thread_keys[4]; ... BlockMergeSort(temp_storage_shuffle).Sort(thread_keys, CustomLess()); ... }
Suppose the set of input
thread_keys
across the block of threads is{ [0,511,1,510], [2,509,3,508], [4,507,5,506], ..., [254,257,255,256] }
. The corresponding outputthread_keys
in those threads will be{ [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }
.- Re-using dynamically allocating shared memory
The
block/example_block_reduce_dyn_smem.cu
example illustrates usage of dynamically shared memory with BlockReduce and how to re-purpose the same memory region.
- Template Parameters
KeyT – KeyT type
BLOCK_DIM_X – The thread block length in threads along the X dimension
ITEMS_PER_THREAD – The number of items per thread
ValueT – [optional] ValueT type (default:
cub::NullType
, which indicates a keys-only sort)BLOCK_DIM_Y – [optional] The thread block length in threads along the Y dimension (default: 1)
BLOCK_DIM_Z – [optional] The thread block length in threads along the Z dimension (default: 1)
Public Functions
-
inline BlockMergeSort()
-
inline explicit BlockMergeSort(typename BlockMergeSortStrategyT::TempStorage &temp_storage)
-
inline unsigned int get_linear_tid() const
-
inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op)
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It’s expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won’t be placed withinvalid_items
boundaries.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], NullType (&items)[ITEMS_PER_THREAD], CompareOp compare_op)
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], NullType (&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It’s expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won’t be placed withinvalid_items
boundaries.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
CompareOp
is a model of Strict Weak Ordering.IS_LAST_TILE – True if
valid_items
isn’t equal to theITEMS_PER_TILE
- Parameters
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op)
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], NullType (&items)[ITEMS_PER_THREAD], CompareOp compare_op)
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It’s expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won’t be placed withinvalid_items
boundaries.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], NullType (&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It’s expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won’t be placed withinvalid_items
boundaries.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.IS_LAST_TILE – True if
valid_items
isn’t equal to theITEMS_PER_TILE
- Parameters
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items