cub::DeviceMergeSort
Defined in cub/device/device_merge_sort.cuh
-
struct DeviceMergeSort
DeviceMergeSort provides device-wide, parallel operations for computing a merge sort across a sequence of data items residing within device-accessible memory.
- Overview
DeviceMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types (as long as a value of these types is a model of LessThan Comparable) and comparison functors, but is slower than DeviceRadixSort when sorting arithmetic types into ascending/descending order.
Another difference from RadixSort is the fact that DeviceMergeSort can handle arbitrary random-access iterators, as shown below.
- A Simple Example
The code snippet below illustrates a thrust reverse iterator usage.
#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> struct CustomLess { template <typename DataType> __device__ bool operator()(const DataType &lhs, const DataType &rhs) { return lhs < rhs; } }; // Declare, allocate, and initialize device-accessible pointers // for sorting data thrust::device_vector<KeyType> d_keys(num_items); thrust::device_vector<DataType> d_values(num_items); // ... // Initialize iterator using KeyIterator = typename thrust::device_vector<KeyType>::iterator; thrust::reverse_iterator<KeyIterator> reverse_iter(d_keys.end()); // Determine temporary device storage requirements std::size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortPairs( nullptr, temp_storage_bytes, reverse_iter, thrust::raw_pointer_cast(d_values.data()), num_items, CustomLess()); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, reverse_iter, thrust::raw_pointer_cast(d_values.data()), num_items, CustomLess());
Public Static Functions
-
template<typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t SortPairs(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorT d_keys, ValueIteratorT d_items, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream = 0) Sorts items using a merge sorting method.
SortPairs is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Snippet
The code snippet below illustrates the sorting of a device vector of
int
keys with associated vector ofint
values.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9] int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; std::size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 6, 8, 9] // d_values <-- [5, 4, 3, 2, 1, 0, 6]
- Template Parameters
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorT
is mutable, and itsvalue_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.ValueIteratorT – is a model of Random Access Iterator, and
ValueIteratorT
is mutable.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)
that models the Strict Weak Ordering concept.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys – [inout] Pointer to the input sequence of unsorted input keys
d_items – [inout] Pointer to the input sequence of unsorted input values
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyInputIteratorT, typename ValueInputIteratorT, typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t SortPairsCopy(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyInputIteratorT d_input_keys, ValueInputIteratorT d_input_items, KeyIteratorT d_output_keys, ValueIteratorT d_output_items, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream = 0) Sorts items using a merge sorting method.
SortPairsCopy is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.Input arrays
d_input_keys
andd_input_items
are not modified.Note that the behavior is undefined if the input and output ranges overlap in any way.
- Snippet
The code snippet below illustrates the sorting of a device vector of
int
keys with associated vector ofint
values.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers // for sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9] int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; std::size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortPairsCopy( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortPairsCopy( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 6, 8, 9] // d_values <-- [5, 4, 3, 2, 1, 0, 6]
- Template Parameters
KeyInputIteratorT – is a model of Random Access Iterator. Its
value_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.ValueInputIteratorT – is a model of Random Access Iterator.
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorT
is mutable, and itsvalue_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.ValueIteratorT – is a model of Random Access Iterator, and
ValueIteratorT
is mutable.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)
that models the Strict Weak Ordering concept.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_input_items – [in] Pointer to the input sequence of unsorted input values
d_output_keys – [out] Pointer to the output sequence of sorted input keys
d_output_items – [out] Pointer to the output sequence of sorted input values
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns
true
if the first argument is ordered before the secondstream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t SortKeys(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorT d_keys, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream = 0) Sorts items using a merge sorting method.
SortKeys is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.- Snippet
The code snippet below illustrates the sorting of a device vector of
int
keys.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers // for sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; std::size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortKeys( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortKeys( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 7, 8, 9]
- Template Parameters
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorT
is mutable, and itsvalue_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)
that models the Strict Weak Ordering concept.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys – [inout] Pointer to the input sequence of unsorted input keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyInputIteratorT, typename KeyIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t SortKeysCopy(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyInputIteratorT d_input_keys, KeyIteratorT d_output_keys, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream = 0) Sorts items using a merge sorting method.
SortKeysCopy is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.Input array d_input_keys is not modified.
Note that the behavior is undefined if the input and output ranges overlap in any way.
- Snippet
The code snippet below illustrates the sorting of a device vector of
int
keys.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; std::size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortKeysCopy( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortKeysCopy( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 7, 8, 9]
- Template Parameters
KeyInputIteratorT – is a model of Random Access Iterator. Its
value_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorT
is mutable, and itsvalue_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)
that models the Strict Weak Ordering concept.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_output_keys – [out] Pointer to the output sequence of sorted input keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t StableSortPairs(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorT d_keys, ValueIteratorT d_items, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream = 0) Sorts items using a merge sorting method.
StableSortPairs is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of stable_sort is that x still precedes y.
- Snippet
The code snippet below illustrates the sorting of a device vector of
int
keys with associated vector ofint
values.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9] int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; std::size_t temp_storage_bytes = 0; cub::DeviceMergeSort::StableSortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::StableSortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 6, 8, 9] // d_values <-- [5, 4, 3, 1, 2, 0, 6]
- Template Parameters
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorT
is mutable, and itsvalue_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.ValueIteratorT – is a model of Random Access Iterator, and
ValueIteratorT
is mutable.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)
that models the Strict Weak Ordering concept.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys – [inout] Pointer to the input sequence of unsorted input keys
d_items – [inout] Pointer to the input sequence of unsorted input values
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t StableSortKeys(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorT d_keys, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream = 0) Sorts items using a merge sorting method.
StableSortKeys is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of stable_sort is thatx
still precedesy
.- Snippet
The code snippet below illustrates the sorting of a device vector of
int
keys.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; std::size_t temp_storage_bytes = 0; cub::DeviceMergeSort::StableSortKeys( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::StableSortKeys( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 7, 8, 9]
- Template Parameters
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorT
is mutable, and itsvalue_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)
that models the Strict Weak Ordering concept.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys – [inout] Pointer to the input sequence of unsorted input keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyInputIteratorT, typename KeyIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t StableSortKeysCopy(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyInputIteratorT d_input_keys, KeyIteratorT d_output_keys, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream = 0) Sorts items using a merge sorting method.
StableSortKeysCopy is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of stable_sort is thatx
still precedesy
.Input array d_input_keys is not modified
Note that the behavior is undefined if the input and output ranges overlap in any way.
- Snippet
The code snippet below illustrates the sorting of a device vector of
int
keys.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_input_keys; // e.g., [8, 6, 7, 5, 3, 0, 9] int *d_output_keys; // must hold at least num_items elements ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; std::size_t temp_storage_bytes = 0; cub::DeviceMergeSort::StableSortKeysCopy( d_temp_storage, temp_storage_bytes, d_input_keys, d_output_keys, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::StableSortKeysCopy( d_temp_storage, temp_storage_bytes, d_input_keys, d_output_keys, num_items, custom_op); // d_output_keys <-- [0, 3, 5, 6, 7, 8, 9]
- Template Parameters
KeyInputIteratorT – is a model of Random Access Iterator. Its
value_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorT
is mutable, and itsvalue_type
is a model of LessThan Comparable. Thisvalue_type
’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)
that models the Strict Weak Ordering concept.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_output_keys – [out] Pointer to the output sequence of sorted input keys
num_items – [in] Number of elements in d_input_keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.