cub::DeviceMergeSort#
-
struct DeviceMergeSort#
DeviceMergeSort provides device-wide, parallel operations for computing a merge sort across a sequence of data items residing within device-accessible memory.
- Overview
DeviceMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types (as long as a value of these types is a model of LessThan Comparable) and comparison functors, but is slower than DeviceRadixSort when sorting arithmetic types into ascending/descending order.
Another difference from RadixSort is the fact that DeviceMergeSort can handle arbitrary random-access iterators, as shown below.
- A Simple Example
The code snippet below illustrates a thrust reverse iterator usage.
#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> struct CustomLess { template <typename DataType> __device__ bool operator()(const DataType &lhs, const DataType &rhs) { return lhs < rhs; } }; // Declare, allocate, and initialize device-accessible pointers // for sorting data thrust::device_vector<KeyType> d_keys(num_items); thrust::device_vector<DataType> d_values(num_items); // ... // Initialize iterator using KeyIterator = typename thrust::device_vector<KeyType>::iterator; cuda::std::reverse_iterator<KeyIterator> reverse_iter(d_keys.end()); // Determine temporary device storage requirements size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortPairs( nullptr, temp_storage_bytes, reverse_iter, thrust::raw_pointer_cast(d_values.data()), num_items, CustomLess()); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, reverse_iter, thrust::raw_pointer_cast(d_values.data()), num_items, CustomLess());
Public Static Functions
-
template<typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t SortPairs( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyIteratorT d_keys,
- ValueIteratorT d_items,
- OffsetT num_items,
- CompareOpT compare_op,
- cudaStream_t stream = 0
Sorts items using a merge sorting method.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
SortPairs is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Snippet
The code snippet below illustrates the sorting of a device vector of
intkeys with associated vector ofintvalues.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9] int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 6, 8, 9] // d_values <-- [5, 4, 3, 2, 1, 0, 6]
- Template Parameters:
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorTis mutable, and itsvalue_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.ValueIteratorT – is a model of Random Access Iterator, and
ValueIteratorTis mutable.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)that models the Strict Weak Ordering concept.
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_keys – [inout] Pointer to the input sequence of unsorted input keys
d_items – [inout] Pointer to the input sequence of unsorted input values
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t SortPairs( - KeyIteratorT d_keys,
- ValueIteratorT d_items,
- OffsetT num_items,
- CompareOpT compare_op,
- EnvT env = {}
Sorts items using a merge sorting method.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceSortPairs is not guaranteed to be stable.
Snippet#
auto d_keys = thrust::device_vector<int>{8, 6, 7, 5, 3, 0, 9}; auto d_values = thrust::device_vector<int>{0, 1, 2, 3, 4, 5, 6}; auto error = cub::DeviceMergeSort::SortPairs( d_keys.data().get(), d_values.data().get(), static_cast<int>(d_keys.size()), cuda::std::less<int>{}); if (error != cudaSuccess) { std::cerr << "cub::DeviceMergeSort::SortPairs failed with status: " << error << std::endl; } thrust::device_vector<int> expected_keys{0, 3, 5, 6, 7, 8, 9}; thrust::device_vector<int> expected_values{5, 4, 3, 1, 2, 0, 6};
- Template Parameters:
KeyIteratorT – [inferred] Random-access iterator type for keys (may be a simple pointer type)
ValueIteratorT – [inferred] Random-access iterator type for values (may be a simple pointer type)
OffsetT – [inferred] Integer type for offsets
CompareOpT – [inferred] Comparison function object type
EnvT – [inferred] Execution environment type
- Parameters:
d_keys – [inout] Keys to sort
d_items – [inout] Values corresponding to keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename ValueInputIteratorT, typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t SortPairsCopy( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_input_keys,
- ValueInputIteratorT d_input_items,
- KeyIteratorT d_output_keys,
- ValueIteratorT d_output_items,
- OffsetT num_items,
- CompareOpT compare_op,
- cudaStream_t stream = 0
Sorts items using a merge sorting method.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
SortPairsCopy is not guaranteed to be stable. That is, suppose that
iandjare equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.Input arrays
d_input_keysandd_input_itemsare not modified.Note that the behavior is undefined if the input and output ranges overlap in any way.
- Snippet
The code snippet below illustrates the sorting of a device vector of
intkeys with associated vector ofintvalues.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers // for sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9] int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortPairsCopy( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortPairsCopy( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 6, 8, 9] // d_values <-- [5, 4, 3, 2, 1, 0, 6]
- Template Parameters:
KeyInputIteratorT – is a model of Random Access Iterator. Its
value_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.ValueInputIteratorT – is a model of Random Access Iterator.
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorTis mutable, and itsvalue_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.ValueIteratorT – is a model of Random Access Iterator, and
ValueIteratorTis mutable.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)that models the Strict Weak Ordering concept.
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_input_items – [in] Pointer to the input sequence of unsorted input values
d_output_keys – [out] Pointer to the output sequence of sorted input keys
d_output_items – [out] Pointer to the output sequence of sorted input values
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns
trueif the first argument is ordered before the secondstream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyInputIteratorT, typename ValueInputIteratorT, typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t SortPairsCopy( - KeyInputIteratorT d_input_keys,
- ValueInputIteratorT d_input_items,
- KeyIteratorT d_output_keys,
- ValueIteratorT d_output_items,
- OffsetT num_items,
- CompareOpT compare_op,
- EnvT env = {}
Sorts items using a merge sorting method.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceSortPairsCopy is not guaranteed to be stable.
Input arrays
d_input_keysandd_input_itemsare not modified.The behavior is undefined if the input and output ranges overlap in any way.
Snippet#
auto d_keys_in = thrust::device_vector<int>{8, 6, 7, 5, 3, 0, 9}; auto d_values_in = thrust::device_vector<int>{0, 1, 2, 3, 4, 5, 6}; auto d_keys_out = thrust::device_vector<int>(7); auto d_values_out = thrust::device_vector<int>(7); cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto error = cub::DeviceMergeSort::SortPairsCopy( d_keys_in.data().get(), d_values_in.data().get(), d_keys_out.data().get(), d_values_out.data().get(), static_cast<int>(d_keys_in.size()), cuda::std::less<int>{}, stream_ref); if (error != cudaSuccess) { std::cerr << "cub::DeviceMergeSort::SortPairsCopy failed with status: " << error << std::endl; } thrust::device_vector<int> expected_keys{0, 3, 5, 6, 7, 8, 9}; thrust::device_vector<int> expected_values{5, 4, 3, 1, 2, 0, 6};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access iterator type for input keys (may be a simple pointer type)
ValueInputIteratorT – [inferred] Random-access iterator type for input values (may be a simple pointer type)
KeyIteratorT – [inferred] Random-access iterator type for output keys (may be a simple pointer type)
ValueIteratorT – [inferred] Random-access iterator type for output values (may be a simple pointer type)
OffsetT – [inferred] Integer type for offsets
CompareOpT – [inferred] Comparison function object type
EnvT – [inferred] Execution environment type
- Parameters:
d_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_input_items – [in] Pointer to the input sequence of unsorted input values
d_output_keys – [out] Pointer to the output sequence of sorted input keys
d_output_items – [out] Pointer to the output sequence of sorted input values
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename KeyIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t SortKeys( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyIteratorT d_keys,
- OffsetT num_items,
- CompareOpT compare_op,
- cudaStream_t stream = 0
Sorts items using a merge sorting method.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
SortKeys is not guaranteed to be stable. That is, suppose that
iandjare equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.- Snippet
The code snippet below illustrates the sorting of a device vector of
intkeys.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers // for sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortKeys( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortKeys( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 7, 8, 9]
- Template Parameters:
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorTis mutable, and itsvalue_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)that models the Strict Weak Ordering concept.
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_keys – [inout] Pointer to the input sequence of unsorted input keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyIteratorT, typename OffsetT, typename CompareOpT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t SortKeys( - KeyIteratorT d_keys,
- OffsetT num_items,
- CompareOpT compare_op,
- EnvT env = {}
Sorts keys using a merge sorting method.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceSortKeys is not guaranteed to be stable.
Snippet#
auto d_keys = thrust::device_vector<int>{8, 6, 7, 5, 3, 0, 9}; auto error = cub::DeviceMergeSort::SortKeys(d_keys.data().get(), static_cast<int>(d_keys.size()), cuda::std::less<int>{}); if (error != cudaSuccess) { std::cerr << "cub::DeviceMergeSort::SortKeys failed with status: " << error << std::endl; } thrust::device_vector<int> expected_keys{0, 3, 5, 6, 7, 8, 9};
- Template Parameters:
KeyIteratorT – [inferred] Random-access iterator type for keys (may be a simple pointer type)
OffsetT – [inferred] Integer type for offsets
CompareOpT – [inferred] Comparison function object type
EnvT – [inferred] Execution environment type
- Parameters:
d_keys – [inout] Keys to sort
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename KeyIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t SortKeysCopy( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_input_keys,
- KeyIteratorT d_output_keys,
- OffsetT num_items,
- CompareOpT compare_op,
- cudaStream_t stream = 0
Sorts items using a merge sorting method.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
SortKeysCopy is not guaranteed to be stable. That is, suppose that
iandjare equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.Input array d_input_keys is not modified.
Note that the behavior is undefined if the input and output ranges overlap in any way.
- Snippet
The code snippet below illustrates the sorting of a device vector of
intkeys.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceMergeSort::SortKeysCopy( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::SortKeysCopy( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 7, 8, 9]
- Template Parameters:
KeyInputIteratorT – is a model of Random Access Iterator. Its
value_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorTis mutable, and itsvalue_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)that models the Strict Weak Ordering concept.
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_output_keys – [out] Pointer to the output sequence of sorted input keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyInputIteratorT, typename KeyIteratorT, typename OffsetT, typename CompareOpT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t SortKeysCopy( - KeyInputIteratorT d_input_keys,
- KeyIteratorT d_output_keys,
- OffsetT num_items,
- CompareOpT compare_op,
- EnvT env = {}
Sorts keys using a merge sorting method.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceSortKeysCopy is not guaranteed to be stable.
Input array
d_input_keysis not modified.The behavior is undefined if the input and output ranges overlap in any way.
Snippet#
auto d_keys_in = thrust::device_vector<int>{8, 6, 7, 5, 3, 0, 9}; auto d_keys_out = thrust::device_vector<int>(7); cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto error = cub::DeviceMergeSort::SortKeysCopy( d_keys_in.data().get(), d_keys_out.data().get(), static_cast<int>(d_keys_in.size()), cuda::std::less<int>{}, stream_ref); if (error != cudaSuccess) { std::cerr << "cub::DeviceMergeSort::SortKeysCopy failed with status: " << error << std::endl; } thrust::device_vector<int> expected_keys{0, 3, 5, 6, 7, 8, 9};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access iterator type for input keys (may be a simple pointer type)
KeyIteratorT – [inferred] Random-access iterator type for output keys (may be a simple pointer type)
OffsetT – [inferred] Integer type for offsets
CompareOpT – [inferred] Comparison function object type
EnvT – [inferred] Execution environment type
- Parameters:
d_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_output_keys – [out] Pointer to the output sequence of sorted input keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t StableSortPairs( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyIteratorT d_keys,
- ValueIteratorT d_items,
- OffsetT num_items,
- CompareOpT compare_op,
- cudaStream_t stream = 0
Sorts items using a merge sorting method.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
StableSortPairs is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of stable_sort is that x still precedes y.
- Snippet
The code snippet below illustrates the sorting of a device vector of
intkeys with associated vector ofintvalues.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9] int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceMergeSort::StableSortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::StableSortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_values, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 6, 8, 9] // d_values <-- [5, 4, 3, 1, 2, 0, 6]
- Template Parameters:
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorTis mutable, and itsvalue_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.ValueIteratorT – is a model of Random Access Iterator, and
ValueIteratorTis mutable.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)that models the Strict Weak Ordering concept.
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_keys – [inout] Pointer to the input sequence of unsorted input keys
d_items – [inout] Pointer to the input sequence of unsorted input values
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyIteratorT, typename ValueIteratorT, typename OffsetT, typename CompareOpT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t StableSortPairs( - KeyIteratorT d_keys,
- ValueIteratorT d_items,
- OffsetT num_items,
- CompareOpT compare_op,
- EnvT env = {}
Stably sorts items using a merge sorting method.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceStableSortPairs preserves the relative ordering of equivalent elements.
Snippet#
auto d_keys = thrust::device_vector<int>{8, 6, 6, 5, 3, 0, 9}; auto d_values = thrust::device_vector<int>{0, 1, 2, 3, 4, 5, 6}; auto error = cub::DeviceMergeSort::StableSortPairs( d_keys.data().get(), d_values.data().get(), static_cast<int>(d_keys.size()), cuda::std::less<int>{}); if (error != cudaSuccess) { std::cerr << "cub::DeviceMergeSort::StableSortPairs failed with status: " << error << std::endl; } thrust::device_vector<int> expected_keys{0, 3, 5, 6, 6, 8, 9}; thrust::device_vector<int> expected_values{5, 4, 3, 1, 2, 0, 6};
- Template Parameters:
KeyIteratorT – [inferred] Random-access iterator type for keys (may be a simple pointer type)
ValueIteratorT – [inferred] Random-access iterator type for values (may be a simple pointer type)
OffsetT – [inferred] Integer type for offsets
CompareOpT – [inferred] Comparison function object type
EnvT – [inferred] Execution environment type
- Parameters:
d_keys – [inout] Keys to sort
d_items – [inout] Values corresponding to keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename KeyIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t StableSortKeys( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyIteratorT d_keys,
- OffsetT num_items,
- CompareOpT compare_op,
- cudaStream_t stream = 0
Sorts items using a merge sorting method.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
StableSortKeys is stable: it preserves the relative ordering of equivalent elements. That is, if
xandyare elements such thatxprecedesy, and if the two elements are equivalent (neitherx < ynory < x) then a postcondition of stable_sort is thatxstill precedesy.- Snippet
The code snippet below illustrates the sorting of a device vector of
intkeys.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9] ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceMergeSort::StableSortKeys( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::StableSortKeys( d_temp_storage, temp_storage_bytes, d_keys, num_items, custom_op); // d_keys <-- [0, 3, 5, 6, 7, 8, 9]
- Template Parameters:
KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorTis mutable, and itsvalue_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)that models the Strict Weak Ordering concept.
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_keys – [inout] Pointer to the input sequence of unsorted input keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyIteratorT, typename OffsetT, typename CompareOpT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t StableSortKeys( - KeyIteratorT d_keys,
- OffsetT num_items,
- CompareOpT compare_op,
- EnvT env = {}
Stably sorts keys using a merge sorting method.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceStableSortKeys preserves the relative ordering of equivalent elements.
Snippet#
auto d_keys = thrust::device_vector<int>{8, 6, 7, 5, 3, 0, 9}; auto error = cub::DeviceMergeSort::StableSortKeys(d_keys.data().get(), static_cast<int>(d_keys.size()), cuda::std::less<int>{}); if (error != cudaSuccess) { std::cerr << "cub::DeviceMergeSort::StableSortKeys failed with status: " << error << std::endl; } thrust::device_vector<int> expected_keys{0, 3, 5, 6, 7, 8, 9};
- Template Parameters:
KeyIteratorT – [inferred] Random-access iterator type for keys (may be a simple pointer type)
OffsetT – [inferred] Integer type for offsets
CompareOpT – [inferred] Comparison function object type
EnvT – [inferred] Execution environment type
- Parameters:
d_keys – [inout] Keys to sort
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename KeyIteratorT, typename OffsetT, typename CompareOpT>
static inline cudaError_t StableSortKeysCopy( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_input_keys,
- KeyIteratorT d_output_keys,
- OffsetT num_items,
- CompareOpT compare_op,
- cudaStream_t stream = 0
Sorts items using a merge sorting method.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
StableSortKeysCopy is stable: it preserves the relative ordering of equivalent elements. That is, if
xandyare elements such thatxprecedesy, and if the two elements are equivalent (neitherx < ynory < x) then a postcondition of stable_sort is thatxstill precedesy.Input array d_input_keys is not modified
Note that the behavior is undefined if the input and output ranges overlap in any way.
- Snippet
The code snippet below illustrates the sorting of a device vector of
intkeys.#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh> // Declare, allocate, and initialize device-accessible pointers for // sorting data int num_items; // e.g., 7 int *d_input_keys; // e.g., [8, 6, 7, 5, 3, 0, 9] int *d_output_keys; // must hold at least num_items elements ... // Initialize comparator CustomOpT custom_op; // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceMergeSort::StableSortKeysCopy( d_temp_storage, temp_storage_bytes, d_input_keys, d_output_keys, num_items, custom_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run sorting operation cub::DeviceMergeSort::StableSortKeysCopy( d_temp_storage, temp_storage_bytes, d_input_keys, d_output_keys, num_items, custom_op); // d_output_keys <-- [0, 3, 5, 6, 7, 8, 9]
- Template Parameters:
KeyInputIteratorT – is a model of Random Access Iterator. Its
value_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.KeyIteratorT – is a model of Random Access Iterator.
KeyIteratorTis mutable, and itsvalue_typeis a model of LessThan Comparable. Thisvalue_type’s ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.OffsetT – is an integer type for global offsets.
CompareOpT – is a type of callable object with the signature
bool operator()(KeyT lhs, KeyT rhs)that models the Strict Weak Ordering concept.
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_output_keys – [out] Pointer to the output sequence of sorted input keys
num_items – [in] Number of elements in d_input_keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
stream – [in] [optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyInputIteratorT, typename KeyIteratorT, typename OffsetT, typename CompareOpT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t StableSortKeysCopy( - KeyInputIteratorT d_input_keys,
- KeyIteratorT d_output_keys,
- OffsetT num_items,
- CompareOpT compare_op,
- EnvT env = {}
Stably sorts keys using a merge sorting method.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceStableSortKeysCopy preserves the relative ordering of equivalent elements.
Input array
d_input_keysis not modified.The behavior is undefined if the input and output ranges overlap in any way.
Snippet#
auto d_keys_in = thrust::device_vector<int>{8, 6, 7, 5, 3, 0, 9}; auto d_keys_out = thrust::device_vector<int>(7); cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto error = cub::DeviceMergeSort::StableSortKeysCopy( d_keys_in.data().get(), d_keys_out.data().get(), static_cast<int>(d_keys_in.size()), cuda::std::less<int>{}, stream_ref); if (error != cudaSuccess) { std::cerr << "cub::DeviceMergeSort::StableSortKeysCopy failed with status: " << error << std::endl; } thrust::device_vector<int> expected_keys{0, 3, 5, 6, 7, 8, 9};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access iterator type for input keys (may be a simple pointer type)
KeyIteratorT – [inferred] Random-access iterator type for output keys (may be a simple pointer type)
OffsetT – [inferred] Integer type for offsets
CompareOpT – [inferred] Comparison function object type
EnvT – [inferred] Execution environment type
- Parameters:
d_input_keys – [in] Pointer to the input sequence of unsorted input keys
d_output_keys – [out] Pointer to the output sequence of sorted input keys
num_items – [in] Number of items to sort
compare_op – [in] Comparison function object
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.