cub::DeviceMerge
Defined in cub/device/device_merge.cuh
-
struct DeviceMerge
DeviceMerge provides device-wide, parallel operations for merging two sorted sequences of values (called keys) or key-value pairs in device-accessible memory. The sorting order is determined by a comparison functor (default: less-than), which has to establish a strict weak ordering.
Public Static Functions
-
template<typename KeyIteratorIn1, typename KeyIteratorIn2, typename KeyIteratorOut, typename CompareOp = ::cuda::std::less<>>
static inline cudaError_t MergeKeys(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorIn1 keys_in1, int num_keys1, KeyIteratorIn2 keys_in2, int num_keys2, KeyIteratorOut keys_out, CompareOp compare_op = {}, cudaStream_t stream = nullptr) Overview
Merges two sorted sequences of values (called keys) into a sorted output sequence. Merging is unstable, which means any two equivalent values (neither value is ordered before the other) may be written to the output sequence in any order.
A Simple Example
The code snippet below illustrates the merging of two device vectors of int keys.
thrust::device_vector<int> keys1{0, 2, 5}; thrust::device_vector<int> keys2{0, 3, 3, 4}; thrust::device_vector<int> result(7); // 1) Get temp storage size std::size_t temp_storage_bytes = 0; cub::DeviceMerge::MergeKeys( nullptr, temp_storage_bytes, keys1.begin(), static_cast<int>(keys1.size()), keys2.begin(), static_cast<int>(keys2.size()), result.begin()); // 2) Allocate temp storage thrust::device_vector<char> temp_storage(temp_storage_bytes); // 3) Perform merge operation cub::DeviceMerge::MergeKeys( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, keys1.begin(), static_cast<int>(keys1.size()), keys2.begin(), static_cast<int>(keys2.size()), result.begin()); thrust::host_vector<int> expected{0, 0, 2, 3, 3, 4, 5};
- Template Parameters
KeyIteratorIn1 – [deduced] Random access iterator to the first sorted input sequence. Must have the same value type as KeyIteratorIn2.
KeyIteratorIn2 – [deduced] Random access iterator to the second sorted input sequence. Must have the same value type as KeyIteratorIn1.
KeyIteratorOut – [deduced] Random access iterator to the output sequence.
CompareOp – [deduced] Binary predicate to compare the input iterator’s value types. Must have a signature equivalent to
bool operator()(Key lhs, Key rhs)
and establish a strict weak ordering.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocation.keys_in1 – [in] Iterator to the beginning of the first sorted input sequence.
num_keys1 – [in] Number of keys in the first input sequence.
keys_in2 – [in] Iterator to the beginning of the second sorted input sequence.
num_keys2 – [in] Number of keys in the second input sequence.
keys_out – [out] Iterator to the beginning of the output sequence.
compare_op – [in] Comparison function object, returning true if the first argument is ordered before the second. Must establish a strict weak ordering.
stream – [in] [optional] CUDA stream to launch kernels into. Default is stream0.
-
template<typename KeyIteratorIn1, typename ValueIteratorIn1, typename KeyIteratorIn2, typename ValueIteratorIn2, typename KeyIteratorOut, typename ValueIteratorOut, typename CompareOp = ::cuda::std::less<>>
static inline cudaError_t MergePairs(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorIn1 keys_in1, ValueIteratorIn1 values_in1, int num_pairs1, KeyIteratorIn2 keys_in2, ValueIteratorIn2 values_in2, int num_pairs2, KeyIteratorOut keys_out, ValueIteratorOut values_out, CompareOp compare_op = {}, cudaStream_t stream = nullptr) Overview
Merges two sorted sequences of key-value pairs into a sorted output sequence. Merging is unstable, which means any two equivalent values (neither value is ordered before the other) may be written to the output sequence in any order.
A Simple Example
The code snippet below illustrates the merging of two device vectors of int keys.
thrust::device_vector<int> keys1{0, 2, 5}; thrust::device_vector<char> values1{'a', 'b', 'c'}; thrust::device_vector<int> keys2{0, 3, 3, 4}; thrust::device_vector<char> values2{'A', 'B', 'C', 'D'}; thrust::device_vector<int> result_keys(7); thrust::device_vector<char> result_values(7); // 1) Get temp storage size std::size_t temp_storage_bytes = 0; cub::DeviceMerge::MergePairs( nullptr, temp_storage_bytes, keys1.begin(), values1.begin(), static_cast<int>(keys1.size()), keys2.begin(), values2.begin(), static_cast<int>(keys2.size()), result_keys.begin(), result_values.begin()); // 2) Allocate temp storage thrust::device_vector<char> temp_storage(temp_storage_bytes); // 3) Perform merge operation cub::DeviceMerge::MergePairs( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, keys1.begin(), values1.begin(), static_cast<int>(keys1.size()), keys2.begin(), values2.begin(), static_cast<int>(keys2.size()), result_keys.begin(), result_values.begin()); thrust::host_vector<int> expected_keys{0, 0, 2, 3, 3, 4, 5}; thrust::host_vector<char> expected_values{'a', 'A', 'b', 'B', 'C', 'D', 'c'};
- Template Parameters
KeyIteratorIn1 – [deduced] Random access iterator to the keys of the first sorted input sequence. Must have the same value type as KeyIteratorIn2.
ValueIteratorIn1 – [deduced] Random access iterator to the values of the first sorted input sequence. Must have the same value type as ValueIteratorIn2.
KeyIteratorIn2 – [deduced] Random access iterator to the second sorted input sequence. Must have the same value type as KeyIteratorIn1.
ValueIteratorIn2 – [deduced] Random access iterator to the values of the second sorted input sequence. Must have the same value type as ValueIteratorIn1.
KeyIteratorOut – [deduced] Random access iterator to the keys of the output sequence.
ValueIteratorOut – [deduced] Random access iterator to the values of the output sequence.
CompareOp – [deduced] Binary predicate to compare the key input iterator’s value types. Must have a signature equivalent to
bool operator()(Key lhs, Key rhs)
and establish a strict weak ordering.
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocation.keys_in1 – [in] Iterator to the beginning of the keys of the first sorted input sequence.
values_in1 – [in] Iterator to the beginning of the values of the first sorted input sequence.
num_pairs1 – [in] Number of key-value pairs in the first input sequence.
keys_in2 – [in] Iterator to the beginning of the keys of the second sorted input sequence.
values_in2 – [in] Iterator to the beginning of the values of the second sorted input sequence.
num_pairs2 – [in] Number of key-value pairs in the second input sequence.
keys_out – [out] Iterator to the beginning of the keys of the output sequence.
values_out – [out] Iterator to the beginning of the values of the output sequence.
compare_op – [in] Comparison function object, returning true if the first argument is ordered before the second. Must establish a strict weak ordering.
stream – [in] [optional] CUDA stream to launch kernels into. Default is stream0.
-
template<typename KeyIteratorIn1, typename KeyIteratorIn2, typename KeyIteratorOut, typename CompareOp = ::cuda::std::less<>>