cub::DeviceMerge#

struct DeviceMerge#

DeviceMerge provides device-wide, parallel operations for merging two sorted sequences of values (called keys) or key-value pairs in device-accessible memory.

The sorting order is determined by a comparison functor (default: less-than), which has to establish a strict weak ordering.

Public Static Functions

template<typename KeyIteratorIn1, typename KeyIteratorIn2, typename KeyIteratorOut, typename CompareOp = ::cuda::std::less<>>
static inline cudaError_t MergeKeys(
void *d_temp_storage,
size_t &temp_storage_bytes,
KeyIteratorIn1 keys_in1,
::cuda::std::int64_t num_keys1,
KeyIteratorIn2 keys_in2,
::cuda::std::int64_t num_keys2,
KeyIteratorOut keys_out,
CompareOp compare_op = {},
cudaStream_t stream = nullptr
)#

Overview#

Merges two sorted sequences of values (called keys) into a sorted output sequence. Merging is unstable, which means any two equivalent values (neither value is ordered before the other) may be written to the output sequence in any order.

Added in version 2.7.0: First appears in CUDA Toolkit 12.8.

A Simple Example#

The code snippet below illustrates the merging of two device vectors of int keys.

c2h::device_vector<int> keys1{0, 2, 5};
c2h::device_vector<int> keys2{0, 3, 3, 4};
c2h::device_vector<int> result(7);

// 1) Get temp storage size
std::size_t temp_storage_bytes = 0;
cub::DeviceMerge::MergeKeys(
  nullptr,
  temp_storage_bytes,
  keys1.begin(),
  static_cast<int>(keys1.size()),
  keys2.begin(),
  static_cast<int>(keys2.size()),
  result.begin());

// 2) Allocate temp storage
c2h::device_vector<char> temp_storage(temp_storage_bytes);

// 3) Perform merge operation
cub::DeviceMerge::MergeKeys(
  thrust::raw_pointer_cast(temp_storage.data()),
  temp_storage_bytes,
  keys1.begin(),
  static_cast<int>(keys1.size()),
  keys2.begin(),
  static_cast<int>(keys2.size()),
  result.begin());

c2h::host_vector<int> expected{0, 0, 2, 3, 3, 4, 5};

Template Parameters:
  • KeyIteratorIn1[deduced] Random access iterator to the first sorted input sequence. Must have the same value type as KeyIteratorIn2.

  • KeyIteratorIn2[deduced] Random access iterator to the second sorted input sequence. Must have the same value type as KeyIteratorIn1.

  • KeyIteratorOut[deduced] Random access iterator to the output sequence.

  • CompareOp[deduced] Binary predicate to compare the input iterator’s value types. Must have a signature equivalent to bool operator()(Key lhs, Key rhs) and establish a strict weak ordering.

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation.

  • keys_in1[in] Iterator to the beginning of the first sorted input sequence.

  • num_keys1[in] Number of keys in the first input sequence.

  • keys_in2[in] Iterator to the beginning of the second sorted input sequence.

  • num_keys2[in] Number of keys in the second input sequence.

  • keys_out[out] Iterator to the beginning of the output sequence.

  • compare_op[in] Comparison function object, returning true if the first argument is ordered before the second. Must establish a strict weak ordering.

  • stream[in] [optional] CUDA stream to launch kernels into. Default is stream0.

template<typename KeyIteratorIn1, typename ValueIteratorIn1, typename KeyIteratorIn2, typename ValueIteratorIn2, typename KeyIteratorOut, typename ValueIteratorOut, typename CompareOp = ::cuda::std::less<>>
static inline cudaError_t MergePairs(
void *d_temp_storage,
size_t &temp_storage_bytes,
KeyIteratorIn1 keys_in1,
ValueIteratorIn1 values_in1,
::cuda::std::int64_t num_pairs1,
KeyIteratorIn2 keys_in2,
ValueIteratorIn2 values_in2,
::cuda::std::int64_t num_pairs2,
KeyIteratorOut keys_out,
ValueIteratorOut values_out,
CompareOp compare_op = {},
cudaStream_t stream = nullptr
)#

Overview#

Merges two sorted sequences of key-value pairs into a sorted output sequence. Merging is unstable, which means any two equivalent values (neither value is ordered before the other) may be written to the output sequence in any order.

Added in version 2.7.0: First appears in CUDA Toolkit 12.8.

A Simple Example#

The code snippet below illustrates the merging of two device vectors of int keys.

c2h::device_vector<int> keys1{0, 2, 5};
c2h::device_vector<char> values1{'a', 'b', 'c'};
c2h::device_vector<int> keys2{0, 3, 3, 4};
c2h::device_vector<char> values2{'A', 'B', 'C', 'D'};
c2h::device_vector<int> result_keys(7);
c2h::device_vector<char> result_values(7);

// 1) Get temp storage size
std::size_t temp_storage_bytes = 0;
cub::DeviceMerge::MergePairs(
  nullptr,
  temp_storage_bytes,
  keys1.begin(),
  values1.begin(),
  static_cast<int>(keys1.size()),
  keys2.begin(),
  values2.begin(),
  static_cast<int>(keys2.size()),
  result_keys.begin(),
  result_values.begin());

// 2) Allocate temp storage
c2h::device_vector<char> temp_storage(temp_storage_bytes);

// 3) Perform merge operation
cub::DeviceMerge::MergePairs(
  thrust::raw_pointer_cast(temp_storage.data()),
  temp_storage_bytes,
  keys1.begin(),
  values1.begin(),
  static_cast<int>(keys1.size()),
  keys2.begin(),
  values2.begin(),
  static_cast<int>(keys2.size()),
  result_keys.begin(),
  result_values.begin());

c2h::host_vector<int> expected_keys{0, 0, 2, 3, 3, 4, 5};
c2h::host_vector<char> expected_values{'a', 'A', 'b', 'B', 'C', 'D', 'c'};

Template Parameters:
  • KeyIteratorIn1[deduced] Random access iterator to the keys of the first sorted input sequence. Must have the same value type as KeyIteratorIn2.

  • ValueIteratorIn1[deduced] Random access iterator to the values of the first sorted input sequence. Must have the same value type as ValueIteratorIn2.

  • KeyIteratorIn2[deduced] Random access iterator to the second sorted input sequence. Must have the same value type as KeyIteratorIn1.

  • ValueIteratorIn2[deduced] Random access iterator to the values of the second sorted input sequence. Must have the same value type as ValueIteratorIn1.

  • KeyIteratorOut[deduced] Random access iterator to the keys of the output sequence.

  • ValueIteratorOut[deduced] Random access iterator to the values of the output sequence.

  • CompareOp[deduced] Binary predicate to compare the key input iterator’s value types. Must have a signature equivalent to bool operator()(Key lhs, Key rhs) and establish a strict weak ordering.

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation.

  • keys_in1[in] Iterator to the beginning of the keys of the first sorted input sequence.

  • values_in1[in] Iterator to the beginning of the values of the first sorted input sequence.

  • num_pairs1[in] Number of key-value pairs in the first input sequence.

  • keys_in2[in] Iterator to the beginning of the keys of the second sorted input sequence.

  • values_in2[in] Iterator to the beginning of the values of the second sorted input sequence.

  • num_pairs2[in] Number of key-value pairs in the second input sequence.

  • keys_out[out] Iterator to the beginning of the keys of the output sequence.

  • values_out[out] Iterator to the beginning of the values of the output sequence.

  • compare_op[in] Comparison function object, returning true if the first argument is ordered before the second. Must establish a strict weak ordering.

  • stream[in] [optional] CUDA stream to launch kernels into. Default is stream0.