cub::DeviceMerge

Defined in cub/device/device_merge.cuh

struct DeviceMerge

DeviceMerge provides device-wide, parallel operations for merging two sorted sequences of values (called keys) or key-value pairs in device-accessible memory. The sorting order is determined by a comparison functor (default: less-than), which has to establish a strict weak ordering.

Public Static Functions

template<typename KeyIteratorIn1, typename KeyIteratorIn2, typename KeyIteratorOut, typename CompareOp = ::cuda::std::less<>>
static inline cudaError_t MergeKeys(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorIn1 keys_in1, int num_keys1, KeyIteratorIn2 keys_in2, int num_keys2, KeyIteratorOut keys_out, CompareOp compare_op = {}, cudaStream_t stream = nullptr)

Overview

Merges two sorted sequences of values (called keys) into a sorted output sequence. Merging is unstable, which means any two equivalent values (neither value is ordered before the other) may be written to the output sequence in any order.

A Simple Example

The code snippet below illustrates the merging of two device vectors of int keys.

thrust::device_vector<int> keys1{0, 2, 5};
thrust::device_vector<int> keys2{0, 3, 3, 4};
thrust::device_vector<int> result(7);

// 1) Get temp storage size
std::size_t temp_storage_bytes = 0;
cub::DeviceMerge::MergeKeys(
  nullptr,
  temp_storage_bytes,
  keys1.begin(),
  static_cast<int>(keys1.size()),
  keys2.begin(),
  static_cast<int>(keys2.size()),
  result.begin());

// 2) Allocate temp storage
thrust::device_vector<char> temp_storage(temp_storage_bytes);

// 3) Perform merge operation
cub::DeviceMerge::MergeKeys(
  thrust::raw_pointer_cast(temp_storage.data()),
  temp_storage_bytes,
  keys1.begin(),
  static_cast<int>(keys1.size()),
  keys2.begin(),
  static_cast<int>(keys2.size()),
  result.begin());

thrust::host_vector<int> expected{0, 0, 2, 3, 3, 4, 5};

Template Parameters
  • KeyIteratorIn1[deduced] Random access iterator to the first sorted input sequence. Must have the same value type as KeyIteratorIn2.

  • KeyIteratorIn2[deduced] Random access iterator to the second sorted input sequence. Must have the same value type as KeyIteratorIn1.

  • KeyIteratorOut[deduced] Random access iterator to the output sequence.

  • CompareOp[deduced] Binary predicate to compare the input iterator’s value types. Must have a signature equivalent to bool operator()(Key lhs, Key rhs) and establish a strict weak ordering.

Parameters
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation.

  • keys_in1[in] Iterator to the beginning of the first sorted input sequence.

  • num_keys1[in] Number of keys in the first input sequence.

  • keys_in2[in] Iterator to the beginning of the second sorted input sequence.

  • num_keys2[in] Number of keys in the second input sequence.

  • keys_out[out] Iterator to the beginning of the output sequence.

  • compare_op[in] Comparison function object, returning true if the first argument is ordered before the second. Must establish a strict weak ordering.

  • stream[in] [optional] CUDA stream to launch kernels into. Default is stream0.

template<typename KeyIteratorIn1, typename ValueIteratorIn1, typename KeyIteratorIn2, typename ValueIteratorIn2, typename KeyIteratorOut, typename ValueIteratorOut, typename CompareOp = ::cuda::std::less<>>
static inline cudaError_t MergePairs(void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorIn1 keys_in1, ValueIteratorIn1 values_in1, int num_pairs1, KeyIteratorIn2 keys_in2, ValueIteratorIn2 values_in2, int num_pairs2, KeyIteratorOut keys_out, ValueIteratorOut values_out, CompareOp compare_op = {}, cudaStream_t stream = nullptr)

Overview

Merges two sorted sequences of key-value pairs into a sorted output sequence. Merging is unstable, which means any two equivalent values (neither value is ordered before the other) may be written to the output sequence in any order.

A Simple Example

The code snippet below illustrates the merging of two device vectors of int keys.

thrust::device_vector<int> keys1{0, 2, 5};
thrust::device_vector<char> values1{'a', 'b', 'c'};
thrust::device_vector<int> keys2{0, 3, 3, 4};
thrust::device_vector<char> values2{'A', 'B', 'C', 'D'};
thrust::device_vector<int> result_keys(7);
thrust::device_vector<char> result_values(7);

// 1) Get temp storage size
std::size_t temp_storage_bytes = 0;
cub::DeviceMerge::MergePairs(
  nullptr,
  temp_storage_bytes,
  keys1.begin(),
  values1.begin(),
  static_cast<int>(keys1.size()),
  keys2.begin(),
  values2.begin(),
  static_cast<int>(keys2.size()),
  result_keys.begin(),
  result_values.begin());

// 2) Allocate temp storage
thrust::device_vector<char> temp_storage(temp_storage_bytes);

// 3) Perform merge operation
cub::DeviceMerge::MergePairs(
  thrust::raw_pointer_cast(temp_storage.data()),
  temp_storage_bytes,
  keys1.begin(),
  values1.begin(),
  static_cast<int>(keys1.size()),
  keys2.begin(),
  values2.begin(),
  static_cast<int>(keys2.size()),
  result_keys.begin(),
  result_values.begin());

thrust::host_vector<int> expected_keys{0, 0, 2, 3, 3, 4, 5};
thrust::host_vector<char> expected_values{'a', 'A', 'b', 'B', 'C', 'D', 'c'};

Template Parameters
  • KeyIteratorIn1[deduced] Random access iterator to the keys of the first sorted input sequence. Must have the same value type as KeyIteratorIn2.

  • ValueIteratorIn1[deduced] Random access iterator to the values of the first sorted input sequence. Must have the same value type as ValueIteratorIn2.

  • KeyIteratorIn2[deduced] Random access iterator to the second sorted input sequence. Must have the same value type as KeyIteratorIn1.

  • ValueIteratorIn2[deduced] Random access iterator to the values of the second sorted input sequence. Must have the same value type as ValueIteratorIn1.

  • KeyIteratorOut[deduced] Random access iterator to the keys of the output sequence.

  • ValueIteratorOut[deduced] Random access iterator to the values of the output sequence.

  • CompareOp[deduced] Binary predicate to compare the key input iterator’s value types. Must have a signature equivalent to bool operator()(Key lhs, Key rhs) and establish a strict weak ordering.

Parameters
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation.

  • keys_in1[in] Iterator to the beginning of the keys of the first sorted input sequence.

  • values_in1[in] Iterator to the beginning of the values of the first sorted input sequence.

  • num_pairs1[in] Number of key-value pairs in the first input sequence.

  • keys_in2[in] Iterator to the beginning of the keys of the second sorted input sequence.

  • values_in2[in] Iterator to the beginning of the values of the second sorted input sequence.

  • num_pairs2[in] Number of key-value pairs in the second input sequence.

  • keys_out[out] Iterator to the beginning of the keys of the output sequence.

  • values_out[out] Iterator to the beginning of the values of the output sequence.

  • compare_op[in] Comparison function object, returning true if the first argument is ordered before the second. Must establish a strict weak ordering.

  • stream[in] [optional] CUDA stream to launch kernels into. Default is stream0.