cub::DeviceTopK#

struct DeviceTopK#

@brief DeviceTopK provides device-wide, parallel operations for finding the largest (or smallest) K items from sequences of unordered data items residing within device-accessible memory.

@par Overview The TopK algorithm tries to find the largest (or smallest) K items in an unordered list. A related problem is called [K selection problem](https://en.wikipedia.org/wiki/Selection_algorithm), which finds the Kth largest (or smallest) values in a list. DeviceTopK will return K items in an unspecified order as results. It is based on an algorithm called [AIR TopK](https://dl.acm.org/doi/10.1145/3581784.3607062).

@par Supported Types DeviceTopK can process all of the built-in C++ numeric primitive types (unsigned char, int, double, etc.) as well as CUDA’s __half and __nv_bfloat16 16-bit floating-point types.

@par Stability DeviceTopK currently only provides an unstable version. Usage Considerations +++++++++++++++++++++++++++++++++++++++++++++

  • Dynamic parallelism. DeviceTopK methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.

Performance#

The work-complexity of top-k as a function of input size is linear, resulting in performance throughput that plateaus with problem sizes large enough to saturate the GPU.

Public Static Functions

template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename ValueInputIteratorT, typename ValueOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MaxPairs(
void *d_temp_storage,
size_t &temp_storage_bytes,
KeyInputIteratorT d_keys_in,
KeyOutputIteratorT d_keys_out,
ValueInputIteratorT d_values_in,
ValueOutputIteratorT d_values_out,
NumItemsT num_items,
NumOutItemsT k,
EnvT env = {},
)#

Overview#

Finds the largest K keys and their corresponding values from an unordered input sequence of key-value pairs.

  • When d_temp_storage is nullptr, no work is done and the required allocation size is returned in temp_storage_bytes.

A Simple Example#

The following code snippet demonstrates how to use the cub::DeviceTopK::MaxPairs function to find the largest K items:

const int k     = 4;
auto keys       = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6};
auto values     = cuda::make_counting_iterator<int>(0);
auto keys_out   = thrust::device_vector<int>(k, thrust::no_init);
auto values_out = thrust::device_vector<int>(k, thrust::no_init);

// Specify that we do not require a specific output order and do not require deterministic results
auto requirements =
  cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted);

// Query temporary storage requirements
size_t temp_storage_bytes{};
cub::DeviceTopK::MaxPairs(
  nullptr,
  temp_storage_bytes,
  keys.begin(),
  keys_out.begin(),
  values,
  values_out.begin(),
  keys.size(),
  k,
  requirements);

// Allocate temporary storage
thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init);

cub::DeviceTopK::MaxPairs(
  thrust::raw_pointer_cast(temp_storage.data()),
  temp_storage_bytes,
  keys.begin(),
  keys_out.begin(),
  values,
  values_out.begin(),
  keys.size(),
  k,
  requirements);

// Get the top-k results into sorted order for easy comparison
thrust::sort_by_key(keys_out.begin(), keys_out.end(), values_out.begin(), cuda::std::greater<>{});
thrust::host_vector<int> expected_keys{8, 7, 6, 5};
thrust::host_vector<int> expected_values{4, 3, 7, 0};

Template Parameters:
  • KeyInputIteratorT[inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)

  • KeyOutputIteratorT[inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)

  • ValueInputIteratorT[inferred] Random-access input iterator type for reading input values (may be a simple pointer type)

  • ValueOutputIteratorT[inferred] Random-access input iterator type for writing output values (may be a simple pointer type)

  • NumItemsT – The integral type of variable num_items

  • NumOutItemsT – The integral type of variable k

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_keys_in[in] Random-access iterator to the input sequence containing the keys

  • d_keys_out[out] Random-access iterator to the output sequence of keys, where K values will be written to

  • d_values_in[in] Random-access iterator to the input sequence containing the values associated to each key

  • d_values_out[out] Random-access iterator to the output sequence of values, corresponding to the top k keys, where k values will be written to

  • num_items[in] Number of items to be read and processed from d_keys_in and d_values_in each

  • k[in] The value of K, which is the number of largest pairs to find from num_items pairs. Capped to a maximum of num_items.

  • env[in]

    [optional] Execution environment. Default is cuda::std::execution::env{}.

template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename ValueInputIteratorT, typename ValueOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MinPairs(
void *d_temp_storage,
size_t &temp_storage_bytes,
KeyInputIteratorT d_keys_in,
KeyOutputIteratorT d_keys_out,
ValueInputIteratorT d_values_in,
ValueOutputIteratorT d_values_out,
NumItemsT num_items,
NumOutItemsT k,
EnvT env = {},
)#

Overview#

Finds the lowest K keys and their corresponding values from an unordered input sequence of key-value pairs.

  • When d_temp_storage is nullptr, no work is done and the required allocation size is returned in temp_storage_bytes.

A Simple Example#

The following code snippet demonstrates how to use the cub::DeviceTopK::MinPairs function to find the lowest K items:

const int k     = 4;
auto keys       = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6};
auto values     = cuda::make_counting_iterator<int>(0);
auto keys_out   = thrust::device_vector<int>(k, thrust::no_init);
auto values_out = thrust::device_vector<int>(k, thrust::no_init);

// Specify that we do not require a specific output order and do not require deterministic results
auto requirements =
  cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted);

// Query temporary storage requirements
size_t temp_storage_bytes{};
cub::DeviceTopK::MinPairs(
  nullptr,
  temp_storage_bytes,
  keys.begin(),
  keys_out.begin(),
  values,
  values_out.begin(),
  keys.size(),
  k,
  requirements);

// Allocate temporary storage
thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init);

cub::DeviceTopK::MinPairs(
  thrust::raw_pointer_cast(temp_storage.data()),
  temp_storage_bytes,
  keys.begin(),
  keys_out.begin(),
  values,
  values_out.begin(),
  keys.size(),
  k,
  requirements);

// Get the top-k results into sorted order for easy comparison
thrust::sort_by_key(keys_out.begin(), keys_out.end(), values_out.begin());
thrust::host_vector<int> expected_keys{-3, 1, 2, 4};
thrust::host_vector<int> expected_values{1, 2, 5, 6};

Template Parameters:
  • KeyInputIteratorT[inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)

  • KeyOutputIteratorT[inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)

  • ValueInputIteratorT[inferred] Random-access input iterator type for reading input values (may be a simple pointer type)

  • ValueOutputIteratorT[inferred] Random-access input iterator type for writing output values (may be a simple pointer type)

  • NumItemsT – The integral type of variable num_items

  • NumOutItemsT – The integral type of variable k

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_keys_in[in] Random-access iterator to the input sequence containing the keys

  • d_keys_out[out] Random-access iterator to the output sequence of keys, where K values will be written to

  • d_values_in[in] Random-access iterator to the input sequence containing the values associated to each key

  • d_values_out[out] Random-access iterator to the output sequence of values, corresponding to the top k keys, where k values will be written to

  • num_items[in] Number of items to be read and processed from d_keys_in and d_values_in each

  • k[in] The value of K, which is the number of lowest pairs to find from num_items pairs. Capped to a maximum of num_items.

  • env[in]

    [optional] Execution environment. Default is cuda::std::execution::env{}.

template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MaxKeys(
void *d_temp_storage,
size_t &temp_storage_bytes,
KeyInputIteratorT d_keys_in,
KeyOutputIteratorT d_keys_out,
NumItemsT num_items,
NumOutItemsT k,
EnvT env = {},
)#

Overview#

Finds the largest K keys from an unordered input sequence of keys.

  • When d_temp_storage is nullptr, no work is done and the required allocation size is returned in temp_storage_bytes.

A Simple Example#

The following code snippet demonstrates how to use the cub::DeviceTopK::MinKeys function to find the largest K items:

const int k = 4;
auto input  = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6};
auto output = thrust::device_vector<int>(k, thrust::no_init);

// Specify that we do not require a specific output order and do not require deterministic results
auto requirements =
  cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted);

// Query temporary storage requirements
size_t temp_storage_bytes{};
cub::DeviceTopK::MaxKeys(nullptr, temp_storage_bytes, input.begin(), output.begin(), input.size(), k, requirements);

// Allocate temporary storage
thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init);

cub::DeviceTopK::MaxKeys(
  thrust::raw_pointer_cast(temp_storage.data()),
  temp_storage_bytes,
  input.begin(),
  output.begin(),
  input.size(),
  k,
  requirements);

// Get the top-k results into sorted order for easy comparison
thrust::sort(output.begin(), output.end(), cuda::std::greater{});
thrust::host_vector<int> expected{8, 7, 6, 5};

Template Parameters:
  • KeyInputIteratorT[inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)

  • KeyOutputIteratorT[inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)

  • NumItemsT – The integral type of variable num_items

  • NumOutItemsT – The integral type of variable k

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_keys_in[in] Random-access iterator to the input sequence containing the keys

  • d_keys_out[out] Random-access iterator to the output sequence of keys, where K values will be written to

  • num_items[in] Number of items to be read and processed from d_keys_in

  • k[in] The value of K, which is the number of largest pairs to find from num_items pairs. Capped to a maximum of num_items.

  • env[in]

    [optional] Execution environment. Default is cuda::std::execution::env{}.

template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MinKeys(
void *d_temp_storage,
size_t &temp_storage_bytes,
KeyInputIteratorT d_keys_in,
KeyOutputIteratorT d_keys_out,
NumItemsT num_items,
NumOutItemsT k,
EnvT env = {},
)#

Overview#

Finds the lowest K keys from an unordered input sequence of keys.

  • When d_temp_storage is nullptr, no work is done and the required allocation size is returned in temp_storage_bytes.

A Simple Example#

The following code snippet demonstrates how to use the cub::DeviceTopK::MinKeys function to find the lowest K items:

const int k = 4;
auto input  = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6};
auto output = thrust::device_vector<int>(k, thrust::no_init);

// Specify that we do not require a specific output order and do not require deterministic results
auto requirements =
  cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted);

// Query temporary storage requirements
size_t temp_storage_bytes{};
cub::DeviceTopK::MinKeys(nullptr, temp_storage_bytes, input.begin(), output.begin(), input.size(), k, requirements);

// Allocate temporary storage
thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init);

cub::DeviceTopK::MinKeys(
  thrust::raw_pointer_cast(temp_storage.data()),
  temp_storage_bytes,
  input.begin(),
  output.begin(),
  input.size(),
  k,
  requirements);

// Get the top-k results into sorted order for easy comparison
thrust::sort(output.begin(), output.end());
thrust::host_vector<int> expected{-3, 1, 2, 4};

Template Parameters:
  • KeyInputIteratorT[inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)

  • KeyOutputIteratorT[inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)

  • NumItemsT – The integral type of variable num_items

  • NumOutItemsT – The integral type of variable k

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_keys_in[in] Random-access iterator to the input sequence containing the keys

  • d_keys_out[out] Random-access iterator to the output sequence of keys, where K values will be written to

  • num_items[in] Number of items to be read and processed from d_keys_in

  • k[in] The value of K, which is the number of largest pairs to find from num_items pairs. Capped to a maximum of num_items.

  • env[in]

    [optional] Execution environment. Default is cuda::std::execution::env{}.