cub::DeviceTopK#
-
struct DeviceTopK#
@brief DeviceTopK provides device-wide, parallel operations for finding the largest (or smallest) K items from sequences of unordered data items residing within device-accessible memory.
@par Overview The TopK algorithm tries to find the largest (or smallest) K items in an unordered list. A related problem is called [K selection problem](https://en.wikipedia.org/wiki/Selection_algorithm), which finds the Kth largest (or smallest) values in a list. DeviceTopK will return K items in an unspecified order as results. It is based on an algorithm called [AIR TopK](https://dl.acm.org/doi/10.1145/3581784.3607062).
@par Supported Types DeviceTopK can process all of the built-in C++ numeric primitive types (unsigned char, int, double, etc.) as well as CUDA’s __half and __nv_bfloat16 16-bit floating-point types.
@par Stability DeviceTopK currently only provides an unstable version. Usage Considerations +++++++++++++++++++++++++++++++++++++++++++++
Dynamic parallelism. DeviceTopK methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.
Performance#
The work-complexity of top-k as a function of input size is linear, resulting in performance throughput that plateaus with problem sizes large enough to saturate the GPU.
Public Static Functions
-
template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename ValueInputIteratorT, typename ValueOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MaxPairs( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_keys_in,
- KeyOutputIteratorT d_keys_out,
- ValueInputIteratorT d_values_in,
- ValueOutputIteratorT d_values_out,
- NumItemsT num_items,
- NumOutItemsT k,
- EnvT env = {},
Overview#
Finds the largest K keys and their corresponding values from an unordered input sequence of key-value pairs.
When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
A Simple Example#
The following code snippet demonstrates how to use the cub::DeviceTopK::MaxPairs function to find the largest K items:
const int k = 4; auto keys = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6}; auto values = cuda::make_counting_iterator<int>(0); auto keys_out = thrust::device_vector<int>(k, thrust::no_init); auto values_out = thrust::device_vector<int>(k, thrust::no_init); // Specify that we do not require a specific output order and do not require deterministic results auto requirements = cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted); // Query temporary storage requirements size_t temp_storage_bytes{}; cub::DeviceTopK::MaxPairs( nullptr, temp_storage_bytes, keys.begin(), keys_out.begin(), values, values_out.begin(), keys.size(), k, requirements); // Allocate temporary storage thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init); cub::DeviceTopK::MaxPairs( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, keys.begin(), keys_out.begin(), values, values_out.begin(), keys.size(), k, requirements); // Get the top-k results into sorted order for easy comparison thrust::sort_by_key(keys_out.begin(), keys_out.end(), values_out.begin(), cuda::std::greater<>{}); thrust::host_vector<int> expected_keys{8, 7, 6, 5}; thrust::host_vector<int> expected_values{4, 3, 7, 0};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)
ValueInputIteratorT – [inferred] Random-access input iterator type for reading input values (may be a simple pointer type)
ValueOutputIteratorT – [inferred] Random-access input iterator type for writing output values (may be a simple pointer type)
NumItemsT – The integral type of variable num_items
NumOutItemsT – The integral type of variable k
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys_in – [in] Random-access iterator to the input sequence containing the keys
d_keys_out – [out] Random-access iterator to the output sequence of keys, where K values will be written to
d_values_in – [in] Random-access iterator to the input sequence containing the values associated to each key
d_values_out – [out] Random-access iterator to the output sequence of values, corresponding to the top k keys, where k values will be written to
num_items – [in] Number of items to be read and processed from
d_keys_in
andd_values_in
eachk – [in] The value of K, which is the number of largest pairs to find from
num_items
pairs. Capped to a maximum ofnum_items
.env – [in]
[optional] Execution environment. Default is cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename ValueInputIteratorT, typename ValueOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MinPairs( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_keys_in,
- KeyOutputIteratorT d_keys_out,
- ValueInputIteratorT d_values_in,
- ValueOutputIteratorT d_values_out,
- NumItemsT num_items,
- NumOutItemsT k,
- EnvT env = {},
Overview#
Finds the lowest K keys and their corresponding values from an unordered input sequence of key-value pairs.
When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
A Simple Example#
The following code snippet demonstrates how to use the cub::DeviceTopK::MinPairs function to find the lowest K items:
const int k = 4; auto keys = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6}; auto values = cuda::make_counting_iterator<int>(0); auto keys_out = thrust::device_vector<int>(k, thrust::no_init); auto values_out = thrust::device_vector<int>(k, thrust::no_init); // Specify that we do not require a specific output order and do not require deterministic results auto requirements = cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted); // Query temporary storage requirements size_t temp_storage_bytes{}; cub::DeviceTopK::MinPairs( nullptr, temp_storage_bytes, keys.begin(), keys_out.begin(), values, values_out.begin(), keys.size(), k, requirements); // Allocate temporary storage thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init); cub::DeviceTopK::MinPairs( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, keys.begin(), keys_out.begin(), values, values_out.begin(), keys.size(), k, requirements); // Get the top-k results into sorted order for easy comparison thrust::sort_by_key(keys_out.begin(), keys_out.end(), values_out.begin()); thrust::host_vector<int> expected_keys{-3, 1, 2, 4}; thrust::host_vector<int> expected_values{1, 2, 5, 6};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)
ValueInputIteratorT – [inferred] Random-access input iterator type for reading input values (may be a simple pointer type)
ValueOutputIteratorT – [inferred] Random-access input iterator type for writing output values (may be a simple pointer type)
NumItemsT – The integral type of variable num_items
NumOutItemsT – The integral type of variable k
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys_in – [in] Random-access iterator to the input sequence containing the keys
d_keys_out – [out] Random-access iterator to the output sequence of keys, where K values will be written to
d_values_in – [in] Random-access iterator to the input sequence containing the values associated to each key
d_values_out – [out] Random-access iterator to the output sequence of values, corresponding to the top k keys, where k values will be written to
num_items – [in] Number of items to be read and processed from
d_keys_in
andd_values_in
eachk – [in] The value of K, which is the number of lowest pairs to find from
num_items
pairs. Capped to a maximum ofnum_items
.env – [in]
[optional] Execution environment. Default is cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MaxKeys( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_keys_in,
- KeyOutputIteratorT d_keys_out,
- NumItemsT num_items,
- NumOutItemsT k,
- EnvT env = {},
Overview#
Finds the largest K keys from an unordered input sequence of keys.
When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
A Simple Example#
The following code snippet demonstrates how to use the cub::DeviceTopK::MinKeys function to find the largest K items:
const int k = 4; auto input = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6}; auto output = thrust::device_vector<int>(k, thrust::no_init); // Specify that we do not require a specific output order and do not require deterministic results auto requirements = cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted); // Query temporary storage requirements size_t temp_storage_bytes{}; cub::DeviceTopK::MaxKeys(nullptr, temp_storage_bytes, input.begin(), output.begin(), input.size(), k, requirements); // Allocate temporary storage thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init); cub::DeviceTopK::MaxKeys( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, input.begin(), output.begin(), input.size(), k, requirements); // Get the top-k results into sorted order for easy comparison thrust::sort(output.begin(), output.end(), cuda::std::greater{}); thrust::host_vector<int> expected{8, 7, 6, 5};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)
NumItemsT – The integral type of variable num_items
NumOutItemsT – The integral type of variable k
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys_in – [in] Random-access iterator to the input sequence containing the keys
d_keys_out – [out] Random-access iterator to the output sequence of keys, where K values will be written to
num_items – [in] Number of items to be read and processed from
d_keys_in
k – [in] The value of K, which is the number of largest pairs to find from
num_items
pairs. Capped to a maximum ofnum_items
.env – [in]
[optional] Execution environment. Default is cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MinKeys( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_keys_in,
- KeyOutputIteratorT d_keys_out,
- NumItemsT num_items,
- NumOutItemsT k,
- EnvT env = {},
Overview#
Finds the lowest K keys from an unordered input sequence of keys.
When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
A Simple Example#
The following code snippet demonstrates how to use the cub::DeviceTopK::MinKeys function to find the lowest K items:
const int k = 4; auto input = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6}; auto output = thrust::device_vector<int>(k, thrust::no_init); // Specify that we do not require a specific output order and do not require deterministic results auto requirements = cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted); // Query temporary storage requirements size_t temp_storage_bytes{}; cub::DeviceTopK::MinKeys(nullptr, temp_storage_bytes, input.begin(), output.begin(), input.size(), k, requirements); // Allocate temporary storage thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init); cub::DeviceTopK::MinKeys( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, input.begin(), output.begin(), input.size(), k, requirements); // Get the top-k results into sorted order for easy comparison thrust::sort(output.begin(), output.end()); thrust::host_vector<int> expected{-3, 1, 2, 4};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)
NumItemsT – The integral type of variable num_items
NumOutItemsT – The integral type of variable k
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys_in – [in] Random-access iterator to the input sequence containing the keys
d_keys_out – [out] Random-access iterator to the output sequence of keys, where K values will be written to
num_items – [in] Number of items to be read and processed from
d_keys_in
k – [in] The value of K, which is the number of largest pairs to find from
num_items
pairs. Capped to a maximum ofnum_items
.env – [in]
[optional] Execution environment. Default is cuda::std::execution::env{}.