cub::DeviceTopK#
-
struct DeviceTopK#
DeviceTopK provides device-wide, parallel operations for finding the largest (or smallest) K items from sequences of unordered data items residing within device-accessible memory.
Overview#
The TopK algorithm tries to find the largest (or smallest) K items in an unordered list. A related problem is called K selection problem, which finds the Kth largest (or smallest) values in a list. DeviceTopK will return K items in an unspecified order as results. It is based on an algorithm called AIR TopK.
Supported Types#
DeviceTopK can process all of the built-in C++ numeric primitive types (unsigned char, int, double, etc.) as well as CUDA’s __half and __nv_bfloat16 16-bit floating-point types.
Determinism#
DeviceTopK currently only supports unordered output, which may be non-deterministic for certain inputs. That is, if there are multiple items across the k-th position that compare equal, the subset of tied elements that ends up in the returned top‑k is not uniquely defined and may vary between runs. This behavior has to be explicitly acknowledged by the user by passing cuda::execution::determinism::not_guaranteed.
Usage Considerations#
Dynamic parallelism. DeviceTopK methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.
Performance#
The work-complexity of top-k as a function of input size is linear, resulting in performance throughput that plateaus with problem sizes large enough to saturate the GPU.
Public Static Functions
-
template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename ValueInputIteratorT, typename ValueOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MaxPairs( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_keys_in,
- KeyOutputIteratorT d_keys_out,
- ValueInputIteratorT d_values_in,
- ValueOutputIteratorT d_values_out,
- NumItemsT num_items,
- NumOutItemsT k,
- EnvT env = {},
Overview#
Finds the largest K keys and their corresponding values from an unordered input sequence of key-value pairs.
When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
A Simple Example#
The following code snippet demonstrates how to use the cub::DeviceTopK::MaxPairs function to find the largest K items:
const int k = 4; auto keys = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6}; auto values = cuda::make_counting_iterator<int>(0); auto keys_out = thrust::device_vector<int>(k, thrust::no_init); auto values_out = thrust::device_vector<int>(k, thrust::no_init); // Specify that we do not require a specific output order and do not require deterministic results auto requirements = cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted); // Prepare CUDA stream cudaStream_t stream = nullptr; cudaStreamCreate(&stream); cuda::stream_ref stream_ref{stream}; // Create the environment with the stream and requirements auto env = cuda::std::execution::env{stream_ref, requirements}; // Query temporary storage requirements size_t temp_storage_bytes{}; cub::DeviceTopK::MaxPairs( nullptr, temp_storage_bytes, keys.begin(), keys_out.begin(), values, values_out.begin(), keys.size(), k, env); // Allocate temporary storage thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init); cub::DeviceTopK::MaxPairs( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, keys.begin(), keys_out.begin(), values, values_out.begin(), keys.size(), k, env); // Get the top-k results into sorted order for easy comparison thrust::sort_by_key(keys_out.begin(), keys_out.end(), values_out.begin(), cuda::std::greater<>{}); thrust::host_vector<int> expected_keys{8, 7, 6, 5}; thrust::host_vector<int> expected_values{4, 3, 7, 0};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)
ValueInputIteratorT – [inferred] Random-access input iterator type for reading input values (may be a simple pointer type)
ValueOutputIteratorT – [inferred] Random-access input iterator type for writing output values (may be a simple pointer type)
NumItemsT – The integral type of variable num_items
NumOutItemsT – The integral type of variable k
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_keys_in – [in] Random-access iterator to the input sequence containing the keys
d_keys_out – [out] Random-access iterator to the output sequence of keys, where K values will be written to
d_values_in – [in] Random-access iterator to the input sequence containing the values associated to each key
d_values_out – [out] Random-access iterator to the output sequence of values, corresponding to the top k keys, where k values will be written to
num_items – [in] Number of items to be read and processed from
d_keys_inandd_values_ineachk – [in] The value of K, which is the number of largest pairs to find from
num_itemspairs. Capped to a maximum ofnum_items.env – [in]
[optional] Execution environment. Default is cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename ValueInputIteratorT, typename ValueOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MinPairs( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_keys_in,
- KeyOutputIteratorT d_keys_out,
- ValueInputIteratorT d_values_in,
- ValueOutputIteratorT d_values_out,
- NumItemsT num_items,
- NumOutItemsT k,
- EnvT env = {},
Overview#
Finds the lowest K keys and their corresponding values from an unordered input sequence of key-value pairs.
When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
A Simple Example#
The following code snippet demonstrates how to use the cub::DeviceTopK::MinPairs function to find the lowest K items:
const int k = 4; auto keys = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6}; auto values = cuda::make_counting_iterator<int>(0); auto keys_out = thrust::device_vector<int>(k, thrust::no_init); auto values_out = thrust::device_vector<int>(k, thrust::no_init); // Specify that we do not require a specific output order and do not require deterministic results auto requirements = cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted); // Prepare CUDA stream cudaStream_t stream = nullptr; cudaStreamCreate(&stream); cuda::stream_ref stream_ref{stream}; // Create the environment with the stream and requirements auto env = cuda::std::execution::env{stream_ref, requirements}; // Query temporary storage requirements size_t temp_storage_bytes{}; cub::DeviceTopK::MinPairs( nullptr, temp_storage_bytes, keys.begin(), keys_out.begin(), values, values_out.begin(), keys.size(), k, env); // Allocate temporary storage thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init); cub::DeviceTopK::MinPairs( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, keys.begin(), keys_out.begin(), values, values_out.begin(), keys.size(), k, env); // Get the top-k results into sorted order for easy comparison thrust::sort_by_key(keys_out.begin(), keys_out.end(), values_out.begin()); thrust::host_vector<int> expected_keys{-3, 1, 2, 4}; thrust::host_vector<int> expected_values{1, 2, 5, 6};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)
ValueInputIteratorT – [inferred] Random-access input iterator type for reading input values (may be a simple pointer type)
ValueOutputIteratorT – [inferred] Random-access input iterator type for writing output values (may be a simple pointer type)
NumItemsT – The integral type of variable num_items
NumOutItemsT – The integral type of variable k
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_keys_in – [in] Random-access iterator to the input sequence containing the keys
d_keys_out – [out] Random-access iterator to the output sequence of keys, where K values will be written to
d_values_in – [in] Random-access iterator to the input sequence containing the values associated to each key
d_values_out – [out] Random-access iterator to the output sequence of values, corresponding to the top k keys, where k values will be written to
num_items – [in] Number of items to be read and processed from
d_keys_inandd_values_ineachk – [in] The value of K, which is the number of lowest pairs to find from
num_itemspairs. Capped to a maximum ofnum_items.env – [in]
[optional] Execution environment. Default is cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MaxKeys( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_keys_in,
- KeyOutputIteratorT d_keys_out,
- NumItemsT num_items,
- NumOutItemsT k,
- EnvT env = {},
Overview#
Finds the largest K keys from an unordered input sequence of keys.
When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
A Simple Example#
The following code snippet demonstrates how to use the cub::DeviceTopK::MinKeys function to find the largest K items:
const int k = 4; auto input = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6}; auto output = thrust::device_vector<int>(k, thrust::no_init); // Specify that we do not require a specific output order and do not require deterministic results auto requirements = cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted); // Prepare CUDA stream cudaStream_t stream = nullptr; cudaStreamCreate(&stream); cuda::stream_ref stream_ref{stream}; // Create the environment with the stream and requirements auto env = cuda::std::execution::env{stream_ref, requirements}; // Query temporary storage requirements size_t temp_storage_bytes{}; cub::DeviceTopK::MaxKeys(nullptr, temp_storage_bytes, input.begin(), output.begin(), input.size(), k, env); // Allocate temporary storage thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init); cub::DeviceTopK::MaxKeys( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, input.begin(), output.begin(), input.size(), k, env); // Get the top-k results into sorted order for easy comparison thrust::sort(output.begin(), output.end(), cuda::std::greater{}); thrust::host_vector<int> expected{8, 7, 6, 5};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)
NumItemsT – The integral type of variable num_items
NumOutItemsT – The integral type of variable k
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_keys_in – [in] Random-access iterator to the input sequence containing the keys
d_keys_out – [out] Random-access iterator to the output sequence of keys, where K values will be written to
num_items – [in] Number of items to be read and processed from
d_keys_ink – [in] The value of K, which is the number of largest pairs to find from
num_itemspairs. Capped to a maximum ofnum_items.env – [in]
[optional] Execution environment. Default is cuda::std::execution::env{}.
-
template<typename KeyInputIteratorT, typename KeyOutputIteratorT, typename NumItemsT, typename NumOutItemsT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MinKeys( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- KeyInputIteratorT d_keys_in,
- KeyOutputIteratorT d_keys_out,
- NumItemsT num_items,
- NumOutItemsT k,
- EnvT env = {},
Overview#
Finds the lowest K keys from an unordered input sequence of keys.
When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
A Simple Example#
The following code snippet demonstrates how to use the cub::DeviceTopK::MinKeys function to find the lowest K items:
const int k = 4; auto input = thrust::device_vector<int>{5, -3, 1, 7, 8, 2, 4, 6}; auto output = thrust::device_vector<int>(k, thrust::no_init); // Specify that we do not require a specific output order and do not require deterministic results auto requirements = cuda::execution::require(cuda::execution::determinism::not_guaranteed, cuda::execution::output_ordering::unsorted); // Prepare CUDA stream cudaStream_t stream = nullptr; cudaStreamCreate(&stream); cuda::stream_ref stream_ref{stream}; // Create the environment with the stream and requirements auto env = cuda::std::execution::env{stream_ref, requirements}; // Query temporary storage requirements size_t temp_storage_bytes{}; cub::DeviceTopK::MinKeys(nullptr, temp_storage_bytes, input.begin(), output.begin(), input.size(), k, env); // Allocate temporary storage thrust::device_vector<char> temp_storage(temp_storage_bytes, thrust::no_init); cub::DeviceTopK::MinKeys( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, input.begin(), output.begin(), input.size(), k, env); // Get the top-k results into sorted order for easy comparison thrust::sort(output.begin(), output.end()); thrust::host_vector<int> expected{-3, 1, 2, 4};
- Template Parameters:
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing output keys (may be a simple pointer type)
NumItemsT – The integral type of variable num_items
NumOutItemsT – The integral type of variable k
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_keys_in – [in] Random-access iterator to the input sequence containing the keys
d_keys_out – [out] Random-access iterator to the output sequence of keys, where K values will be written to
num_items – [in] Number of items to be read and processed from
d_keys_ink – [in] The value of K, which is the number of largest pairs to find from
num_itemspairs. Capped to a maximum ofnum_items.env – [in]
[optional] Execution environment. Default is cuda::std::execution::env{}.