cub::DeviceSelect
Defined in cub/device/device_select.cuh
-
struct DeviceSelect
DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within device-accessible memory.
Overview
These operations apply a selection criterion to selectively copy items from a specified input sequence to a compact output sequence.
Usage Considerations
Dynamic parallelism. DeviceSelect methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.
Performance
The work-complexity of select-flagged, select-if, and select-unique as a function of input size is linear, resulting in performance throughput that plateaus with problem sizes large enough to saturate the GPU.
Public Static Functions
-
template<typename InputIteratorT, typename FlagIterator, typename OutputIteratorT, typename NumSelectedIteratorT>
static inline cudaError_t Flagged(void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, FlagIterator d_flags, OutputIteratorT d_out, NumSelectedIteratorT d_num_selected_out, int num_items, cudaStream_t stream = 0) Uses the
d_flags
sequence to selectively copy the corresponding items fromd_in
intod_out
. The total number of items selected is written tod_num_selected_out
.The value type of
d_flags
must be castable tobool
(e.g.,bool
,char
,int
, etc.).Copies of the selected items are compacted into
d_out
and maintain their original relative ordering.- The range
[d_out, d_out + *d_num_selected_out)
shall not overlap[d_in, d_in + num_items)
,[d_flags, d_flags + num_items)
nord_num_selected_out
in any way. When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.#include <cub/cub.cuh> // or equivalently <cub/device/device_select.cuh> // Declare, allocate, and initialize device-accessible pointers for input, // flags, and output int num_items; // e.g., 8 int *d_in; // e.g., [1, 2, 3, 4, 5, 6, 7, 8] char *d_flags; // e.g., [1, 0, 0, 1, 0, 1, 1, 0] int *d_out; // e.g., [ , , , , , , , ] int *d_num_selected_out; // e.g., [ ] ... // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceSelect::Flagged( d_temp_storage, temp_storage_bytes, d_in, d_flags, d_out, d_num_selected_out, num_items); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run selection cub::DeviceSelect::Flagged( d_temp_storage, temp_storage_bytes, d_in, d_flags, d_out, d_num_selected_out, num_items); // d_out <-- [1, 4, 6, 7] // d_num_selected_out <-- [4]
- Template Parameters
InputIteratorT – [inferred] Random-access input iterator type for reading input items (may be a simple pointer type)
FlagIterator – [inferred] Random-access input iterator type for reading selection flags (may be a simple pointer type)
OutputIteratorT – [inferred] Random-access output iterator type for writing selected items (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_in – [in] Pointer to the input sequence of data items
d_flags – [in] Pointer to the input sequence of selection flags
d_out – [out] Pointer to the output sequence of selected data items
d_num_selected_out – [out] Pointer to the output total number of items selected (i.e., length of
d_out
)num_items – [in] Total number of input items (i.e., length of
d_in
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename IteratorT, typename FlagIterator, typename NumSelectedIteratorT>
static inline cudaError_t Flagged(void *d_temp_storage, size_t &temp_storage_bytes, IteratorT d_data, FlagIterator d_flags, NumSelectedIteratorT d_num_selected_out, int num_items, cudaStream_t stream = 0) Uses the
d_flags
sequence to selectively compact the items in d_data`. The total number of items selected is written tod_num_selected_out
.The value type of
d_flags
must be castable tobool
(e.g.,bool
,char
,int
, etc.).Copies of the selected items are compacted in-place and maintain their original relative ordering.
- The
d_data
may equald_flags
. The range[d_data, d_data + num_items)
shall not overlap[d_flags, d_flags + num_items)
in any other way. When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.#include <cub/cub.cuh> // or equivalently <cub/device/device_select.cuh> // Declare, allocate, and initialize device-accessible pointers for input, // flags, and output int num_items; // e.g., 8 int *d_data; // e.g., [1, 2, 3, 4, 5, 6, 7, 8] char *d_flags; // e.g., [1, 0, 0, 1, 0, 1, 1, 0] int *d_num_selected_out; // e.g., [ ] ... // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceSelect::Flagged( d_temp_storage, temp_storage_bytes, d_in, d_flags, d_num_selected_out, num_items); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run selection cub::DeviceSelect::Flagged( d_temp_storage, temp_storage_bytes, d_in, d_flags, d_num_selected_out, num_items); // d_data <-- [1, 4, 6, 7] // d_num_selected_out <-- [4]
- Template Parameters
IteratorT – [inferred] Random-access iterator type for reading and writing selected items (may be a simple pointer type)
FlagIterator – [inferred] Random-access input iterator type for reading selection flags (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_data – [inout] Pointer to the sequence of data items
d_flags – [in] Pointer to the input sequence of selection flags
d_num_selected_out – [out] Pointer to the output total number of items selected
num_items – [in] Total number of input items (i.e., length of
d_data
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename InputIteratorT, typename OutputIteratorT, typename NumSelectedIteratorT, typename SelectOp>
static inline cudaError_t If(void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, NumSelectedIteratorT d_num_selected_out, int num_items, SelectOp select_op, cudaStream_t stream = 0) Uses the
select_op
functor to selectively copy items fromd_in
intod_out
. The total number of items selected is written tod_num_selected_out
.Copies of the selected items are compacted into
d_out
and maintain their original relative ordering.- The range
[d_out, d_out + *d_num_selected_out)
shall not overlap[d_in, d_in + num_items)
nord_num_selected_out
in any way. When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.#include <cub/cub.cuh> // or equivalently <cub/device/device_select.cuh> // Functor type for selecting values less than some criteria struct LessThan { int compare; __host__ __device__ __forceinline__ LessThan(int compare) : compare(compare) {} __host__ __device__ __forceinline__ bool operator()(const int &a) const { return (a < compare); } }; // Declare, allocate, and initialize device-accessible pointers // for input and output int num_items; // e.g., 8 int *d_in; // e.g., [0, 2, 3, 9, 5, 2, 81, 8] int *d_out; // e.g., [ , , , , , , , ] int *d_num_selected_out; // e.g., [ ] LessThan select_op(7); ... // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceSelect::If( d_temp_storage, temp_storage_bytes, d_in, d_out, d_num_selected_out, num_items, select_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run selection cub::DeviceSelect::If( d_temp_storage, temp_storage_bytes, d_in, d_out, d_num_selected_out, num_items, select_op); // d_out <-- [0, 2, 3, 5, 2] // d_num_selected_out <-- [5]
- Template Parameters
InputIteratorT – [inferred] Random-access input iterator type for reading input items (may be a simple pointer type)
OutputIteratorT – [inferred] Random-access output iterator type for writing selected items (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
SelectOp – [inferred] Selection operator type having member
bool operator()(const T &a)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_in – [in] Pointer to the input sequence of data items
d_out – [out] Pointer to the output sequence of selected data items
d_num_selected_out – [out] Pointer to the output total number of items selected (i.e., length of
d_out
)num_items – [in] Total number of input items (i.e., length of
d_in
)select_op – [in] Unary selection operator
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename IteratorT, typename NumSelectedIteratorT, typename SelectOp>
static inline cudaError_t If(void *d_temp_storage, size_t &temp_storage_bytes, IteratorT d_data, NumSelectedIteratorT d_num_selected_out, int num_items, SelectOp select_op, cudaStream_t stream = 0) Uses the
select_op
functor to selectively compact items ind_data
. The total number of items selected is written tod_num_selected_out
.- Copies of the selected items are compacted in
d_data
and maintaintheir original relative ordering. When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.#include <cub/cub.cuh> // or equivalently <cub/device/device_select.cuh> // Functor type for selecting values less than some criteria struct LessThan { int compare; __host__ __device__ __forceinline__ LessThan(int compare) : compare(compare) {} __host__ __device__ __forceinline__ bool operator()(const int &a) const { return (a < compare); } }; // Declare, allocate, and initialize device-accessible pointers // for input and output int num_items; // e.g., 8 int *d_data; // e.g., [0, 2, 3, 9, 5, 2, 81, 8] int *d_num_selected_out; // e.g., [ ] LessThan select_op(7); ... // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceSelect::If( d_temp_storage, temp_storage_bytes, d_data, d_num_selected_out, num_items, select_op); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run selection cub::DeviceSelect::If( d_temp_storage, temp_storage_bytes, d_data, d_num_selected_out, num_items, select_op); // d_data <-- [0, 2, 3, 5, 2] // d_num_selected_out <-- [5]
- Template Parameters
IteratorT – [inferred] Random-access input iterator type for reading and writing items (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
SelectOp – [inferred] Selection operator type having member
bool operator()(const T &a)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_data – [inout] Pointer to the sequence of data items
d_num_selected_out – [out] Pointer to the output total number of items selected
num_items – [in] Total number of input items (i.e., length of
d_data
)select_op – [in] Unary selection operator
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename InputIteratorT, typename FlagIterator, typename OutputIteratorT, typename NumSelectedIteratorT, typename SelectOp>
static inline cudaError_t FlaggedIf(void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, FlagIterator d_flags, OutputIteratorT d_out, NumSelectedIteratorT d_num_selected_out, int num_items, SelectOp select_op, cudaStream_t stream = 0) Uses the
select_op
functor applied tod_flags
to selectively copy the corresponding items fromd_in
intod_out
. The total number of items selected is written tod_num_selected_out
.The expression
select_op(flag)
must be convertible tobool
, where the type offlag
corresponds to the value type ofFlagIterator
.Copies of the selected items are compacted into
d_out
and maintain their original relative ordering.- The range
[d_out, d_out + *d_num_selected_out)
shall not overlap[d_in, d_in + num_items)
nord_num_selected_out
in any way. When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.struct is_even_t { __host__ __device__ bool operator()(int flag) const { return !(flag % 2); } };
constexpr int num_items = 8; thrust::device_vector<int> d_in = {0, 1, 2, 3, 4, 5, 6, 7}; thrust::device_vector<int> d_flags = {8, 6, 7, 5, 3, 0, 9, 3}; thrust::device_vector<int> d_out(num_items); thrust::device_vector<int> d_num_selected_out(num_items); is_even_t is_even{}; // Determine temporary device storage requirements size_t temp_storage_bytes = 0; cub::DeviceSelect::FlaggedIf( nullptr, temp_storage_bytes, d_in.begin(), d_flags.begin(), d_out.begin(), d_num_selected_out.data(), num_items, is_even); // Allocate temporary storage c2h::device_vector<char> temp_storage(temp_storage_bytes); // Run selection cub::DeviceSelect::FlaggedIf( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, d_in.begin(), d_flags.begin(), d_out.begin(), d_num_selected_out.data(), num_items, is_even); thrust::device_vector<int> expected{0, 1, 5};
- Template Parameters
InputIteratorT – [inferred] Random-access input iterator type for reading input items (may be a simple pointer type)
FlagIterator – [inferred] Random-access input iterator type for reading selection flags (may be a simple pointer type)
OutputIteratorT – [inferred] Random-access output iterator type for writing selected items (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
SelectOp – [inferred] Selection operator type having member
bool operator()(const T &a)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_in – [in] Pointer to the input sequence of data items
d_flags – [in] Pointer to the input sequence of selection flags
d_out – [out] Pointer to the output sequence of selected data items
d_num_selected_out – [out] Pointer to the output total number of items selected (i.e., length of
d_out
)num_items – [in] Total number of input items (i.e., length of
d_in
)select_op – [in] Unary selection operator
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename IteratorT, typename FlagIterator, typename NumSelectedIteratorT, typename SelectOp>
static inline cudaError_t FlaggedIf(void *d_temp_storage, size_t &temp_storage_bytes, IteratorT d_data, FlagIterator d_flags, NumSelectedIteratorT d_num_selected_out, int num_items, SelectOp select_op, cudaStream_t stream = 0) Uses the
select_op
functor applied tod_flags
to selectively compact the corresponding items ind_data
. The total number of items selected is written tod_num_selected_out
.The expression
select_op(flag)
must be convertible tobool
, where the type offlag
corresponds to the value type ofFlagIterator
.Copies of the selected items are compacted in-place and maintain their original relative ordering.
- The
d_data
may equald_flags
. The range[d_data, d_data + num_items)
shall not overlap[d_flags, d_flags + num_items)
in any other way. When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.struct is_even_t { __host__ __device__ bool operator()(int flag) const { return !(flag % 2); } };
constexpr int num_items = 8; thrust::device_vector<int> d_data = {0, 1, 2, 3, 4, 5, 6, 7}; thrust::device_vector<int> d_flags = {8, 6, 7, 5, 3, 0, 9, 3}; thrust::device_vector<int> d_num_selected_out(num_items); is_even_t is_even{}; // Determine temporary device storage requirements size_t temp_storage_bytes = 0; cub::DeviceSelect::FlaggedIf( nullptr, temp_storage_bytes, d_data.begin(), d_flags.begin(), d_num_selected_out.data(), num_items, is_even); // Allocate temporary storage c2h::device_vector<char> temp_storage(temp_storage_bytes); // Run selection cub::DeviceSelect::FlaggedIf( thrust::raw_pointer_cast(temp_storage.data()), temp_storage_bytes, d_data.begin(), d_flags.begin(), d_num_selected_out.data(), num_items, is_even); thrust::device_vector<int> expected{0, 1, 5};
- Template Parameters
IteratorT – [inferred] Random-access iterator type for reading and writing selected items (may be a simple pointer type)
FlagIterator – [inferred] Random-access input iterator type for reading selection flags (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
SelectOp – [inferred] Selection operator type having member
bool operator()(const T &a)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_data – [inout] Pointer to the sequence of data items
d_flags – [in] Pointer to the input sequence of selection flags
d_num_selected_out – [out] Pointer to the output total number of items selected
num_items – [in] Total number of input items (i.e., length of
d_data
)select_op – [in] Unary selection operator
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename InputIteratorT, typename OutputIteratorT, typename NumSelectedIteratorT>
static inline cudaError_t Unique(void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, NumSelectedIteratorT d_num_selected_out, int num_items, cudaStream_t stream = 0) Given an input sequence
d_in
having runs of consecutive equal-valued keys, only the first key from each run is selectively copied tod_out
. The total number of items selected is written tod_num_selected_out
.The
==
equality operator is used to determine whether keys are equivalentCopies of the selected items are compacted into
d_out
and maintain their original relative ordering.- The range
[d_out, d_out + *d_num_selected_out)
shall not overlap[d_in, d_in + num_items)
nord_num_selected_out
in any way. When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.#include <cub/cub.cuh> // or equivalently <cub/device/device_select.cuh> // Declare, allocate, and initialize device-accessible pointers // for input and output int num_items; // e.g., 8 int *d_in; // e.g., [0, 2, 2, 9, 5, 5, 5, 8] int *d_out; // e.g., [ , , , , , , , ] int *d_num_selected_out; // e.g., [ ] ... // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceSelect::Unique( d_temp_storage, temp_storage_bytes, d_in, d_out, d_num_selected_out, num_items); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run selection cub::DeviceSelect::Unique( d_temp_storage, temp_storage_bytes, d_in, d_out, d_num_selected_out, num_items); // d_out <-- [0, 2, 9, 5, 8] // d_num_selected_out <-- [5]
- Template Parameters
InputIteratorT – [inferred] Random-access input iterator type for reading input items (may be a simple pointer type)
OutputIteratorT – [inferred] Random-access output iterator type for writing selected items (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_in – [in] Pointer to the input sequence of data items
d_out – [out] Pointer to the output sequence of selected data items
d_num_selected_out – [out] Pointer to the output total number of items selected (i.e., length of
d_out
)num_items – [in] Total number of input items (i.e., length of
d_in
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyInputIteratorT, typename ValueInputIteratorT, typename KeyOutputIteratorT, typename ValueOutputIteratorT, typename NumSelectedIteratorT, typename NumItemsT, typename EqualityOpT>
static inline typename ::cuda::std::enable_if<!::cuda::std::is_convertible<EqualityOpT, cudaStream_t>::value, cudaError_t>::type UniqueByKey(void *d_temp_storage, size_t &temp_storage_bytes, KeyInputIteratorT d_keys_in, ValueInputIteratorT d_values_in, KeyOutputIteratorT d_keys_out, ValueOutputIteratorT d_values_out, NumSelectedIteratorT d_num_selected_out, NumItemsT num_items, EqualityOpT equality_op, cudaStream_t stream = 0) Given an input sequence
d_keys_in
andd_values_in
with runs of key-value pairs with consecutive equal-valued keys, only the first key and its value from each run is selectively copied tod_keys_out
andd_values_out
. The total number of items selected is written tod_num_selected_out
.The user-provided equality operator, equality_op, is used to determine whether keys are equivalent
Copies of the selected items are compacted into
d_out
and maintain their original relative ordering.In-place operations are not supported. There must be no overlap between any of the provided ranges:
[d_keys_in, d_keys_in + num_items)
[d_keys_out, d_keys_out + *d_num_selected_out)
[d_values_in, d_values_in + num_items)
[d_values_out, d_values_out + *d_num_selected_out)
[d_num_selected_out, d_num_selected_out + 1)
When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.#include <cub/cub.cuh> // or equivalently <cub/device/device_select.cuh> // Declare, allocate, and initialize device-accessible pointers // for input and output int num_items; // e.g., 8 int *d_keys_in; // e.g., [0, 2, 2, 9, 5, 5, 5, 8] int *d_values_in; // e.g., [1, 2, 3, 4, 5, 6, 7, 8] int *d_keys_out; // e.g., [ , , , , , , , ] int *d_values_out; // e.g., [ , , , , , , , ] int *d_num_selected_out; // e.g., [ ] ... // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceSelect::UniqueByKey( d_temp_storage, temp_storage_bytes, d_keys_in, d_values_in, d_keys_out, d_values_out, d_num_selected_out, num_items); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run selection cub::DeviceSelect::UniqueByKey( d_temp_storage, temp_storage_bytes, d_keys_in, d_values_in, d_keys_out, d_values_out, d_num_selected_out, num_items); // d_keys_out <-- [0, 2, 9, 5, 8] // d_values_out <-- [1, 2, 4, 5, 8] // d_num_selected_out <-- [5]
- Template Parameters
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
ValueInputIteratorT – [inferred] Random-access input iterator type for reading input values (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing selected keys (may be a simple pointer type)
ValueOutputIteratorT – [inferred] Random-access output iterator type for writing selected values (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
NumItemsT – [inferred] Type of num_items
EqualityOpT – [inferred] Type of equality_op
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys_in – [in] Pointer to the input sequence of keys
d_values_in – [in] Pointer to the input sequence of values
d_keys_out – [out] Pointer to the output sequence of selected keys
d_values_out – [out] Pointer to the output sequence of selected values
d_num_selected_out – [out] Pointer to the total number of items selected (i.e., length of
d_keys_out
ord_values_out
)num_items – [in] Total number of input items (i.e., length of
d_keys_in
ord_values_in
)equality_op – [in] Binary predicate to determine equality
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename KeyInputIteratorT, typename ValueInputIteratorT, typename KeyOutputIteratorT, typename ValueOutputIteratorT, typename NumSelectedIteratorT, typename NumItemsT>
static inline cudaError_t UniqueByKey(void *d_temp_storage, size_t &temp_storage_bytes, KeyInputIteratorT d_keys_in, ValueInputIteratorT d_values_in, KeyOutputIteratorT d_keys_out, ValueOutputIteratorT d_values_out, NumSelectedIteratorT d_num_selected_out, NumItemsT num_items, cudaStream_t stream = 0) Given an input sequence
d_keys_in
andd_values_in
with runs of key-value pairs with consecutive equal-valued keys, only the first key and its value from each run is selectively copied tod_keys_out
andd_values_out
. The total number of items selected is written tod_num_selected_out
.The
==
equality operator is used to determine whether keys are equivalentCopies of the selected items are compacted into
d_out
and maintain their original relative ordering.In-place operations are not supported. There must be no overlap between any of the provided ranges:
[d_keys_in, d_keys_in + num_items)
[d_keys_out, d_keys_out + *d_num_selected_out)
[d_values_in, d_values_in + num_items)
[d_values_out, d_values_out + *d_num_selected_out)
[d_num_selected_out, d_num_selected_out + 1)
When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the compaction of items selected from an
int
device vector.#include <cub/cub.cuh> // or equivalently <cub/device/device_select.cuh> // Declare, allocate, and initialize device-accessible pointers // for input and output int num_items; // e.g., 8 int *d_keys_in; // e.g., [0, 2, 2, 9, 5, 5, 5, 8] int *d_values_in; // e.g., [1, 2, 3, 4, 5, 6, 7, 8] int *d_keys_out; // e.g., [ , , , , , , , ] int *d_values_out; // e.g., [ , , , , , , , ] int *d_num_selected_out; // e.g., [ ] ... // Determine temporary device storage requirements void *d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceSelect::UniqueByKey( d_temp_storage, temp_storage_bytes, d_keys_in, d_values_in, d_keys_out, d_values_out, d_num_selected_out, num_items); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run selection cub::DeviceSelect::UniqueByKey( d_temp_storage, temp_storage_bytes, d_keys_in, d_values_in, d_keys_out, d_values_out, d_num_selected_out, num_items); // d_keys_out <-- [0, 2, 9, 5, 8] // d_values_out <-- [1, 2, 4, 5, 8] // d_num_selected_out <-- [5]
- Template Parameters
KeyInputIteratorT – [inferred] Random-access input iterator type for reading input keys (may be a simple pointer type)
ValueInputIteratorT – [inferred] Random-access input iterator type for reading input values (may be a simple pointer type)
KeyOutputIteratorT – [inferred] Random-access output iterator type for writing selected keys (may be a simple pointer type)
ValueOutputIteratorT – [inferred] Random-access output iterator type for writing selected values (may be a simple pointer type)
NumSelectedIteratorT – [inferred] Output iterator type for recording the number of items selected (may be a simple pointer type)
NumItemsT – [inferred] Type of num_items
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_keys_in – [in] Pointer to the input sequence of keys
d_values_in – [in] Pointer to the input sequence of values
d_keys_out – [out] Pointer to the output sequence of selected keys
d_values_out – [out] Pointer to the output sequence of selected values
d_num_selected_out – [out] Pointer to the total number of items selected (i.e., length of
d_keys_out
ord_values_out
)num_items – [in] Total number of input items (i.e., length of
d_keys_in
ord_values_in
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.