cub::DeviceTransform

struct DeviceTransform

DeviceTransform provides device-wide, parallel operations for transforming elements tuple-wise from multiple input sequences into an output sequence.

Public Static Functions

template<typename ...RandomAccessIteratorsIn, typename RandomAccessIteratorOut, typename NumItemsT, typename TransformOp>
static inline cudaError_t Transform(
::cuda::std::tuple<RandomAccessIteratorsIn...> inputs,
RandomAccessIteratorOut output,
NumItemsT num_items,
TransformOp transform_op,
cudaStream_t stream = nullptr
)

Overview

Transforms many input sequences into one output sequence, by applying a transformation operation on corresponding input elements and writing the result to the corresponding output element. No guarantee is given on the identity (i.e. address) of the objects passed to the call operator of the transformation operation.

A Simple Example

auto input1 = thrust::device_vector<int>{0, -2, 5, 3};
auto input2 = thrust::device_vector<float>{5.2f, 3.1f, -1.1f, 3.0f};
auto input3 = thrust::counting_iterator<int>{100};
auto op     = [] __device__(int a, float b, int c) {
  return (a + b) * c;
};

auto result = thrust::device_vector<int>(input1.size());
cub::DeviceTransform::Transform(
  cuda::std::tuple{input1.begin(), input2.begin(), input3}, result.begin(), input1.size(), op);

const auto expected = thrust::host_vector<float>{520, 111, 397, 618};

Parameters:
  • inputs – A tuple of iterators to the input sequences where num_items elements are read from each. The iterators’ value types must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to. May point to the beginning of one of the input sequences, performing the transformation in place. The output sequence must not overlap with any of the input sequences in any other way.

  • num_items – The number of elements in each input sequence.

  • transform_op – An n-ary function object, where n is the number of input sequences. The input iterators’ value types must be convertible to the parameters of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
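The element-wise semantics described above can be modeled on the host with a plain sequential loop. The sketch below is not CUB's implementation and does not run on the GPU: `reference_transform` and `example_transform` are hypothetical names introduced here, and standard-library tuples and vectors stand in for `cuda::std::tuple` and device memory. It assumes a C++17 compiler for `std::apply`.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical host-side model of the n-ary Transform (illustration only, not CUB code).
template <typename... InputIts, typename OutputIt, typename TransformOp>
void reference_transform(
  std::tuple<InputIts...> inputs, OutputIt output, std::ptrdiff_t num_items, TransformOp op)
{
  assert(num_items >= 0);
  for (std::ptrdiff_t i = 0; i < num_items; ++i)
  {
    // Element i of every input sequence becomes one argument of op;
    // the result is assigned to element i of the output.
    output[i] = std::apply([&](auto... its) { return op(its[i]...); }, inputs);
  }
}

std::vector<int> example_transform()
{
  const std::vector<int> a{1, 2, 3};
  const std::vector<float> b{0.5f, 1.5f, 2.5f};
  const std::vector<int> c{10, 11, 12};
  std::vector<int> out(a.size());
  reference_transform(
    std::make_tuple(a.begin(), b.begin(), c.begin()),
    out.begin(),
    static_cast<std::ptrdiff_t>(a.size()),
    [](int x, float y, int z) { return (x + y) * z; });
  return out; // {15, 38, 66}: 38.5f is truncated when assigned to an int element
}
```

The operator returns `float` while the output elements are `int`, which mirrors the requirement that the return type only needs to be assignable to the dereferenced output iterator.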

template<typename RandomAccessIteratorIn, typename RandomAccessIteratorOut, typename NumItemsT, typename TransformOp>
static inline cudaError_t Transform(
RandomAccessIteratorIn input,
RandomAccessIteratorOut output,
NumItemsT num_items,
TransformOp transform_op,
cudaStream_t stream = nullptr
)

Transforms one input sequence into one output sequence, by applying a transformation operation on each input element and writing the result to the corresponding output element. No guarantee is given on the identity (i.e. address) of the objects passed to the call operator of the transformation operation.

Parameters:
  • input – An iterator to the input sequence where num_items elements are read from. The iterator’s value type must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to. May point to the same sequence as input, performing the transformation in place. The output sequence must not overlap with the input sequence in any other way.

  • num_items – The number of elements in the input sequence.

  • transform_op – A unary function object. The input iterator’s value type must be convertible to the parameter of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
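For intuition, the closest host-side counterpart of this overload is `std::transform`. The sketch below uses it, rather than CUB, to illustrate the two conversion requirements listed above: the input value type (`int`) converts to the operator's parameter type (`double`), and the operator's return value is assigned to the output element type (`float`). `example_unary_transform` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

std::vector<float> example_unary_transform()
{
  const std::vector<int> input{1, 2, 3, 4};
  std::vector<float> output(input.size());
  assert(output.size() == input.size()); // the output needs room for every result
  // int -> double on the way in, double -> float on the way out.
  std::transform(input.begin(), input.end(), output.begin(), [](double x) { return x / 2.0; });
  return output; // {0.5f, 1.0f, 1.5f, 2.0f}
}
```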

template<typename RandomAccessIteratorOut, typename NumItemsT, typename Generator>
static cudaError_t Fill(
RandomAccessIteratorOut output,
NumItemsT num_items,
Generator generator,
cudaStream_t stream = nullptr
)

Fills the output sequence by invoking a generator operation for each output element and writing the result to it. This is effectively calling Transform with no input sequences.

Parameters:
  • output – An iterator to the output sequence where num_items results are written to.

  • num_items – The number of elements to write to the output sequence.

  • generator – A nullary function object. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.

template<typename ...RandomAccessIteratorsIn, typename RandomAccessIteratorOut, typename NumItemsT, typename Predicate, typename TransformOp>
static inline cudaError_t TransformIf(
::cuda::std::tuple<RandomAccessIteratorsIn...> inputs,
RandomAccessIteratorOut output,
NumItemsT num_items,
Predicate predicate,
TransformOp transform_op,
cudaStream_t stream = nullptr
)

Selectively transforms many input sequences into one output sequence, by applying a transformation operation on corresponding input elements, if a given predicate is true, and writing the result to the corresponding output element. No guarantee is given on the identity (i.e. address) of the objects passed to the call operator of the predicate and transformation operation. Output elements for which the predicate returns false are not written to.

auto input     = thrust::device_vector<int>{0, -1, 2, -3, 4, -5};
auto predicate = [] __device__(int value) {
  return value < 0;
};
auto op = [] __device__(int value) {
  return value * 2;
};

auto result = thrust::device_vector<int>(input.size()); // initialized to zeros
cub::DeviceTransform::TransformIf(cuda::std::tuple{input.begin()}, result.begin(), input.size(), predicate, op);

const auto expected = thrust::host_vector<float>{0, -2, 0, -6, 0, -10};

Parameters:
  • inputs – A tuple of iterators to the input sequences where num_items elements are read from each. The iterators’ value types must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to. May point to the beginning of one of the input sequences, performing the transformation in place. The output sequence must not overlap with any of the input sequences in any other way.

  • num_items – The number of elements in each input sequence.

  • predicate – An n-ary function object, where n is the number of input sequences. The input iterators’ value types must be convertible to the parameters of the function object’s call operator, which must return a boolean value.

  • transform_op – An n-ary function object, where n is the number of input sequences. The input iterators’ value types must be convertible to the parameters of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator. Will only be invoked if predicate returns true.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
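The predicate gating can be modeled on the host in the same sequential style. This is a sketch of the documented semantics under stated assumptions, not CUB's implementation: `reference_transform_if` and `example_transform_if` are hypothetical names, standard-library containers replace device memory, and C++17 is assumed for `std::apply`. The output is pre-filled with a marker value to make the unwritten elements visible.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical host-side model of the n-ary TransformIf (illustration only, not CUB code).
template <typename... InputIts, typename OutputIt, typename Predicate, typename TransformOp>
void reference_transform_if(
  std::tuple<InputIts...> inputs, OutputIt output, std::ptrdiff_t num_items, Predicate pred, TransformOp op)
{
  assert(num_items >= 0);
  for (std::ptrdiff_t i = 0; i < num_items; ++i)
  {
    std::apply(
      [&](auto... its) {
        // op runs, and output[i] is written, only when the predicate holds.
        if (pred(its[i]...))
        {
          output[i] = op(its[i]...);
        }
      },
      inputs);
  }
}

std::vector<int> example_transform_if()
{
  const std::vector<int> a{1, -2, 3, -4};
  const std::vector<int> b{10, 20, 30, 40};
  std::vector<int> out(a.size(), 7); // 7 marks elements that are never written
  reference_transform_if(
    std::make_tuple(a.begin(), b.begin()),
    out.begin(),
    static_cast<std::ptrdiff_t>(a.size()),
    [](int x, int /*y*/) { return x < 0; },
    [](int x, int y) { return x * y; });
  return out; // {7, -40, 7, -160}
}
```

Both the predicate and the transformation operation take one argument per input sequence, even when, as here, the predicate only inspects the first.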

template<typename RandomAccessIteratorIn, typename RandomAccessIteratorOut, typename NumItemsT, typename Predicate, typename TransformOp>
static inline cudaError_t TransformIf(
RandomAccessIteratorIn input,
RandomAccessIteratorOut output,
NumItemsT num_items,
Predicate predicate,
TransformOp transform_op,
cudaStream_t stream = nullptr
)

Overview

Selectively transforms one input sequence into one output sequence, by applying a transformation operation on each input element, if a given predicate is true, and writing the result to the corresponding output element. No guarantee is given on the identity (i.e. address) of the objects passed to the call operator of the predicate and transformation operation. Output elements for which the predicate returns false are not written to.

A Simple Example

auto input     = thrust::device_vector<int>{0, -1, 2, -3, 4, -5};
auto predicate = [] __device__(int value) {
  return value < 0;
};
auto op = [] __device__(int value) {
  return value * 2;
};

auto result = thrust::device_vector<int>(input.size()); // initialized to zeros
cub::DeviceTransform::TransformIf(cuda::std::tuple{input.begin()}, result.begin(), input.size(), predicate, op);

const auto expected = thrust::host_vector<float>{0, -2, 0, -6, 0, -10};

Parameters:
  • input – An iterator to the input sequence where num_items elements are read from. The iterator’s value type must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to. May point to the same sequence as input, performing the transformation in place. The output sequence must not overlap with the input sequence in any other way.

  • num_items – The number of elements in the input sequence.

  • predicate – A unary function object returning bool. The input iterator’s value type must be convertible to the parameter of the function object’s call operator.

  • transform_op – A unary function object. The input iterator’s value type must be convertible to the parameter of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator. Will only be invoked if predicate returns true.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
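When output points to the same sequence as input, elements that fail the predicate simply keep their original value. The host-side sketch below models that in-place case with a sequential loop. It is an illustration of the documented behavior, not CUB code, and `reference_transform_if` and `example_clamp_negatives_in_place` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the unary TransformIf (illustration only, not CUB code).
template <typename InputIt, typename OutputIt, typename Predicate, typename TransformOp>
void reference_transform_if(
  InputIt input, OutputIt output, std::ptrdiff_t num_items, Predicate pred, TransformOp op)
{
  assert(num_items >= 0);
  for (std::ptrdiff_t i = 0; i < num_items; ++i)
  {
    if (pred(input[i]))
    {
      output[i] = op(input[i]);
    }
  }
}

std::vector<int> example_clamp_negatives_in_place()
{
  std::vector<int> data{0, -1, 2, -3, 4, -5};
  // output == input: negatives are overwritten with 0, everything else is left as it was.
  reference_transform_if(
    data.begin(),
    data.begin(),
    static_cast<std::ptrdiff_t>(data.size()),
    [](int v) { return v < 0; },
    [](int) { return 0; });
  return data; // {0, 0, 2, 0, 4, 0}
}
```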

template<typename ...RandomAccessIteratorsIn, typename RandomAccessIteratorOut, typename NumItemsT, typename TransformOp>
static inline cudaError_t TransformStableArgumentAddresses(
::cuda::std::tuple<RandomAccessIteratorsIn...> inputs,
RandomAccessIteratorOut output,
NumItemsT num_items,
TransformOp transform_op,
cudaStream_t stream = nullptr
)

Overview

Transforms many input sequences into one output sequence, by applying a transformation operation on corresponding input elements and writing the result to the corresponding output element. The objects passed to the call operator of the transformation operation are guaranteed to reside in the input sequences and are never copied.

A Simple Example

auto input1 = thrust::device_vector<int>{0, -2, 5, 3};
auto input2 = thrust::device_vector<int>{52, 31, -11, 30};

auto* input1_ptr = thrust::raw_pointer_cast(input1.data());
auto* input2_ptr = thrust::raw_pointer_cast(input2.data());

auto op = [input1_ptr, input2_ptr] __device__(const int& a) -> int {
  const auto i = &a - input1_ptr; // we depend on the address of a
  return a + input2_ptr[i];
};

auto result = thrust::device_vector<int>(input1.size());
cub::DeviceTransform::TransformStableArgumentAddresses(
  cuda::std::tuple{input1_ptr}, result.begin(), input1.size(), op);

const auto expected = thrust::host_vector<float>{52, 29, -6, 33};

Parameters:
  • inputs – A tuple of iterators to the input sequences where num_items elements are read from each. The iterators’ value types must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to. May point to the beginning of one of the input sequences, performing the transformation in place. The output sequence must not overlap with any of the input sequences in any other way.

  • num_items – The number of elements in each input sequence.

  • transform_op – An n-ary function object, where n is the number of input sequences. The input iterators’ value types must be convertible to the parameters of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
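The difference from Transform lies only in what the operator may assume about the address of its argument. The host-side sketch below contrasts two hypothetical models: one that hands the operator a staged copy (a behavior Transform is permitted to have, since it gives no identity guarantee) and one that passes the stored element itself. None of these names (`model_transform_with_copies`, `model_transform_stable`, `index_of`, `run_model`) exist in CUB, and the staging step is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: the operator sees a copy of each element.
template <typename TransformOp>
void model_transform_with_copies(const int* in, int* out, std::ptrdiff_t n, TransformOp op)
{
  assert(n >= 0);
  for (std::ptrdiff_t i = 0; i < n; ++i)
  {
    const int staged = in[i]; // stands in for a register or shared-memory copy
    out[i] = op(staged);
  }
}

// Hypothetical model: the operator sees the element inside the input sequence.
template <typename TransformOp>
void model_transform_stable(const int* in, int* out, std::ptrdiff_t n, TransformOp op)
{
  assert(n >= 0);
  for (std::ptrdiff_t i = 0; i < n; ++i)
  {
    out[i] = op(in[i]);
  }
}

// Returns the index of `a` within [first, last), or -1 if `a` lives elsewhere.
int index_of(const int& a, const int* first, const int* last)
{
  const std::less<const int*> before{}; // well-defined ordering even for unrelated pointers
  const bool inside = !before(&a, first) && before(&a, last);
  return inside ? static_cast<int>(&a - first) : -1;
}

std::vector<int> run_model(bool stable)
{
  const std::vector<int> input{0, -2, 5, 3};
  std::vector<int> out(input.size());
  const int* first = input.data();
  const int* last  = first + input.size();
  const auto op    = [=](const int& a) { return index_of(a, first, last); };
  const auto n     = static_cast<std::ptrdiff_t>(input.size());
  if (stable)
  {
    model_transform_stable(first, out.data(), n, op);
  }
  else
  {
    model_transform_with_copies(first, out.data(), n, op);
  }
  return out; // stable: {0, 1, 2, 3}; with copies: {-1, -1, -1, -1}
}
```

An operator that derives an index from the address of its argument, as in the example above, is therefore only meaningful with the stable-address variant.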

template<typename RandomAccessIteratorIn, typename RandomAccessIteratorOut, typename NumItemsT, typename TransformOp>
static inline cudaError_t TransformStableArgumentAddresses(
RandomAccessIteratorIn input,
RandomAccessIteratorOut output,
NumItemsT num_items,
TransformOp transform_op,
cudaStream_t stream = nullptr
)

Transforms one input sequence into one output sequence, by applying a transformation operation on each input element and writing the result to the corresponding output element. The objects passed to the call operator of the transformation operation are guaranteed to reside in the input sequence and are never copied.

Parameters:
  • input – An iterator to the input sequence where num_items elements are read from. The iterator’s value type must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to. May point to the same sequence as input, performing the transformation in place. The output sequence must not overlap with the input sequence in any other way.

  • num_items – The number of elements in the input sequence.

  • transform_op – A unary function object. The input iterator’s value type must be convertible to the parameter of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
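Because the argument is the stored element rather than a copy, an operator can in principle reach neighbouring elements through its address. The host-side sketch below models that pattern with a forward difference. It is an assumption-laden illustration, not CUB code: `model_transform_stable` and `example_forward_difference` are hypothetical names, and keeping the neighbour access in bounds (here by processing one item fewer than the input holds) is the caller's responsibility.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the unary overload (illustration only, not CUB code):
// op receives a reference to the element inside the input sequence.
template <typename TransformOp>
void model_transform_stable(const int* in, int* out, std::ptrdiff_t n, TransformOp op)
{
  assert(n >= 0);
  for (std::ptrdiff_t i = 0; i < n; ++i)
  {
    out[i] = op(in[i]);
  }
}

std::vector<int> example_forward_difference()
{
  const std::vector<int> input{1, 4, 9, 16};
  // One item fewer than the input, so that (&a)[1] never reads past the end.
  std::vector<int> out(input.size() - 1);
  model_transform_stable(
    input.data(),
    out.data(),
    static_cast<std::ptrdiff_t>(out.size()),
    [](const int& a) { return (&a)[1] - a; }); // valid only because `a` is the stored element
  return out; // {3, 5, 7}
}
```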