cub::DeviceTransform

Defined in cub/device/device_transform.cuh

struct DeviceTransform

DeviceTransform provides device-wide, parallel operations for transforming elements tuple-wise from multiple input sequences into an output sequence.

Public Static Functions

template<typename ...RandomAccessIteratorsIn, typename RandomAccessIteratorOut, typename TransformOp>
static inline cudaError_t Transform(::cuda::std::tuple<RandomAccessIteratorsIn...> inputs, RandomAccessIteratorOut output, int num_items, TransformOp transform_op, cudaStream_t stream = nullptr)

Overview

Transforms multiple input sequences into one output sequence by applying a transformation operation to corresponding input elements and writing the result to the corresponding output element. No guarantee is given on the identity (i.e. address) of the objects passed to the call operator of the transformation operation, so the operation must not depend on the addresses of its arguments.

A Simple Example

constexpr auto num_items = 4;
auto input1              = thrust::device_vector<int>{0, -2, 5, 3};
auto input2              = thrust::device_vector<float>{5.2f, 3.1f, -1.1f, 3.0f};
auto input3              = thrust::counting_iterator<int>{100};
auto op                  = [] __device__(int a, float b, int c) {
  return (a + b) * c;
};

auto result = thrust::device_vector<int>(num_items);
cub::DeviceTransform::Transform(
  ::cuda::std::make_tuple(input1.begin(), input2.begin(), input3), result.begin(), num_items, op);

const auto expected = thrust::host_vector<int>{520, 111, 397, 618}; // float results are truncated when stored as int

Parameters
  • inputs – A tuple of iterators to the input sequences where num_items elements are read from each. The iterators’ value types must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to.

  • num_items – The number of elements in each input sequence.

  • transform_op – An n-ary function object, where n is the number of input sequences. The input iterators’ value types must be convertible to the parameters of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
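
The tuple-wise semantics described above can be illustrated with a host-only reference loop. This is a minimal sketch, not CUB's implementation: reference_transform and run_reference_example are hypothetical names introduced here, the code assumes C++17 (for std::apply), and a plain vector stands in for the counting iterator. The input values are chosen to be exactly representable in float so the truncated results are unambiguous.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical host-side reference (not part of CUB).
// Computes output[i] = op(in_0[i], in_1[i], ...) for every i in [0, num_items).
template <typename... InputIts, typename OutputIt, typename TransformOp>
void reference_transform(std::tuple<InputIts...> inputs, OutputIt output, int num_items, TransformOp op)
{
  assert(num_items >= 0);
  for (int i = 0; i < num_items; ++i)
  {
    // Unpack the tuple of iterators and read the i-th element of each sequence.
    // The result only needs to be assignable to the dereferenced output iterator.
    output[i] = std::apply([&](auto... its) { return op(its[i]...); }, inputs);
  }
}

std::vector<int> run_reference_example()
{
  const std::vector<int> input1{1, 2, 3, 4};
  const std::vector<float> input2{0.5f, 1.5f, 2.5f, 3.5f};
  const std::vector<int> input3{10, 11, 12, 13}; // stands in for a counting iterator starting at 10
  std::vector<int> result(4);

  reference_transform(std::make_tuple(input1.begin(), input2.begin(), input3.begin()),
                      result.begin(),
                      4,
                      [](int a, float b, int c) { return (a + b) * c; });
  return result; // {15, 38, 66, 97}: 38.5 and 97.5 are truncated on assignment to int
}
```

The operation returns a float while the output holds int, which mirrors the requirement that the return type be assignable to the dereferenced output iterator rather than identical to its value type.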

template<typename RandomAccessIteratorIn, typename RandomAccessIteratorOut, typename TransformOp>
static inline cudaError_t Transform(RandomAccessIteratorIn input, RandomAccessIteratorOut output, int num_items, TransformOp transform_op, cudaStream_t stream = nullptr)

Transforms one input sequence into one output sequence by applying a transformation operation to each input element and writing the result to the corresponding output element. No guarantee is given on the identity (i.e. address) of the objects passed to the call operator of the transformation operation, so the operation must not depend on the address of its argument.

Parameters
  • input – An iterator to the input sequence where num_items elements are read from. The iterator’s value type must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to.

  • num_items – The number of elements in the input sequence.

  • transform_op – A unary function object. The input iterator’s value type must be convertible to the parameter of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
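
To see what the missing identity guarantee means in practice, the following host-only sketch models an implementation that stages each element in a temporary before invoking the operation. This is an assumption made for illustration: the source only states that no guarantee is given, not how CUB actually handles elements. model_transform_with_copies, CopyModelResult and run_copy_model are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not CUB code). Each element is copied into a
// temporary before op is called, which the "no identity guarantee" wording permits.
template <typename T, typename U, typename TransformOp>
void model_transform_with_copies(const std::vector<T>& input, std::vector<U>& output, TransformOp op)
{
  assert(input.size() == output.size());
  for (std::size_t i = 0; i < input.size(); ++i)
  {
    const T staged = input[i]; // same value, different address
    output[i]      = op(staged);
  }
}

struct CopyModelResult
{
  std::vector<int> values;
  int address_matches; // how often op received an object that lives inside the input
};

CopyModelResult run_copy_model()
{
  const std::vector<int> input{0, -2, 5, 3};
  CopyModelResult r{std::vector<int>(input.size()), 0};
  const int* first = input.data();
  const int* last  = first + input.size();

  model_transform_with_copies(input, r.values, [&](const int& a) {
    // Check whether the argument is one of the original input elements.
    for (const int* p = first; p != last; ++p)
    {
      if (p == &a)
      {
        ++r.address_matches;
      }
    }
    return a * 2; // depends only on the value, so it stays correct
  });
  return r;
}
```

In this model the values come out as expected while address_matches stays at zero, which is why an operation passed to Transform should depend only on the values of its arguments.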

template<typename ...RandomAccessIteratorsIn, typename RandomAccessIteratorOut, typename TransformOp>
static inline cudaError_t TransformStableArgumentAddresses(::cuda::std::tuple<RandomAccessIteratorsIn...> inputs, RandomAccessIteratorOut output, int num_items, TransformOp transform_op, cudaStream_t stream = nullptr)

Overview

Transforms multiple input sequences into one output sequence by applying a transformation operation to corresponding input elements and writing the result to the corresponding output element. The objects passed to the call operator of the transformation operation are guaranteed to reside in the input sequences and are never copied.

A Simple Example

constexpr auto num_items = 4;
auto input1              = thrust::device_vector<int>{0, -2, 5, 3};
auto input2              = thrust::device_vector<int>{52, 31, -11, 30};

auto* input1_ptr = thrust::raw_pointer_cast(input1.data());
auto* input2_ptr = thrust::raw_pointer_cast(input2.data());

auto op = [input1_ptr, input2_ptr] __device__(const int& a) -> int {
  const auto i = &a - input1_ptr; // we depend on the address of a
  return a + input2_ptr[i];
};

auto result = thrust::device_vector<int>(num_items);
cub::DeviceTransform::TransformStableArgumentAddresses(
  ::cuda::std::make_tuple(input1_ptr), result.begin(), num_items, op);

const auto expected = thrust::host_vector<int>{52, 29, -6, 33};

Parameters
  • inputs – A tuple of iterators to the input sequences where num_items elements are read from each. The iterators’ value types must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to.

  • num_items – The number of elements in each input sequence.

  • transform_op – An n-ary function object, where n is the number of input sequences. The input iterators’ value types must be convertible to the parameters of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
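
The address-stability guarantee can be modeled on the host by passing a reference to the element itself instead of a copy. The sketch below mirrors the example above with the same values. It is an illustration only, not CUB's implementation, and model_transform_stable and run_stable_model are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not CUB code). op receives a reference to the
// element inside the input sequence, so the address of its argument is meaningful.
template <typename T, typename U, typename TransformOp>
void model_transform_stable(const T* input, U* output, int num_items, TransformOp op)
{
  assert(num_items >= 0);
  for (int i = 0; i < num_items; ++i)
  {
    output[i] = op(input[i]); // no temporary: input[i] binds directly to op's reference parameter
  }
}

std::vector<int> run_stable_model()
{
  const std::vector<int> input1{0, -2, 5, 3};
  const std::vector<int> input2{52, 31, -11, 30};
  const int* input1_ptr = input1.data();
  const int* input2_ptr = input2.data();
  std::vector<int> result(input1.size());

  model_transform_stable(input1_ptr, result.data(), static_cast<int>(input1.size()), [=](const int& a) {
    const std::ptrdiff_t i = &a - input1_ptr; // valid only because a lives inside input1
    return a + input2_ptr[i];
  });
  return result; // {52, 29, -6, 33}
}
```

The operation must take its argument by reference. Taking it by value would introduce a copy at the call site and make the pointer arithmetic meaningless.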

template<typename RandomAccessIteratorIn, typename RandomAccessIteratorOut, typename TransformOp>
static inline cudaError_t TransformStableArgumentAddresses(RandomAccessIteratorIn input, RandomAccessIteratorOut output, int num_items, TransformOp transform_op, cudaStream_t stream = nullptr)

Transforms one input sequence into one output sequence by applying a transformation operation to each input element and writing the result to the corresponding output element. The objects passed to the call operator of the transformation operation are guaranteed to reside in the input sequence and are never copied.

Parameters
  • input – An iterator to the input sequence where num_items elements are read from. The iterator’s value type must be trivially relocatable.

  • output – An iterator to the output sequence where num_items results are written to.

  • num_items – The number of elements in the input sequence.

  • transform_op – A unary function object. The input iterator’s value type must be convertible to the parameter of the function object’s call operator. The return type of the call operator must be assignable to the dereferenced output iterator.

  • stream – [optional] CUDA stream to launch kernels within. Default is stream 0.
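
Because the argument is guaranteed to reside in the input sequence, a unary operation can in principle reach neighboring elements through its address. The host-only sketch below computes forward differences that way. It is a hypothetical illustration rather than a documented CUB usage pattern, model_transform_stable and run_forward_difference are names introduced here, and the transform covers one element fewer than the input so the last read stays in bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not CUB code): op is called with a reference
// to the element stored in the input sequence.
template <typename T, typename U, typename TransformOp>
void model_transform_stable(const T* input, U* output, int num_items, TransformOp op)
{
  assert(num_items >= 0);
  for (int i = 0; i < num_items; ++i)
  {
    output[i] = op(input[i]);
  }
}

std::vector<int> run_forward_difference()
{
  const std::vector<int> input{1, 4, 9, 16, 25};
  // Transform only the first size - 1 elements: the last one has no right neighbor.
  const int num_items = static_cast<int>(input.size()) - 1;
  std::vector<int> result(static_cast<std::size_t>(num_items));

  model_transform_stable(input.data(), result.data(), num_items, [](const int& a) {
    return (&a)[1] - a; // right neighbor minus current element
  });
  return result; // {3, 5, 7, 9}
}
```

The caller is responsible for choosing num_items so that every neighbor access stays inside the input sequence.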