thrust::reduce_into

Defined in thrust/reduce.h

template<typename DerivedPolicy, typename InputIterator, typename OutputIterator, typename T, typename BinaryFunction>
void thrust::reduce_into(const thrust::detail::execution_policy_base<DerivedPolicy> &exec,
                         InputIterator first,
                         InputIterator last,
                         OutputIterator output,
                         T init,
                         BinaryFunction binary_op)

reduce_into is a generalization of summation: it computes the sum (or some other binary operation) of all the elements in the range [first, last). This version of reduce_into uses init as the initial value of the reduction and binary_op as the binary function used for summation. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation, while reduce_into requires associativity of binary_op to parallelize the reduction.
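The difference in ordering guarantees can be illustrated on the host with standard C++ only. The sketch below is not Thrust code: tree_reduce and left_fold are hypothetical helpers introduced for illustration, and the pairwise split is only one possible regrouping, not a description of how Thrust schedules work. left_fold applies the operator strictly left to right via std::accumulate, while tree_reduce combines the two halves of the range recursively.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Thrust): reduces v[lo, hi) by splitting the
// range in half and combining the two partial results, which is one way a
// parallel reduction may regroup its operands. Requires lo < hi.
template <typename T, typename BinaryFunction>
T tree_reduce(const std::vector<T>& v, std::size_t lo, std::size_t hi, BinaryFunction op)
{
  if (hi - lo == 1)
  {
    return v[lo];
  }
  const std::size_t mid = lo + (hi - lo) / 2;
  return op(tree_reduce(v, lo, mid, op), tree_reduce(v, mid, hi, op));
}

// Hypothetical helper: strict left-to-right fold, the order std::accumulate
// guarantees. Uses the first element as the initial value. Requires a
// non-empty vector.
template <typename T, typename BinaryFunction>
T left_fold(const std::vector<T>& v, BinaryFunction op)
{
  return std::accumulate(v.begin() + 1, v.end(), v.front(), op);
}
```

For the input {8, 4, 2, 1}, both helpers agree when the operator is std::plus<int>, which is associative. With std::minus<int>, which is not associative, the left fold computes ((8 - 4) - 2) - 1 while the tree computes (8 - 4) - (2 - 1), so the results differ.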

Note that reduce_into also assumes that the binary reduction operator (in this case binary_op) is commutative. If the reduction operator is not commutative then reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.
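As a rough host-side illustration of the scan-based workaround, the sketch below uses std::inclusive_scan from the C++17 standard library as a stand-in for thrust::inclusive_scan. String concatenation serves as the example operator because it is associative but not commutative. concat_via_scan is a hypothetical helper name, not a Thrust function.

```cpp
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: reduces with a non-commutative operator by running an
// inclusive scan and returning the last element of the scan output.
// Requires a non-empty input.
std::string concat_via_scan(const std::vector<std::string>& parts)
{
  std::vector<std::string> prefixes(parts.size());
  // prefixes[i] holds parts[0] + parts[1] + ... + parts[i], in input order.
  std::inclusive_scan(parts.begin(), parts.end(), prefixes.begin(), std::plus<std::string>{});
  return prefixes.back();
}
```

The scan stores every prefix result, so this approach needs an output buffer as large as the input, which is the price of preserving operand order.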

Unlike reduce, reduce_into does not return the reduction result. Instead, it is written to *output. Thus, when exec is thrust::cuda::par_nosync, this algorithm does not wait for the work it launches to complete. Additionally, you can use reduce_into to avoid copying the reduction result from device memory to host memory.
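The calling convention described above can be modeled with a sequential reference. This is a sketch of the observable behavior only, assuming a single-threaded host loop; reduce_into_reference is a hypothetical name and does not reflect Thrust's actual implementation or its parallel scheduling.

```cpp
// Hypothetical sequential reference (not Thrust's implementation):
// folds [first, last) into an accumulator seeded with init, then writes the
// result through the output iterator instead of returning it.
template <typename InputIterator, typename OutputIterator, typename T, typename BinaryFunction>
void reduce_into_reference(InputIterator first,
                           InputIterator last,
                           OutputIterator output,
                           T init,
                           BinaryFunction binary_op)
{
  T acc = init;
  for (; first != last; ++first)
  {
    acc = binary_op(acc, *first);
  }
  *output = acc; // the only place the result becomes visible to the caller
}
```

Because the function returns void, the caller reads the result from wherever output points, which is what allows the real algorithm to leave the value in device memory.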

The algorithm’s execution is parallelized as determined by exec.

The following code snippet demonstrates how to use reduce_into to compute the maximum value of a sequence of integers using the thrust::device execution policy for parallelization:

#include <cuda/functional>
#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
thrust::device_vector<int> output(1);
thrust::reduce_into(thrust::device,
                    data.begin(), data.end(), output.begin(), -1,
                    cuda::maximum{});
// output[0] == 3

See also

reduce
transform_reduce
transform_reduce_into

Parameters

exec – The execution policy to use for parallelization.
first – The beginning of the input sequence.
last – The end of the input sequence.
output – The location the reduction will be written to.
init – The initial value.
binary_op – The binary function used to ‘sum’ values.

Template Parameters

DerivedPolicy – The name of the derived execution policy.
InputIterator – is a model of Input Iterator and InputIterator's value_type is convertible to T.
OutputIterator – is a model of Output Iterator and OutputIterator's value_type is assignable from T.
T – is a model of Assignable, and is convertible to BinaryFunction's first and second argument type.
BinaryFunction – is a model of Binary Function, and its return type must be convertible to T.

Returns

Nothing. The result of the reduction is written to *output.