thrust::reduce_into

Defined in thrust/reduce.h

template<typename InputIterator, typename OutputIterator, typename T> void thrust::reduce_into(InputIterator first, InputIterator last, OutputIterator output, T init)

reduce_into is a generalization of summation: it combines all the elements in the range [first, last) with a binary operation. This version of reduce_into uses operator+ as the binary operation and init as the initial value of the reduction. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation (strictly left to right), while reduce_into may regroup the operations in order to parallelize the reduction and therefore requires the binary operation to be associative.
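The effect of regrouping can be seen on the host with standard C++ alone. The sketch below is only an illustration: sum_left_to_right and sum_in_halves are hypothetical helpers, not part of Thrust, and the "halves" grouping is just one of many groupings a parallel reduction might choose. For integers both functions agree; for floating-point values, whose addition is only approximately associative, they can differ.

```cpp
#include <numeric>
#include <vector>

// Strict left-to-right order, as std::accumulate guarantees.
double sum_left_to_right(const std::vector<double>& v)
{
  return std::accumulate(v.begin(), v.end(), 0.0);
}

// Illustrative regrouping (an assumption, not Thrust's actual strategy):
// sum each half independently, then combine the two partial sums.
double sum_in_halves(const std::vector<double>& v)
{
  auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
  double left  = std::accumulate(v.begin(), mid, 0.0);
  double right = std::accumulate(mid, v.end(), 0.0);
  return left + right;
}
```

With the input {1.0, 1e20, -1e20, 1.0}, the small terms are absorbed at different points in the two groupings, so the two functions return different values even though they add the same numbers.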

Note that reduce_into also assumes that the binary reduction operator (in this case operator+) is commutative. If the reduction operator is not commutative, then reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array, which holds the reduction of the whole range.
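A minimal host-only sketch of that workaround follows, using string concatenation as an operator that is associative but not commutative. It assumes C++17 and uses std::inclusive_scan as a stand-in for thrust::inclusive_scan; concat_via_scan is a hypothetical helper name, not a Thrust function.

```cpp
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Reduces `parts` with a non-commutative operator (string concatenation)
// by running an inclusive scan and taking the last prefix.
// std::inclusive_scan stands in for thrust::inclusive_scan here.
std::string concat_via_scan(const std::vector<std::string>& parts)
{
  if (parts.empty()) {
    return std::string{};  // nothing to reduce
  }
  std::vector<std::string> prefixes(parts.size());
  std::inclusive_scan(parts.begin(), parts.end(), prefixes.begin(),
                      std::plus<>{});
  return prefixes.back();  // last prefix == reduction of the whole range
}
```

The trade-off is that the scan needs an output range as long as the input, whereas a reduction needs only a single output element.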

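Unlike reduce, reduce_into does not return the reduction result. Instead, the result is written to *output. Because no value has to be returned to the caller, when the algorithm is invoked with the execution policy thrust::cuda::par_nosync (the exec argument of the overloads that accept an execution policy), it does not wait for the work it launches to complete; the caller must then synchronize before reading *output. Additionally, since output may refer to device memory, you can use reduce_into to avoid copying the reduction result from device memory to host memory.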

The following code snippet demonstrates how to use reduce_into to compute the sum of a sequence of integers together with an initial value.

#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
thrust::device_vector<int> output(1);
thrust::reduce_into(data.begin(), data.end(), output.begin(), 1);
// output[0] == 10, i.e. init (1) plus the sum of data (9)