thrust::reduce_into

Defined in thrust/reduce.h

template<typename DerivedPolicy, typename InputIterator, typename OutputIterator>
void thrust::reduce_into(const thrust::detail::execution_policy_base<DerivedPolicy> &exec,
                         InputIterator first,
                         InputIterator last,
                         OutputIterator output)

reduce_into is a generalization of summation: it computes the sum (or some other binary operation) of all the elements in the range [first, last). This version of reduce_into uses 0 as the initial value of the reduction. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation, while reduce_into requires associativity of the binary operation to parallelize the reduction.
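To illustrate why associativity matters, the following host-only sketch compares a strict left-to-right fold (the order std::accumulate guarantees) with a pairwise grouping of the same operands. It uses only the C++ standard library, not Thrust. The helpers pairwise_reduce and left_fold are hypothetical names introduced for this illustration, and the pairwise grouping is just one possible grouping; it is not a description of how reduce_into is actually implemented.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Thrust): reduces v[lo, hi) by splitting the
// range in half and combining the two partial results. This is one possible
// operand grouping a parallel reduction might use. Assumes lo < hi.
template <typename T, typename BinaryOp>
T pairwise_reduce(const std::vector<T>& v, std::size_t lo, std::size_t hi, BinaryOp op)
{
  if (hi - lo == 1)
  {
    return v[lo];
  }
  const std::size_t mid = lo + (hi - lo) / 2;
  return op(pairwise_reduce(v, lo, mid, op), pairwise_reduce(v, mid, hi, op));
}

// Hypothetical helper: strict left-to-right fold, the order that
// std::accumulate guarantees. Assumes v is non-empty.
template <typename T, typename BinaryOp>
T left_fold(const std::vector<T>& v, BinaryOp op)
{
  return std::accumulate(v.begin() + 1, v.end(), v.front(), op);
}
```

For the input {8, 4, 2, 1}, both helpers return 15 with std::plus<int>, because addition is associative. With std::minus<int>, which is not associative, left_fold computes ((8 - 4) - 2) - 1 = 1 while pairwise_reduce computes (8 - 4) - (2 - 1) = 3, so the result depends on the grouping.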

Note that reduce_into also assumes that the binary reduction operator (in this case operator+) is commutative. If the reduction operator is not commutative then reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.
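The scan-based workaround can be sketched on the host with std::inclusive_scan from the C++17 standard library, used here as a stand-in for thrust::inclusive_scan. String concatenation serves as an example of an operator that is associative but not commutative. The function name concat_via_scan is hypothetical and exists only for this illustration.

```cpp
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: performs an order-preserving reduction by running an
// inclusive scan and returning its last element, which holds the combination
// of all inputs in sequence order. Assumes words is non-empty.
std::string concat_via_scan(const std::vector<std::string>& words)
{
  std::vector<std::string> prefixes(words.size());
  // std::plus<> on std::string is concatenation: associative, not commutative.
  std::inclusive_scan(words.begin(), words.end(), prefixes.begin(), std::plus<>{});
  return prefixes.back();
}
```

Calling concat_via_scan({"re", "duce", "_into"}) returns "reduce_into". The trade-off is that the scan produces and stores every intermediate prefix, so it needs an output range as long as the input rather than a single element.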

Unlike reduce, reduce_into does not return the reduction result; it writes the result to *output. Because no value has to be handed back to the caller, this algorithm does not wait for the work it launches to complete when exec is thrust::cuda::par_nosync. In that case, the caller must synchronize before reading *output. Writing to *output also lets the result stay in device memory, which avoids copying the reduction result from device memory to host memory.

The algorithm’s execution is parallelized as determined by exec.

The following code snippet demonstrates how to use reduce_into to compute the sum of a sequence of integers using the thrust::device execution policy for parallelization:

#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

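// Input sequence in device memory.
thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
// Single-element device buffer that receives the reduction result.
thrust::device_vector<int> output(1);
thrust::reduce_into(thrust::device, data.begin(), data.end(), output.begin());
// output[0] == 9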