thrust::reduce_into

Overloads

`reduce_into(exec, first, last, output)`

template<typename DerivedPolicy, typename InputIterator, typename OutputIterator> void thrust::reduce_into(const thrust::detail::execution_policy_base<DerivedPolicy> &exec, InputIterator first, InputIterator last, OutputIterator output)

reduce_into is a generalization of summation: it computes the sum (or some other binary operation) of all the elements in the range [first, last). This version of reduce_into uses 0 as the initial value of the reduction. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation, while reduce_into requires associativity of the binary operation to parallelize the reduction.
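
To see why the order guarantee matters, consider floating-point addition, which is not exactly associative. The sketch below uses only the standard library, not Thrust. The helper names `sum_left_to_right` and `sum_regrouped` are hypothetical, and the regrouping shown is just one of many orders a parallel reduction might choose.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Left-to-right sum: the order std::accumulate guarantees.
double sum_left_to_right(const std::vector<double>& v)
{
  return std::accumulate(v.begin(), v.end(), 0.0);
}

// Hypothetical regrouping: sum each half separately, then combine the halves.
// A parallel reduction is free to group operands along these lines.
double sum_regrouped(const std::vector<double>& v)
{
  const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
  const double left  = std::accumulate(v.begin(), mid, 0.0);
  const double right = std::accumulate(mid, v.end(), 0.0);
  return left + right;
}
```

For the input {1e20, 1.0, -1e20, 1.0} with IEEE double arithmetic, the left-to-right sum is 1.0 while the regrouped sum is 0.0, because each 1.0 is absorbed when added directly to a value of magnitude 1e20.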

Note that reduce_into also assumes that the binary reduction operator (in this case operator+) is commutative. If the reduction operator is not commutative then reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.
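
The inclusive_scan workaround can be illustrated on the host with the standard library's std::inclusive_scan (C++17), used here as a stand-in for thrust::inclusive_scan. String concatenation serves as an operator that is associative but not commutative. The helper name `ordered_concat` is hypothetical.

```cpp
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Concatenate all strings in order by scanning and keeping the last prefix.
// Concatenation is associative but not commutative, so operand order matters.
std::string ordered_concat(const std::vector<std::string>& parts)
{
  if (parts.empty())
  {
    return std::string{};
  }
  std::vector<std::string> prefixes(parts.size());
  std::inclusive_scan(parts.begin(), parts.end(), prefixes.begin(), std::plus<>{});
  return prefixes.back(); // the last prefix is the full, ordered reduction
}
```

The trade-off is storage: the scan materializes every prefix, whereas a reduction needs room for only one result.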

Unlike reduce, reduce_into does not return the reduction result. Instead, it is written to *output. Thus, when exec is thrust::cuda::par_nosync, this algorithm does not wait for the work it launches to complete. Additionally, you can use reduce_into to avoid copying the reduction result from device memory to host memory.

The algorithm’s execution is parallelized as determined by exec.

The following code snippet demonstrates how to use reduce_into to compute the sum of a sequence of integers using the thrust::device execution policy for parallelization:

#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
thrust::device_vector<int> output(1);
thrust::reduce_into(thrust::device, data.begin(), data.end(), output.begin());
// output[0] == 9

Added in version 3.2.0.

See also

reduce

Parameters:

exec – The execution policy to use for parallelization.
first – The beginning of the sequence.
last – The end of the sequence.
output – The location the reduction will be written to.

Template Parameters:

DerivedPolicy – The name of the derived execution policy.
InputIterator – is a model of Input Iterator and if x and y are objects of InputIterator's value_type, then x + y is defined and is convertible to InputIterator's value_type. If T is InputIterator's value_type, then T(0) is defined.
OutputIterator – is a model of Output Iterator and OutputIterator's value_type is assignable from InputIterator's value_type.
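
As an illustration of these requirements, the sketch below defines a hypothetical value type `Vec2` for which both `x + y` and `Vec2(0)` are valid. It is exercised with std::accumulate on the host. Using such a type with reduce_into on a device would additionally require its members to be callable from device code, which this sketch does not show.

```cpp
#include <numeric>
#include <vector>

// Hypothetical value type meeting the stated requirements.
struct Vec2
{
  float x;
  float y;

  // Makes Vec2(0) well-formed: the zero vector acts as the initial value.
  explicit Vec2(int v = 0) : x(static_cast<float>(v)), y(static_cast<float>(v)) {}
  Vec2(float x_, float y_) : x(x_), y(y_) {}
};

// x + y is defined and yields a Vec2.
Vec2 operator+(const Vec2& a, const Vec2& b)
{
  return Vec2(a.x + b.x, a.y + b.y);
}

// Host-side sum starting from Vec2(0), mirroring the default initial value.
Vec2 sum_vec2(const std::vector<Vec2>& v)
{
  return std::accumulate(v.begin(), v.end(), Vec2(0));
}
```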

`reduce_into(first, last, output)`

template<typename InputIterator, typename OutputIterator> void thrust::reduce_into(InputIterator first, InputIterator last, OutputIterator output)

reduce_into is a generalization of summation: it computes the sum (or some other binary operation) of all the elements in the range [first, last). This version of reduce_into uses 0 as the initial value of the reduction. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation, while reduce_into requires associativity of the binary operation to parallelize the reduction.

Note that reduce_into also assumes that the binary reduction operator (in this case operator+) is commutative. If the reduction operator is not commutative then reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.

Unlike reduce, reduce_into does not return the reduction result. Instead, it is written to *output. As a result, the overloads that take an execution policy do not wait for the work they launch to complete when that policy is thrust::cuda::par_nosync. Additionally, you can use reduce_into to avoid copying the reduction result from device memory to host memory.

The following code snippet demonstrates how to use reduce_into to compute the sum of a sequence of integers.

#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
thrust::device_vector<int> output(1);
thrust::reduce_into(data.begin(), data.end(), output.begin());
// output[0] == 9

Added in version 3.2.0.

See also

reduce

Parameters:

first – The beginning of the sequence.
last – The end of the sequence.
output – The location the reduction will be written to.

Template Parameters:

InputIterator – is a model of Input Iterator and if x and y are objects of InputIterator's value_type, then x + y is defined and is convertible to InputIterator's value_type. If T is InputIterator's value_type, then T(0) is defined.
OutputIterator – is a model of Output Iterator and OutputIterator's value_type is assignable from InputIterator's value_type.

`reduce_into(exec, first, last, output, init)`

template<typename DerivedPolicy, typename InputIterator, typename OutputIterator, typename T> void thrust::reduce_into(const thrust::detail::execution_policy_base<DerivedPolicy> &exec, InputIterator first, InputIterator last, OutputIterator output, T init)

reduce_into is a generalization of summation: it computes the sum (or some other binary operation) of all the elements in the range [first, last). This version of reduce_into uses init as the initial value of the reduction. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation, while reduce_into requires associativity of the binary operation to parallelize the reduction.

Note that reduce_into also assumes that the binary reduction operator (in this case operator+) is commutative. If the reduction operator is not commutative then reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.

Unlike reduce, reduce_into does not return the reduction result. Instead, it is written to *output. Thus, when exec is thrust::cuda::par_nosync, this algorithm does not wait for the work it launches to complete. Additionally, you can use reduce_into to avoid copying the reduction result from device memory to host memory.

The algorithm’s execution is parallelized as determined by exec.

The following code snippet demonstrates how to use reduce_into to compute the sum of a sequence of integers including an initialization value using the thrust::device execution policy for parallelization:

#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
thrust::device_vector<int> output(1);
thrust::reduce_into(thrust::device, data.begin(), data.end(), output.begin(), 1);
// output[0] == 10

Added in version 3.2.0.

See also

reduce

Parameters:

exec – The execution policy to use for parallelization.
first – The beginning of the input sequence.
last – The end of the input sequence.
output – The location the reduction will be written to.
init – The initial value.

Template Parameters:

DerivedPolicy – The name of the derived execution policy.
InputIterator – is a model of Input Iterator and if x and y are objects of InputIterator's value_type, then x + y is defined and is convertible to T.
OutputIterator – is a model of Output Iterator and OutputIterator's value_type is assignable from T.
T – is convertible to InputIterator's value_type.
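
The choice of T matters when it differs from the iterator's value_type. The sketch below uses std::accumulate, which the description compares reduce_into to, and sums 8-bit values with either an 8-bit or an int initial value. That reduce_into accumulates in T in the same way is an assumption here, not something stated above. The function names are hypothetical.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Accumulates in std::uint8_t: each partial sum is converted back to 8 bits.
std::uint8_t sum_narrow(const std::vector<std::uint8_t>& v)
{
  return std::accumulate(v.begin(), v.end(), std::uint8_t{0});
}

// Accumulates in int: the type of the init argument selects the accumulator.
int sum_wide(const std::vector<std::uint8_t>& v)
{
  return std::accumulate(v.begin(), v.end(), 0);
}
```

For the input {200, 100}, the narrow version wraps around to 44 (300 modulo 256), while the wide version yields 300.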

`reduce_into(first, last, output, init)`

template<typename InputIterator, typename OutputIterator, typename T> void thrust::reduce_into(InputIterator first, InputIterator last, OutputIterator output, T init)

reduce_into is a generalization of summation: it computes the sum (or some other binary operation) of all the elements in the range [first, last). This version of reduce_into uses init as the initial value of the reduction. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation, while reduce_into requires associativity of the binary operation to parallelize the reduction.

Note that reduce_into also assumes that the binary reduction operator (in this case operator+) is commutative. If the reduction operator is not commutative then reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.

Unlike reduce, reduce_into does not return the reduction result. Instead, it is written to *output. As a result, the overloads that take an execution policy do not wait for the work they launch to complete when that policy is thrust::cuda::par_nosync. Additionally, you can use reduce_into to avoid copying the reduction result from device memory to host memory.

The following code snippet demonstrates how to use reduce_into to compute the sum of a sequence of integers including an initialization value.

#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
thrust::device_vector<int> output(1);
thrust::reduce_into(data.begin(), data.end(), output.begin(), 1);
// output[0] == 10

Added in version 3.2.0.

See also

reduce

Parameters:

first – The beginning of the input sequence.
last – The end of the input sequence.
output – The location the reduction will be written to.
init – The initial value.

Template Parameters:

InputIterator – is a model of Input Iterator and if x and y are objects of InputIterator's value_type, then x + y is defined and is convertible to T.
OutputIterator – is a model of Output Iterator and OutputIterator's value_type is assignable from T.
T – is convertible to InputIterator's value_type.

`reduce_into(exec, first, last, output, init, binary_op)`

template<typename DerivedPolicy, typename InputIterator, typename OutputIterator, typename T, typename BinaryFunction> void thrust::reduce_into(const thrust::detail::execution_policy_base<DerivedPolicy> &exec, InputIterator first, InputIterator last, OutputIterator output, T init, BinaryFunction binary_op)

reduce_into is a generalization of summation: it computes the sum (or some other binary operation) of all the elements in the range [first, last). This version of reduce_into uses init as the initial value of the reduction and binary_op as the binary function used for summation. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation, while reduce_into requires associativity of binary_op to parallelize the reduction.
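
The semantics described above can be sketched on the host as a sequential reference. `reduce_into_reference` is a hypothetical helper, not part of Thrust: it folds the range left to right with std::accumulate and writes the single result through output instead of returning it. The real algorithm may combine elements in a different order.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical sequential stand-in for reduce_into: same parameter order,
// with the result written through `output` rather than returned.
template <typename InputIterator, typename OutputIterator, typename T, typename BinaryFunction>
void reduce_into_reference(InputIterator first, InputIterator last,
                           OutputIterator output, T init, BinaryFunction binary_op)
{
  *output = std::accumulate(first, last, init, binary_op);
}

// Example: maximum of a sequence, with -1 as the initial value.
int max_of(const std::vector<int>& data)
{
  std::vector<int> output(1);
  reduce_into_reference(data.begin(), data.end(), output.begin(), -1,
                        [](int a, int b) { return std::max(a, b); });
  return output[0];
}
```

Note that for an empty range the value written is simply init, so `max_of` applied to an empty vector yields -1.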

Note that reduce_into also assumes that the binary reduction operator (in this case binary_op) is commutative. If the reduction operator is not commutative then reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.

Unlike reduce, reduce_into does not return the reduction result. Instead, it is written to *output. Thus, when exec is thrust::cuda::par_nosync, this algorithm does not wait for the work it launches to complete. Additionally, you can use reduce_into to avoid copying the reduction result from device memory to host memory.

The algorithm’s execution is parallelized as determined by exec.

The following code snippet demonstrates how to use reduce_into to compute the maximum value of a sequence of integers using the thrust::device execution policy for parallelization:

#include <cuda/functional>
#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
thrust::device_vector<int> output(1);
thrust::reduce_into(thrust::device,
                    data.begin(), data.end(), output.begin(), -1,
                    cuda::maximum{});
// output[0] == 3

Added in version 3.2.0.

See also

reduce
transform_reduce
transform_reduce_into

Parameters:

exec – The execution policy to use for parallelization.
first – The beginning of the input sequence.
last – The end of the input sequence.
output – The location the reduction will be written to.
init – The initial value.
binary_op – The binary function used to ‘sum’ values.

Template Parameters:

DerivedPolicy – The name of the derived execution policy.
InputIterator – is a model of Input Iterator and InputIterator's value_type is convertible to T.
OutputIterator – is a model of Output Iterator and OutputIterator's value_type is assignable from T.
T – is a model of Assignable, and is convertible to BinaryFunction's first and second argument type.
BinaryFunction – The function’s return type must be convertible to T.

Returns:

Nothing. The result of the reduction is written to *output.

`reduce_into(first, last, output, init, binary_op)`

template<typename InputIterator, typename OutputIterator, typename T, typename BinaryFunction> void thrust::reduce_into(InputIterator first, InputIterator last, OutputIterator output, T init, BinaryFunction binary_op)

reduce_into is a generalization of summation: it computes the sum (or some other binary operation) of all the elements in the range [first, last). This version of reduce_into uses init as the initial value of the reduction and binary_op as the binary function used for summation. reduce_into is similar to the C++ Standard Template Library’s std::accumulate. The primary difference between the two functions is that std::accumulate guarantees the order of summation, while reduce_into requires associativity of binary_op to parallelize the reduction.

Note that reduce_into also assumes that the binary reduction operator (in this case binary_op) is commutative. If the reduction operator is not commutative then thrust::reduce_into should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.

The following code snippet demonstrates how to use reduce_into to compute the maximum value of a sequence of integers.

#include <cuda/functional>
#include <thrust/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> data{1, 0, 2, 2, 1, 3};
thrust::device_vector<int> output(1);
thrust::reduce_into(data.begin(), data.end(), output.begin(), -1,
                    cuda::maximum{});
// output[0] == 3

Added in version 3.2.0.

See also

reduce
transform_reduce
transform_reduce_into

Parameters:

first – The beginning of the input sequence.
last – The end of the input sequence.
output – An output iterator to write the result to.
init – The initial value.
binary_op – The binary function used to ‘sum’ values.

Template Parameters:

InputIterator – is a model of Input Iterator and InputIterator's value_type is convertible to T.
OutputIterator – is a model of Output Iterator and OutputIterator's value_type is assignable from T.
T – is a model of Assignable, and is convertible to BinaryFunction's first and second argument type.
BinaryFunction – The function’s return type must be convertible to T.

Returns:

Nothing. The result of the reduction is written to *output.
