GPU Reduction

Warning

The reduce decorator is deprecated and is provided only for backward compatibility with code written for Numba-CUDA. New code should use the cuda.compute parallel computing primitives from the CUDA Core Compute Libraries instead.

Writing a reduction algorithm for a CUDA GPU can be tricky. Numba CUDA MLIR provides a @cuda.reduce decorator that converts a simple binary operation into a reduction kernel. An example follows:

import numpy
from numba_cuda_mlir import cuda

@cuda.reduce
def sum_reduce(a, b):
    return a + b

A = (numpy.arange(1234, dtype=numpy.float64)) + 1
expect = A.sum()      # NumPy sum reduction
got = sum_reduce(A)   # cuda sum reduction
assert expect == got

Lambda functions can also be used here:

sum_reduce = cuda.reduce(lambda a, b: a + b)
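To illustrate why a single binary operation is enough to define a reduction, here is a CPU-only sketch of a pairwise (tree-style) reduction. It is not the kernel that the decorator generates; tree_reduce is a hypothetical helper written only for this illustration, and it assumes the usual parallel-reduction pattern in which values are combined in pairs, level by level. Because the combining order differs from a left-to-right loop, the binary operation should be associative to give a well-defined result.

```python
import operator
from functools import reduce as sequential_reduce


def tree_reduce(op, values):
    """Combine values pairwise, level by level (hypothetical helper, CPU only)."""
    items = list(values)
    if not items:
        raise ValueError("cannot reduce an empty sequence")
    while len(items) > 1:
        # On a GPU, each of these pairs could be combined by a separate thread.
        paired = [op(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            # An unpaired trailing element is carried over to the next level.
            paired.append(items[-1])
        items = paired
    return items[0]


data = list(range(1, 1235))

# Associative operations give the same answer in any combining order.
total = tree_reduce(operator.add, data)
largest = tree_reduce(max, data)

# Subtraction is not associative, so the pairwise order changes the result.
pairwise_sub = tree_reduce(operator.sub, [1, 2, 3, 4])
left_to_right_sub = sequential_reduce(operator.sub, [1, 2, 3, 4])
```

In this sketch, total matches the built-in sum(data), while pairwise_sub and left_to_right_sub differ. This is why operations such as addition, multiplication, min, and max are the natural candidates for a reduction.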

The Reduce class

The reduce decorator creates an instance of the Reduce class. Currently, reduce is an alias of Reduce, but this behavior is not guaranteed.