CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Files
file | reduce.h | Defines basic thread-level reduction with specializations for Array<T, N>.
file | reduction_operators.h | Kernel performing a reduction over densely packed tensors in global memory.
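The thread-level reduction described for reduce.h can be illustrated in plain C++. This is only a sketch and not the CUTLASS interface: `thread_reduce` is a hypothetical name, `std::array` stands in for `Array<T, N>`, and the default operator (`std::plus`) is an assumption made for the example.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical stand-in, NOT the CUTLASS API: a single thread folds the N
// elements of a fixed-size array into one value with a binary operator.
template <typename T, std::size_t N, typename Op = std::plus<T>>
T thread_reduce(const std::array<T, N>& values, T init = T{}, Op op = Op{}) {
    T acc = init;
    // N is known at compile time, so a compiler is free to unroll this loop.
    for (std::size_t i = 0; i < N; ++i) {
        acc = op(acc, values[i]);
    }
    return acc;
}
```

Passing a different functor, for example `std::multiplies<int>()` with an initial value of 1, turns the same loop into a product reduction.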