CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Directories | |
directory | kernel |
directory | thread |
Files | |
file | batched_reduction.h [code] |
Implements an efficient, software-pipelined batched reduction. D = alpha * Reduction(A) + beta * C. | |
file | batched_reduction_traits.h [code] |
Defines structural properties of a complete batched reduction. D = alpha * Reduction(A) + beta * C. | |
file | reduction/threadblock_swizzle.h [code] |
Defines functors for mapping blockIdx to partitions of the batched reduction computation. | |
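
The formula D = alpha * Reduction(A) + beta * C listed for the files above can be illustrated with a small host-side reference. This is only a sketch under stated assumptions, not the CUTLASS implementation or its API: it assumes that Reduction(A) is an element-wise sum over `reduction_size` slices of A stored back to back, and the function name `batched_reduction_reference` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference; not part of the CUTLASS API.
// Assumption: A holds `reduction_size` slices of n elements each, stored
// back to back, and Reduction(A) is the element-wise sum over those slices:
//   D[i] = alpha * (A[0*n + i] + A[1*n + i] + ...) + beta * C[i]
std::vector<float> batched_reduction_reference(
    const std::vector<float>& A,
    const std::vector<float>& C,
    std::size_t reduction_size,
    float alpha,
    float beta) {
  const std::size_t n = C.size();
  // A must contain exactly reduction_size slices of n elements.
  assert(A.size() == reduction_size * n);

  std::vector<float> D(n);
  for (std::size_t i = 0; i < n; ++i) {
    // Accumulate element i across all slices.
    float sum = 0.0f;
    for (std::size_t k = 0; k < reduction_size; ++k) {
      sum += A[k * n + i];
    }
    // Scale the reduced value and blend in the source operand C.
    D[i] = alpha * sum + beta * C[i];
  }
  return D;
}
```

A GPU kernel would typically split the outer loop over `i` across threadblocks, which is the kind of partitioning a blockIdx-mapping functor describes; the sequential loop here only serves as a correctness reference.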