CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Defines the basic structures needed to implement the warp-scoped phase of the epilogue. These quantities assume a column-major arrangement of TensorOp instructions, of which one row-oriented slice is visible per iteration.
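As a rough illustration of the kind of quantities such a policy might expose, the following is a minimal sketch and not the CUTLASS definition. The names `Shape2D` and `WarpEpiloguePolicySketch`, the member names, and the default of 8 rows per iteration are all assumptions made for this example. It counts how many instruction tiles cover a warp tile in each dimension and how many row-oriented slices are visited if each iteration exposes a fixed number of rows.

```cpp
// Hypothetical stand-in for a compile-time 2D shape (not a CUTLASS type).
template <int Rows, int Columns>
struct Shape2D {
  static constexpr int kRow = Rows;
  static constexpr int kColumn = Columns;
};

// Hypothetical policy sketch: derives iteration counts for a warp-scoped
// epilogue from the warp tile shape and the shape of one instruction tile.
// RowsPerIteration = 8 is an assumed default, not taken from the source.
template <typename WarpShape, typename InstructionShape, int RowsPerIteration = 8>
struct WarpEpiloguePolicySketch {
  // Number of instruction tiles needed to cover the warp tile (rounded up).
  static constexpr int kInstructionsM =
      (WarpShape::kRow + InstructionShape::kRow - 1) / InstructionShape::kRow;
  static constexpr int kInstructionsN =
      (WarpShape::kColumn + InstructionShape::kColumn - 1) / InstructionShape::kColumn;

  // Rows of the accumulator tile that are visible in one iteration.
  static constexpr int kRowsPerIteration = RowsPerIteration;

  // Row-oriented slices contained in a single instruction tile.
  static constexpr int kIterationsPerInstruction =
      InstructionShape::kRow / kRowsPerIteration;

  // Total row-oriented slices visited across the warp tile.
  static constexpr int kIterations = kInstructionsM * kIterationsPerInstruction;
};

// Example: a 64x64 warp tile covered by 16x8 instruction tiles.
using ExamplePolicy = WarpEpiloguePolicySketch<Shape2D<64, 64>, Shape2D<16, 8>>;

static_assert(ExamplePolicy::kInstructionsM == 4, "64 rows / 16 rows per instruction");
static_assert(ExamplePolicy::kInstructionsN == 8, "64 columns / 8 columns per instruction");
static_assert(ExamplePolicy::kIterations == 8, "4 instruction rows * 2 slices each");
```

Keeping every quantity a compile-time constant lets the epilogue loops that consume them be fully unrolled. The actual member names and values in CUTLASS should be taken from the class documentation itself.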