CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Defines the basic properties needed by CTA-level GEMMs, given assumptions about the data layout of the global memory fragments, the data types, and the internal tile sizes.
#include "cutlass/cutlass.h"
#include "cutlass/array.h"
#include "cutlass/fast_math.h"
#include "cutlass/arch/wmma.h"
Partial specializations for threadblock::Mma operations targeting TensorOp instructions.
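As a rough illustration of the partial-specialization pattern described above, the following minimal sketch selects a configuration struct by an operator-class tag. Every name in it (`OpClassTensorOpTag`, `OpClassSimtTag`, `TileShape`, `MmaCoreSketch`) is a hypothetical stand-in chosen for this example and is not the CUTLASS API; the real templates take more parameters and define iterators, layouts, and policies that are omitted here.

```cpp
// Hypothetical stand-ins for operator-class tags (assumption, not CUTLASS types).
struct OpClassSimtTag {};
struct OpClassTensorOpTag {};

// Hypothetical compile-time tile shape: M x N x K.
template <int M, int N, int K>
struct TileShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Primary template: declared but not defined, so an unsupported
// operator class fails at compile time instead of silently compiling.
template <typename Shape, typename Element, typename OperatorClass>
struct MmaCoreSketch;

// Partial specialization chosen when the operator class is the TensorOp tag.
template <typename Shape, typename Element>
struct MmaCoreSketch<Shape, Element, OpClassTensorOpTag> {
  static constexpr bool kUsesTensorOp = true;
  // Number of accumulator elements covered by one tile (M * N).
  static constexpr int kAccumulatorCount = Shape::kM * Shape::kN;
};

// Partial specialization chosen when the operator class is the SIMT tag.
template <typename Shape, typename Element>
struct MmaCoreSketch<Shape, Element, OpClassSimtTag> {
  static constexpr bool kUsesTensorOp = false;
  static constexpr int kAccumulatorCount = Shape::kM * Shape::kN;
};

// Usage: the tag argument alone decides which specialization is instantiated.
using TensorOpCore = MmaCoreSketch<TileShape<64, 64, 32>, float, OpClassTensorOpTag>;
using SimtCore = MmaCoreSketch<TileShape<64, 64, 32>, float, OpClassSimtTag>;

static_assert(TensorOpCore::kUsesTensorOp, "TensorOp specialization selected");
static_assert(!SimtCore::kUsesTensorOp, "SIMT specialization selected");
```

Leaving the primary template undefined is a design choice in this sketch: a combination with no matching specialization becomes a compile error rather than a fallback path.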