CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers

Files  
file  default_mma_tensor_op.h
Default warp-level GEMM operators selected by data type, size, and layouts of operands.
file  default_mma_wmma_tensor_op.h
Default warp-level GEMM operators selected by data type, size, and layouts of operands.
file  gemm/warp/mma.h
Templates exposing architecture support for warp-level multiply-add operations.
file  mma_complex_tensor_op.h
Templates implementing warp-level matrix multiply-accumulate operations targeting Tensor Cores.
file  mma_simt.h
Templates implementing warp-level matrix multiply-accumulate operations.
file  mma_simt_policy.h
Describes the lane policy used by warp-level matrix multiply operators targeting SIMT instructions.
file  mma_simt_tile_iterator.h
Describes the lane policy used by warp-level matrix multiply operators targeting SIMT instructions.
file  mma_tensor_op.h
Templates implementing warp-level matrix multiply-accumulate operations targeting Tensor Cores.
file  mma_tensor_op_policy.h
Policy describing implementation details of warp-level GEMM targeting Tensor Cores.
file  mma_tensor_op_sm70.h
Templates implementing warp-level matrix multiply-accumulate operations targeting Tensor Cores.
file  mma_tensor_op_tile_iterator.h
Defines iterators used by warp-level matrix multiply operations targeting Tensor Cores.
file  mma_tensor_op_tile_iterator_sm70.h
Defines iterators used by warp-level matrix multiply operations targeting Tensor Cores.
file  mma_tensor_op_tile_iterator_wmma.h
Defines iterators used by warp-level matrix multiply operations targeting Tensor Cores.
file  mma_tensor_op_wmma.h
Templates implementing warp-level matrix multiply-accumulate operations targeting Tensor Cores.