CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers