CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Files | |
file | default_gemm.h [code] |
Default kernel-level GEMM definitions combine threadblock-scoped matrix multiply-add with the appropriate threadblock-scoped epilogue. | |
file | default_gemm_splitk_parallel.h [code] |
Default kernel-level GEMM definitions combine threadblock-scoped matrix multiply-add with the appropriate threadblock-scoped epilogue. | |
file | default_gemv.h [code] |
file | include/cutlass/gemm/kernel/gemm.h [code] |
Template for a pipelined GEMM kernel. Does not compute batching or support split-K. | |
file | kernel/gemm_batched.h [code] |
Template for a pipelined GEMM kernel. Does not compute batching or support split-K. | |
file | gemm_pipelined.h [code] |
Template for a pipelined GEMM kernel. Does not compute batching or support split-K. | |
file | kernel/gemm_splitk_parallel.h [code] |
Template for GEMM performing a reduction over K partitions in parallel. | |
file | gemv_batched_strided.h [code] |
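
The brief for gemm_splitk_parallel.h mentions a reduction over K partitions performed in parallel. As a rough illustration of that idea only, the following plain C++ sketch splits the K dimension into partitions, computes one partial product per partition into its own workspace slice, and then sums the slices. It does not use the CUTLASS API: the names `Matrix`, `partitioned_gemm`, `reduce_partitions`, and `splitk_gemm` are hypothetical, the layout is assumed to be row-major, and the loops run serially on the host where a real kernel would assign each partition to separate threadblocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a dense row-major matrix stored in a flat vector.
using Matrix = std::vector<float>;

// Phase 1: each K partition computes a partial M x N product into its own
// workspace slice. Partitions are independent, so they could run in parallel.
std::vector<Matrix> partitioned_gemm(const Matrix& A, const Matrix& B,
                                     int M, int N, int K, int splits) {
  assert(splits > 0);
  const int chunk = (K + splits - 1) / splits;  // ceil(K / splits)
  std::vector<Matrix> workspace(splits, Matrix(static_cast<std::size_t>(M) * N, 0.0f));
  for (int s = 0; s < splits; ++s) {
    const int k_begin = std::min(K, s * chunk);
    const int k_end = std::min(K, k_begin + chunk);  // may be empty for trailing partitions
    for (int m = 0; m < M; ++m) {
      for (int n = 0; n < N; ++n) {
        float acc = 0.0f;
        for (int k = k_begin; k < k_end; ++k) {
          acc += A[m * K + k] * B[k * N + n];
        }
        workspace[s][m * N + n] = acc;
      }
    }
  }
  return workspace;
}

// Phase 2: sum the partial products element-wise to form the final result.
Matrix reduce_partitions(const std::vector<Matrix>& workspace, int M, int N) {
  Matrix C(static_cast<std::size_t>(M) * N, 0.0f);
  for (const Matrix& partial : workspace) {
    for (std::size_t i = 0; i < C.size(); ++i) {
      C[i] += partial[i];
    }
  }
  return C;
}

// Convenience wrapper: C = A (M x K) * B (K x N) using `splits` K partitions.
Matrix splitk_gemm(const Matrix& A, const Matrix& B,
                   int M, int N, int K, int splits) {
  return reduce_partitions(partitioned_gemm(A, B, M, N, K, splits), M, N);
}
```

Splitting along K in this way trades extra workspace memory and a second reduction pass for more independent work, which tends to matter when M and N are small relative to K.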