CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
#include <mma_sm60.h>
Public Types
using FragmentA = Array< half_t, Shape::kMK >
A operand storage.
using FragmentB = Array< half_t, Shape::kKN >
B operand storage.
using FragmentC = Array< half_t, Shape::kMN >
C operand storage.
Public Member Functions
CUTLASS_HOST_DEVICE void operator() (FragmentC &D, FragmentA const &A, FragmentB const &B, FragmentC const &C)
Computes a matrix product D = A * B + C.
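The semantics of D = A * B + C over the three fragments can be sketched on the host as follows. This is a minimal reference sketch, not the CUTLASS implementation: `ShapeSketch` and `ReferenceMma` are hypothetical names, `float` stands in for `half_t`, `std::array` stands in for `Array`, and the element counts assume kMK = M*K, kKN = K*N, and kMN = M*N with all operands stored row-major.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the Shape template parameter (not a CUTLASS type).
template <int M_, int N_, int K_>
struct ShapeSketch {
  static constexpr int kM = M_;
  static constexpr int kN = N_;
  static constexpr int kK = K_;
  static constexpr int kMK = M_ * K_;  // elements in the A fragment
  static constexpr int kKN = K_ * N_;  // elements in the B fragment
  static constexpr int kMN = M_ * N_;  // elements in the C and D fragments
};

// Hypothetical scalar reference: float replaces half_t, std::array replaces Array.
template <typename Shape>
struct ReferenceMma {
  using FragmentA = std::array<float, Shape::kMK>;
  using FragmentB = std::array<float, Shape::kKN>;
  using FragmentC = std::array<float, Shape::kMN>;

  void operator()(FragmentC &D, FragmentA const &A,
                  FragmentB const &B, FragmentC const &C) const {
    constexpr int M = Shape::kM, N = Shape::kN, K = Shape::kK;
    D = C;  // start from the accumulator input
    for (int m = 0; m < M; ++m) {
      for (int n = 0; n < N; ++n) {
        for (int k = 0; k < K; ++k) {
          // Row-major indexing for A (M x K), B (K x N), and D (M x N).
          D[m * N + n] += A[m * K + k] * B[k * N + n];
        }
      }
    }
  }
};
```

With a 2x2x2 shape, A = {1, 2, 3, 4}, B = {5, 6, 7, 8}, and C filled with ones, this sketch produces D = {20, 23, 44, 51}.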
using cutlass::gemm::thread::detail::Mma_HFMA2< Shape, layout::RowMajor, layout::RowMajor, layout::RowMajor, true >::FragmentA = Array<half_t, Shape::kMK>
using cutlass::gemm::thread::detail::Mma_HFMA2< Shape, layout::RowMajor, layout::RowMajor, layout::RowMajor, true >::FragmentB = Array<half_t, Shape::kKN>
using cutlass::gemm::thread::detail::Mma_HFMA2< Shape, layout::RowMajor, layout::RowMajor, layout::RowMajor, true >::FragmentC = Array<half_t, Shape::kMN>
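CUTLASS_HOST_DEVICE void operator() (FragmentC &D, FragmentA const &A, FragmentB const &B, FragmentC const &C) [inline]
Initializes the output D with the input C, then uses a 1x2x1 HFMA2 sequence for the bulk of the computation.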
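One way to read the 1x2x1 HFMA2 sequence is as an M x N x K step of 1 x 2 x 1: each step takes one element of A, broadcasts it to both lanes, and multiplies it by two adjacent elements of a row of B, accumulating into two adjacent elements of a row of D. The host-side sketch below emulates that pattern under stated assumptions: `Pair2`, `fma2`, and `mma_1x2x1_sketch` are hypothetical names, `float` replaces `half_t`, `fma2` stands in for the packed half-precision fused multiply-add, all operands are row-major, and N is assumed to be even. It is not the CUTLASS source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-lane value emulating a packed pair of halves.
struct Pair2 {
  float x;
  float y;
};

// Emulates one packed multiply-add: a * b + c on each lane independently.
inline Pair2 fma2(Pair2 a, Pair2 b, Pair2 c) {
  return Pair2{a.x * b.x + c.x, a.y * b.y + c.y};
}

// Computes D = A * B + C in 1x2x1 steps (row-major operands, N assumed even).
inline void mma_1x2x1_sketch(std::vector<float> &D,
                             std::vector<float> const &A,
                             std::vector<float> const &B,
                             std::vector<float> const &C,
                             int M, int N, int K) {
  assert(N % 2 == 0);  // pairs are formed along N
  assert(A.size() == static_cast<std::size_t>(M) * K);
  assert(B.size() == static_cast<std::size_t>(K) * N);
  assert(C.size() == static_cast<std::size_t>(M) * N);

  D = C;  // initialize output with input

  for (int k = 0; k < K; ++k) {
    for (int m = 0; m < M; ++m) {
      // Broadcast the single A element to both lanes.
      Pair2 a{A[m * K + k], A[m * K + k]};
      for (int n = 0; n < N; n += 2) {
        // Two adjacent elements of row k of B are contiguous in row-major order.
        Pair2 b{B[k * N + n], B[k * N + n + 1]};
        Pair2 d{D[m * N + n], D[m * N + n + 1]};
        d = fma2(a, b, d);
        D[m * N + n] = d.x;
        D[m * N + n + 1] = d.y;
      }
    }
  }
}
```

Pairing along N suits the all-row-major case because the two B elements and the two D elements of each step are adjacent in memory, so each pair could be handled as one packed value.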