reinterpret_cast<ElementA const *>(&A), LayoutA::packed({Shape::kM, Shape::kK}));
reinterpret_cast<ElementB const *>(&B), LayoutB::packed({Shape::kK, Shape::kN}));
reinterpret_cast<ElementC *>(&D), LayoutC::packed({Shape::kM, Shape::kN}));
for (int k = 0; k < Shape::kK; ++k) {
  for (int n = 0; n < Shape::kN; ++n) {
    for (int m = 0; m < Shape::kM; ++m) {
      int m_serpentine = (n % 2) ? (Shape::kM - 1 - m) : m;
      Array<ElementC, 1> d;
      Array<ElementA, 1> a;
      Array<ElementB, 1> b;
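The excerpt above shows only part of the loop nest, so the following is a minimal standalone sketch of the same k, n, m ordering with the serpentine m index. It does not use the CUTLASS headers: `TinyShape`, `serpentine_index`, and `serpentine_mma` are hypothetical names introduced here, and row-major storage in `std::array` is an assumption made for illustration. The listing does not state why m is reversed on odd columns; the sketch only reproduces the index arithmetic.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a compile-time GEMM shape (not the CUTLASS type).
template <int M, int N, int K>
struct TinyShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Same expression as in the listing: odd columns walk m in reverse.
constexpr int serpentine_index(int m, int n, int extent_m) {
  return (n % 2) ? (extent_m - 1 - m) : m;
}

// Computes D = A * B + C for row-major A (MxK), B (KxN), C and D (MxN).
// Assumption: plain std::array storage instead of fragments and TensorRef.
template <typename Shape, typename T>
std::array<T, Shape::kM * Shape::kN> serpentine_mma(
    std::array<T, Shape::kM * Shape::kK> const &A,
    std::array<T, Shape::kK * Shape::kN> const &B,
    std::array<T, Shape::kM * Shape::kN> const &C) {
  std::array<T, Shape::kM * Shape::kN> D = C;  // start from the accumulator
  for (int k = 0; k < Shape::kK; ++k) {
    for (int n = 0; n < Shape::kN; ++n) {
      for (int m = 0; m < Shape::kM; ++m) {
        int m_serpentine = serpentine_index(m, n, Shape::kM);
        assert(m_serpentine >= 0 && m_serpentine < Shape::kM);
        // One scalar multiply-add per (m, n, k) triple.
        D[m_serpentine * Shape::kN + n] +=
            A[m_serpentine * Shape::kK + k] * B[k * Shape::kN + n];
      }
    }
  }
  return D;
}
```

Because every (m, n, k) triple is still visited exactly once, the serpentine order changes only the traversal, not the result. Calling `serpentine_mma<TinyShape<2, 2, 2>, int>(A, B, C)` therefore gives the same D as a plain triple loop.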
Operator_ Operator
Underlying mathematical operator.
Definition: gemm/thread/mma_sm50.h:89
Definition: aligned_buffer.h:35
Array< ElementB, Shape::kKN > FragmentB
B operand storage.
Definition: gemm/thread/mma_sm50.h:95
Defines a structure containing strides, bounds, and a pointer to tensor data.
ElementA_ ElementA
Data type of operand A.
Definition: gemm/thread/mma_sm50.h:203
ElementB_ ElementB
Data type of operand B.
Definition: gemm/thread/mma_sm50.h:209
Array< ElementC, Shape::kMN > FragmentC
C operand storage.
Definition: gemm/thread/mma_sm50.h:98
CUTLASS_HOST_DEVICE void operator()(FragmentC &D, FragmentA const &A, FragmentB const &B, FragmentC const &C)
Computes a matrix product D = A * B + C.
Definition: gemm/thread/mma_sm50.h:238
LayoutA_ LayoutA
Layout of A matrix (concept: layout::MapFunc)
Definition: gemm/thread/mma_sm50.h:74
Array< ElementC, Shape::kMN > FragmentC
C operand storage.
Definition: gemm/thread/mma_sm50.h:230
LayoutC_ LayoutC
Layout of C matrix (concept: layout::MapFunc)
Definition: gemm/thread/mma_sm50.h:86
Defines common types used for all GEMM-like operators.
ElementC_ ElementC
Element type of operand C.
Definition: gemm/thread/mma_sm50.h:215
ElementA_ ElementA
Data type of operand A.
Definition: gemm/thread/mma_sm50.h:71
LayoutB_ LayoutB
Layout of B matrix (concept: layout::MapFunc)
Definition: gemm/thread/mma_sm50.h:80
#define CUTLASS_PRAGMA_UNROLL
Definition: cutlass.h:110
Templates exposing architecture support for multiply-add operations.
LayoutB_ LayoutB
Layout of B matrix (concept: layout::MapFunc)
Definition: gemm/thread/mma_sm50.h:212
Template that handles all packed matrix layouts.
Definition: gemm/thread/mma_sm50.h:65
Array< ElementA, Shape::kMK > FragmentA
A operand storage.
Definition: gemm/thread/mma_sm50.h:224
Array< ElementA, Shape::kMK > FragmentA
A operand storage.
Definition: gemm/thread/mma_sm50.h:92
arch::OpMultiplyAdd Operator
Underlying mathematical operator.
Definition: gemm/thread/mma_sm50.h:221
#define CUTLASS_HOST_DEVICE
Definition: cutlass.h:89
Shape_ Shape
Size of the Gemm problem - concept: gemm::GemmShape<>
Definition: gemm/thread/mma_sm50.h:68
Templates exposing architecture support for warp-level multiply-add operations.
Shape of a matrix multiply-add operation.
Definition: include/cutlass/gemm/gemm.h:57
Array< ElementB, Shape::kKN > FragmentB
B operand storage.
Definition: gemm/thread/mma_sm50.h:227
Shape_ Shape
Size of the Gemm problem - concept: gemm::GemmShape<>
Definition: gemm/thread/mma_sm50.h:200
CUTLASS_HOST_DEVICE void operator()(FragmentC &D, FragmentA const &A, FragmentB const &B, FragmentC const &C)
Computes a matrix product D = A * B + C.
Definition: gemm/thread/mma_sm50.h:115
CUTLASS_HOST_DEVICE Reference at(TensorCoord const &coord) const
Returns a reference to the element at a given Coord.
Definition: tensor_ref.h:307
Structure to compute the matrix product.
Definition: gemm/thread/mma.h:66
Defines layout functions used by TensorRef and derived classes.
LayoutA_ LayoutA
Layout of A matrix (concept: layout::MapFunc)
Definition: gemm/thread/mma_sm50.h:206
Matrix multiply-add operation.
Definition: arch/mma.h:92
LayoutC_ LayoutC
Layout of C matrix (concept: layout::MapFunc)
Definition: gemm/thread/mma_sm50.h:218
Basic include for CUTLASS.
Definition: matrix_coord.h:39
ElementB_ ElementB
Data type of operand B.
Definition: gemm/thread/mma_sm50.h:77
ElementC_ ElementC
Element type of operand C.
Definition: gemm/thread/mma_sm50.h:83
cutlass::arch::Mma< gemm::GemmShape< 1, 1, 1 >, 1, ElementA, LayoutA, ElementB, LayoutB, ElementC, LayoutC, Operator >
Matrix multiply-add operation - specialized for 1x1x1x1 matrix multiply operation.
Definition: arch/mma.h:113
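
The entries above describe fragments sized by Shape::kMK, Shape::kKN, and Shape::kMN, and a multiply-add primitive specialized for a 1x1x1 shape. As a rough illustration of how those pieces relate, the sketch below uses hypothetical stand-ins (`DemoShape`, `DemoFragments`, `scalar_mma`) built on `std::array`. It is not the CUTLASS implementation, and the member names are only assumed to mirror the ones listed in this index.

```cpp
#include <array>
#include <cassert>

// Hypothetical shape type: derived counts follow from the three extents.
template <int M, int N, int K>
struct DemoShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
  static constexpr int kMK = M * K;  // element count of the A operand
  static constexpr int kKN = K * N;  // element count of the B operand
  static constexpr int kMN = M * N;  // element count of C and D
};

// Hypothetical fragment aliases, sized the way the index describes.
template <typename Shape, typename T>
struct DemoFragments {
  using FragmentA = std::array<T, Shape::kMK>;
  using FragmentB = std::array<T, Shape::kKN>;
  using FragmentC = std::array<T, Shape::kMN>;
};

// Stand-in for a 1x1x1 multiply-add on one-element arrays: d = a * b + c.
template <typename T>
void scalar_mma(std::array<T, 1> &d,
                std::array<T, 1> const &a,
                std::array<T, 1> const &b,
                std::array<T, 1> const &c) {
  d[0] = a[0] * b[0] + c[0];
}
```

For a 4x3x2 shape this gives fragments of 8, 6, and 12 elements for A, B, and C. A thread-level product can then be assembled by applying the scalar primitive once per (m, n, k) triple.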