59 namespace threadblock {
66 typename WarpMmaSimt_,
76 static const int kPartitionsK = Shape::kK / WarpMmaSimt::Shape::kK;
79 using LayoutC = typename WarpMmaSimt::LayoutC;
88 typename WarpMmaSimt::Shape,
89 typename WarpMmaSimt::Policy,
101 typename WarpMmaSimt::Shape,
102 typename WarpMmaSimt::ThreadMma,
104 typename WarpMmaSimt::Policy
108 typename WarpMmaSimt::Shape,
109 typename WarpMmaSimt::ThreadMma,
112 typename WarpMmaSimt::Policy
116 typename OutputTileThreadMap::CompactedThreadMap,
121 using Padding = typename WarpTileIterator::Padding;
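The kPartitionsK expression on line 76 divides the threadblock tile's K extent by the warp tile's K extent. A minimal sketch of that arithmetic follows. It assumes stand-in types (TileShape, ToyWarpMma, ToyEpilogueTraits) that are hypothetical and are not the CUTLASS classes; only the division itself is taken from the listing, and the tile sizes are illustrative.

```cpp
// Hypothetical stand-in for a GEMM tile shape; NOT the CUTLASS type.
template <int M, int N, int K>
struct TileShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Hypothetical stand-in for a warp-level SIMT MMA; exposes only its Shape.
template <typename Shape_>
struct ToyWarpMma {
  using Shape = Shape_;
};

// Reproduces only the kPartitionsK computation shown in the listing.
template <typename Shape_, typename WarpMmaSimt_>
struct ToyEpilogueTraits {
  using Shape = Shape_;
  using WarpMmaSimt = WarpMmaSimt_;

  // Number of warp-level tiles along K that fit in the threadblock tile.
  static constexpr int kPartitionsK = Shape::kK / WarpMmaSimt::Shape::kK;
};

// Illustrative sizes (assumed, not from the source).
using ThreadblockShape = TileShape<128, 128, 16>;
using WarpShape = TileShape<32, 64, 8>;
using Traits = ToyEpilogueTraits<ThreadblockShape, ToyWarpMma<WarpShape>>;

// 16 / 8 == 2: the threadblock tile is split into two partitions along K.
static_assert(Traits::kPartitionsK == 2, "expected two K partitions");
```

When the two K extents are equal, the value is 1 and the threadblock tile is not sliced along K. In the listing, kPartitionsK is passed on to DefaultThreadMapSimt as a template argument.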
Templates implementing loading of tiles from pitch-linear rank=2 tensors.
Definition: aligned_buffer.h:35
Defines sensible defaults for epilogues for SimtOps.
Definition: default_epilogue_simt.h:70
static int const kElementsPerAccess
Definition: default_epilogue_simt.h:75
Epilogue for threadblock scoped GEMMs using Tensor Ops.
static const int kPartitionsK
Definition: default_epilogue_simt.h:76
Defines common types used for all GEMM-like operators.
Functor performing conversion operations used by epilogues.
cutlass::epilogue::threadblock::SharedLoadIterator< typename OutputTileThreadMap::CompactedThreadMap, ElementAccumulator > SharedLoadIterator
Definition: default_epilogue_simt.h:118
WarpMmaSimt_ WarpMmaSimt
Definition: default_epilogue_simt.h:73
Defines the optimal thread map for SIMT accumulator layouts.
Definition: default_thread_map_simt.h:52
Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe ...
OutputOp_ OutputOp
Definition: default_epilogue_simt.h:74
Functor performing linear combination operations used by epilogues.
cutlass::epilogue::warp::FragmentIteratorSimt< typename WarpMmaSimt::Shape, typename WarpMmaSimt::ThreadMma, layout::RowMajor, typename WarpMmaSimt::Policy > AccumulatorFragmentIterator
Definition: default_epilogue_simt.h:105
typename WarpMmaSimt::LayoutC LayoutC
Definition: default_epilogue_simt.h:79
Fragment iterator for SIMT accumulator arrangements.
Definition: fragment_iterator_simt.h:60
typename WarpMmaSimt::ElementC ElementAccumulator
Definition: default_epilogue_simt.h:80
Top-level include for all CUTLASS numeric types.
This defines a "fragment" iterator for visiting the fragments of an accumulator tile that participate...
typename WarpTileIterator::Padding Padding
Hard-coded padding elements added.
Definition: default_epilogue_simt.h:121
Shape_ Shape
Definition: default_epilogue_simt.h:72
Mapping function for row-major matrices.
Definition: layout/matrix.h:50
Epilogue operator without splitk.
Definition: epilogue.h:74
Epilogue for threadblock scoped GEMMs using Tensor Ops.
Definition: epilogue/threadblock/predicated_tile_iterator.h:65
typename OutputOp::ElementOutput ElementOutput
Definition: default_epilogue_simt.h:78
cutlass::epilogue::warp::TileIteratorSimt< typename WarpMmaSimt::Shape, typename WarpMmaSimt::ThreadMma, ElementAccumulator, layout::RowMajor, typename WarpMmaSimt::Policy > WarpTileIterator
Definition: default_epilogue_simt.h:113
Definition: shared_load_iterator.h:61
cutlass::epilogue::threadblock::PredicatedTileIterator< OutputTileThreadMap, ElementOutput > OutputTileIterator
Definition: default_epilogue_simt.h:98
Functor performing reduction operations used by epilogues.
Basic include for CUTLASS.
Template for reading and writing tiles of accumulators to shared memory.
Definition: tile_iterator_simt.h:55
typename cutlass::epilogue::threadblock::DefaultThreadMapSimt< Shape, typename WarpMmaSimt::Shape, typename WarpMmaSimt::Policy, kPartitionsK, ElementOutput, kElementsPerAccess >::Type OutputTileThreadMap
Definition: default_epilogue_simt.h:93