59 namespace threadblock {
66 typename WarpMmaTensorOp_,
80 using LayoutC = typename WarpMmaTensorOp::LayoutC;
89 typename WarpMmaTensorOp::Shape,
101 typename WarpMmaTensorOp::Shape,
102 typename WarpMmaTensorOp::Policy::Operator::Shape,
103 typename WarpMmaTensorOp::Policy::Operator::ElementC,
104 typename WarpMmaTensorOp::Policy::Operator::FragmentC,
109 typename WarpMmaTensorOp::Shape,
110 typename WarpMmaTensorOp::Policy::Operator::Shape,
116 typename OutputTileThreadMap::CompactedThreadMap,
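The fragments above show a traits-style struct that derives its dependent types (`LayoutC`, the fragment iterator, the warp tile iterator, and so on) from the nested names of `WarpMmaTensorOp`. The following is a minimal sketch of that composition pattern only. It does not include CUTLASS, and every `Toy*` name, the chosen tile sizes, and the `kIterationsM`/`kIterationsN` members are hypothetical stand-ins introduced for illustration, not the library's definitions.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical compile-time tile shape (a stand-in, not the CUTLASS type).
template <int M, int N, int K>
struct ToyShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Hypothetical layout tag.
struct ToyRowMajor {};

// Hypothetical instruction-level operator exposing Shape and ElementC.
struct ToyInstruction {
  using Shape = ToyShape<16, 8, 8>;
  using ElementC = float;
};

// Hypothetical policy that names the underlying operator.
struct ToyPolicy {
  using Operator = ToyInstruction;
};

// Hypothetical warp-level MMA exposing the nested names the listing reads.
struct ToyWarpMma {
  using Shape = ToyShape<64, 64, 32>;
  using LayoutC = ToyRowMajor;
  using ElementC = float;
  using Policy = ToyPolicy;
};

// Toy iterator parameterized the way the listing parameterizes WarpTileIterator.
template <typename WarpShape, typename InstructionShape,
          typename Element, typename Layout>
struct ToyWarpTileIterator {
  // Assumed for illustration: instruction tiles per warp tile in each dimension.
  static constexpr int kIterationsM = WarpShape::kM / InstructionShape::kM;
  static constexpr int kIterationsN = WarpShape::kN / InstructionShape::kN;
  using ElementType = Element;
};

// Traits struct: every alias is derived from the template parameters.
template <typename Shape_, typename WarpMma_, int PartitionsK, int ElementsPerAccess>
struct ToyDefaultEpilogue {
  using Shape = Shape_;
  using WarpMma = WarpMma_;
  static constexpr int kPartitionsK = PartitionsK;
  static constexpr int kElementsPerAccess = ElementsPerAccess;

  using LayoutC = typename WarpMma::LayoutC;
  using ElementAccumulator = typename WarpMma::ElementC;

  using WarpTileIterator = ToyWarpTileIterator<
      typename WarpMma::Shape,
      typename WarpMma::Policy::Operator::Shape,
      ElementAccumulator,
      LayoutC>;
};

// Example instantiation with assumed tile sizes.
using ToyEpilogue = ToyDefaultEpilogue<ToyShape<128, 128, 32>, ToyWarpMma, 1, 4>;
```

Because each alias is computed from the template arguments, swapping in a different warp-level type changes every dependent type consistently without touching the traits struct itself.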
Describes the size of a matrix tile.
Definition: matrix_shape.h:42
Templates implementing loading of tiles from pitch-linear rank=2 tensors.
typename OutputOp::ElementOutput ElementOutput
Definition: default_epilogue_complex_tensor_op.h:79
Definition: aligned_buffer.h:35
static int const kPartitionsK
Definition: default_epilogue_complex_tensor_op.h:75
typename WarpMmaTensorOp::LayoutC LayoutC
Definition: default_epilogue_complex_tensor_op.h:80
typename cutlass::epilogue::threadblock::DefaultThreadMapTensorOp< Shape, typename WarpMmaTensorOp::Shape, kPartitionsK, ElementOutput, kElementsPerAccess >::Type OutputTileThreadMap
Definition: default_epilogue_complex_tensor_op.h:93
Epilogue for threadblock scoped GEMMs using Tensor Ops.
cutlass::epilogue::warp::TileIteratorTensorOp< typename WarpMmaTensorOp::Shape, typename WarpMmaTensorOp::Policy::Operator::Shape, ElementAccumulator, LayoutC > WarpTileIterator
Definition: default_epilogue_complex_tensor_op.h:113
Defines common types used for all GEMM-like operators.
static int const kElementsPerAccess
Definition: default_epilogue_complex_tensor_op.h:77
Functor performing conversion operations used by epilogues.
This defines a "fragment" iterator for visiting the fragments of an accumulator tile that participate...
cutlass::epilogue::threadblock::SharedLoadIterator< typename OutputTileThreadMap::CompactedThreadMap, ElementAccumulator > SharedLoadIterator
Definition: default_epilogue_complex_tensor_op.h:118
Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe ...
Functor performing linear combination operations used by epilogues.
Defines sensible defaults for epilogues for TensorOps.
Definition: default_epilogue_complex_tensor_op.h:71
Shape_ Shape
Definition: default_epilogue_complex_tensor_op.h:73
WarpMmaTensorOp_ WarpMmaTensorOp
Definition: default_epilogue_complex_tensor_op.h:74
Definition: fragment_iterator_complex_tensor_op.h:61
cutlass::epilogue::warp::FragmentIteratorComplexTensorOp< typename WarpMmaTensorOp::Shape, typename WarpMmaTensorOp::Policy::Operator::Shape, typename WarpMmaTensorOp::Policy::Operator::ElementC, typename WarpMmaTensorOp::Policy::Operator::FragmentC, LayoutC > AccumulatorFragmentIterator
Definition: default_epilogue_complex_tensor_op.h:106
Defines the optimal thread map for TensorOp accumulator layouts.
Definition: default_thread_map_tensor_op.h:52
Top-level include for all CUTLASS numeric types.
Template for reading and writing tiles of accumulators to shared memory.
Definition: tile_iterator_tensor_op.h:52
cutlass::epilogue::threadblock::PredicatedTileIterator< OutputTileThreadMap, ElementOutput > OutputTileIterator
Definition: default_epilogue_complex_tensor_op.h:98
Epilogue for threadblock scoped GEMMs using Tensor Ops.
Epilogue operator without split-K.
Definition: epilogue.h:74
Epilogue for threadblock scoped GEMMs using Tensor Ops.
Definition: epilogue/threadblock/predicated_tile_iterator.h:65
typename WarpMmaTensorOp::ElementC ElementAccumulator
Definition: default_epilogue_complex_tensor_op.h:81
OutputOp_ OutputOp
Definition: default_epilogue_complex_tensor_op.h:76
Definition: shared_load_iterator.h:61
Functor performing reduction operations used by epilogues.
Basic include for CUTLASS.
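One of the entries above describes a "functor performing linear combination operations used by epilogues". As a rough illustration of what such an output operator computes on complex values, here is a minimal sketch assuming the usual form `alpha * accumulator + beta * source`. `ToyLinearCombination` and `toy_linear_combination_demo` are hypothetical names and are not the CUTLASS class or its interface.

```cpp
#include <cassert>
#include <complex>

// Hypothetical element-wise output operator (not the CUTLASS functor):
// returns alpha * accumulator + beta * source.
template <typename Element>
struct ToyLinearCombination {
  Element alpha;  // scale applied to the accumulator
  Element beta;   // scale applied to the source operand

  Element operator()(Element accumulator, Element source) const {
    return alpha * accumulator + beta * source;
  }
};

// Small demo on complex values: alpha = 2, beta = i,
// accumulator = 1 + i, source = 3.
inline std::complex<float> toy_linear_combination_demo() {
  ToyLinearCombination<std::complex<float>> op{{2.0f, 0.0f}, {0.0f, 1.0f}};
  return op({1.0f, 1.0f}, {3.0f, 0.0f});
}
```

With these inputs, `alpha * accumulator` is `2 + 2i` and `beta * source` is `3i`, so the demo returns `2 + 5i`.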