40 namespace threadblock {
46 typename ThreadblockShape_,
48 typename InstructionShape_,
73 !(ThreadblockShape::kM % WarpShape::kM) &&
74 !(ThreadblockShape::kN % WarpShape::kN), "Divisibility");
78 ThreadblockShape::kM / WarpShape::kM,
79 ThreadblockShape::kN / WarpShape::kN,
Definition: output_tile_thread_map.h:228
Definition: aligned_buffer.h:35
Tuple defining point in output tile.
Definition: output_tile_thread_map.h:57
Definition: default_thread_map_wmma_tensor_op.h:66
Epilogue for threadblock scoped GEMMs using Tensor Ops.
Element_ Element
Definition: default_thread_map_wmma_tensor_op.h:59
Defines common types used for all GEMM-like operators.
static int const kCount
Definition: include/cutlass/gemm/gemm.h:67
static int const kThreads
Number of participating threads.
Definition: default_thread_map_wmma_tensor_op.h:84
static int const kPartitionsK
Definition: default_thread_map_wmma_tensor_op.h:58
Defines the size of an element in bits.
Definition: numeric_types.h:42
InstructionShape_ InstructionShape
Definition: default_thread_map_wmma_tensor_op.h:57
static int const kElementsPerAccess
Definition: default_thread_map_wmma_tensor_op.h:60
Shape of a matrix multiply-add operation.
Definition: include/cutlass/gemm/gemm.h:57
ThreadblockShape_ ThreadblockShape
Definition: default_thread_map_wmma_tensor_op.h:55
Defines the optimal thread map for Wmma TensorOp accumulator layouts.
Definition: default_thread_map_wmma_tensor_op.h:53
WarpShape_ WarpShape
Definition: default_thread_map_wmma_tensor_op.h:56
static int const kWarpSize
Definition: default_thread_map_wmma_tensor_op.h:70
Defines layout functions used by TensorRef and derived classes for pitch-linear memory.
static int const kTensorOpRows
Wmma Tensor Operations operate on InstructionShape::kM rows at a time.
Definition: default_thread_map_wmma_tensor_op.h:69