39 namespace threadblock {
45 typename ThreadblockShape_,
47 typename MmaSimtPolicy_,
70 !(ThreadblockShape::kM % WarpShape::kM) &&
71 !(ThreadblockShape::kN % WarpShape::kN),
"Divisibility");
75 ThreadblockShape::kM / WarpShape::kM,
76 ThreadblockShape::kN / WarpShape::kN,
82 WarpShape::kM / (MmaSimtPolicy::WarpShape::kRow * MmaSimtPolicy::LaneMmaShape::kM);
100 MmaSimtPolicy::WarpShape::kRow,
105 MmaSimtPolicy::LaneMmaShape::kM,
static int const kM
Definition: include/cutlass/gemm/gemm.h:58
Definition: output_tile_thread_map.h:228
Definition: aligned_buffer.h:35
ThreadblockShape_ ThreadblockShape
Definition: default_thread_map_simt.h:54
MmaSimtPolicy_ MmaSimtPolicy
Definition: default_thread_map_simt.h:56
Tuple defining point in output tile.
Definition: output_tile_thread_map.h:57
static int const kThreads
Number of participating threads.
Definition: default_thread_map_simt.h:85
Epilogue for threadblock scoped GEMMs using Tensor Ops.
Defines common types used for all GEMM-like operators.
static int const kCount
Definition: include/cutlass/gemm/gemm.h:67
Defines the optimal thread map for SIMT accumulator layouts.
Definition: default_thread_map_simt.h:52
Defines the size of an element in bits.
Definition: numeric_types.h:42
Element_ Element
Definition: default_thread_map_simt.h:58
static int const kElementsPerAccess
Definition: default_thread_map_simt.h:59
static int const kIterations
Number of iterations.
Definition: default_thread_map_simt.h:88
static int const kWarpSize
Definition: default_thread_map_simt.h:67
static int const kPartitionsK
Definition: default_thread_map_simt.h:57
Shape of a matrix multiply-add operation.
Definition: include/cutlass/gemm/gemm.h:57
Definition: default_thread_map_simt.h:65
WarpShape_ WarpShape
Definition: default_thread_map_simt.h:55
static int const kGroupCount
Computes the number of thread-level matrix multiplies needed to span a warp.
Definition: default_thread_map_simt.h:81
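
The compile-time arithmetic in the listing fragments above can be tried in isolation. The following is a minimal sketch, not the CUTLASS implementation: `Shape3`, `Shape2`, `SimtPolicySketch`, `ThreadMapSketch`, `kWarpsM`, `kWarpsN`, and the tile sizes are hypothetical stand-ins chosen for illustration. It assumes the expression on listing line 82 is the initializer of kGroupCount (whose definition is reported at line 81), and it checks divisibility along both the M and N extents.

```cpp
// Illustrative sketch only: every type and tile size here is a hypothetical
// stand-in, not a CUTLASS definition.
template <int M, int N, int K>
struct Shape3 {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

template <int Row, int Column>
struct Shape2 {
  static constexpr int kRow = Row;
  static constexpr int kColumn = Column;
};

// Hypothetical policy: lane arrangement within a warp plus the per-lane tile.
template <typename WarpShape_, typename LaneMmaShape_>
struct SimtPolicySketch {
  using WarpShape = WarpShape_;
  using LaneMmaShape = LaneMmaShape_;
};

template <typename ThreadblockShape, typename WarpShape, typename MmaSimtPolicy>
struct ThreadMapSketch {
  // The threadblock tile must be a whole number of warp tiles in M and N.
  static_assert(!(ThreadblockShape::kM % WarpShape::kM) &&
                !(ThreadblockShape::kN % WarpShape::kN),
                "Divisibility");

  // Warps along each dimension of the threadblock tile.
  static constexpr int kWarpsM = ThreadblockShape::kM / WarpShape::kM;
  static constexpr int kWarpsN = ThreadblockShape::kN / WarpShape::kN;

  // Thread-level multiplies needed to span a warp tile along M
  // (assumed to correspond to kGroupCount in the listing).
  static constexpr int kGroupCount =
      WarpShape::kM /
      (MmaSimtPolicy::WarpShape::kRow * MmaSimtPolicy::LaneMmaShape::kM);
};

// Example instantiation with made-up sizes: 128x128x8 threadblock tile,
// 32x64x8 warp tile, 4x8 lanes per warp, 4x4x1 per-lane tile.
using ExampleMap = ThreadMapSketch<
    Shape3<128, 128, 8>,
    Shape3<32, 64, 8>,
    SimtPolicySketch<Shape2<4, 8>, Shape3<4, 4, 1>>>;

static_assert(ExampleMap::kWarpsM == 4, "128 / 32");
static_assert(ExampleMap::kWarpsN == 2, "128 / 64");
static_assert(ExampleMap::kGroupCount == 2, "32 / (4 * 4)");
```

With these made-up sizes, a warp tile that does not divide the threadblock tile (for example a 48-row warp tile) would fail the "Divisibility" assertion at compile time rather than produce a truncated warp count.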