CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
#include <output_tile_thread_map.h>
Classes | |
struct | CompactedThreadMap |
Compacted thread map in which the 4D region is contiguous. More... | |
struct | Detail |
Public Types | |
using | Shape = Shape_ |
using | Count = Count_ |
using | Iterations = OutputTileShape< Detail::RowArrangement::kIterationsColumn, Detail::RowArrangement::kIterationsRow, Detail::kIterationsGroup, Detail::kIterationsCluster, 1 > |
using | Delta = OutputTileShape< Detail::RowArrangement::kDeltaColumn, Detail::RowArrangement::kDeltaRow, Detail::kDeltaGroup, Detail::kDeltaCluster, 1 > |
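The sketch below illustrates one way `Iterations` and `Delta` could be consumed by a tile iterator. It assumes that `Iterations` counts the accesses a thread makes along each dimension and that `Delta` is the stride between consecutive accesses. `TileShapeSketch`, `CoordSketch`, `access_offset`, and `access_count` are hypothetical names introduced for illustration; they are not part of CUTLASS.

```cpp
#include <cassert>

// Hypothetical stand-in for OutputTileShape: five compile-time extents.
template <int Column, int Row, int Group, int Cluster, int Tile>
struct TileShapeSketch {
  static constexpr int kColumn = Column;
  static constexpr int kRow = Row;
  static constexpr int kGroup = Group;
  static constexpr int kCluster = Cluster;
  static constexpr int kTile = Tile;
};

// Hypothetical (row, column) coordinate, standing in for MatrixCoord.
struct CoordSketch {
  int row;
  int column;
};

// Offset of one access relative to the thread's initial offset.
// Assumption: cluster, group, and row all advance along the row dimension,
// while the column index advances along the column dimension.
template <typename Delta>
constexpr CoordSketch access_offset(int cluster, int group, int row, int column) {
  return CoordSketch{
      cluster * Delta::kCluster + group * Delta::kGroup + row * Delta::kRow,
      column * Delta::kColumn};
}

// Number of accesses one thread performs, assuming it visits every
// (cluster, group, row, column) combination exactly once.
template <typename Iterations>
constexpr int access_count() {
  return Iterations::kColumn * Iterations::kRow *
         Iterations::kGroup * Iterations::kCluster;
}
```

With a `Delta` of `TileShapeSketch<64, 8, 16, 32, 1>`, the access at cluster 1, group 1, row 2, column 1 lands 64 rows and 64 columns away from the thread's initial offset.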
Static Public Member Functions | |
static CUTLASS_HOST_DEVICE MatrixCoord | initial_offset (int thread_idx) |
Initial offset function. More... | |
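As a rough illustration of what an initial-offset function has to do, the sketch below splits a linear thread index into a warp index and a lane index, then places the lane inside a row-major arrangement of accesses. It is a simplified assumption about the general approach, not the CUTLASS implementation: `split_thread_index`, `lane_offset`, and the `access_width` parameter are hypothetical, and the warp-level placement across clusters, groups, and rows is omitted.

```cpp
#include <cassert>

// Warp size matching kWarpSize in the listing above.
constexpr int kWarpSizeSketch = 32;

// Hypothetical pair of indices derived from a linear thread index.
struct WarpLaneSketch {
  int warp_idx;
  int lane_idx;
};

// Splits a linear thread index into (warp, lane).
inline WarpLaneSketch split_thread_index(int thread_idx) {
  assert(thread_idx >= 0);
  return WarpLaneSketch{thread_idx / kWarpSizeSketch,
                        thread_idx % kWarpSizeSketch};
}

// Hypothetical (row, column) coordinate, standing in for MatrixCoord.
struct LaneCoordSketch {
  int row;
  int column;
};

// Places a lane within its warp's footprint, assuming `access_width` lanes
// sit side by side in each row and each lane covers `elements_per_access`
// contiguous elements.
inline LaneCoordSketch lane_offset(int lane_idx, int access_width,
                                   int elements_per_access) {
  assert(access_width > 0);
  return LaneCoordSketch{lane_idx / access_width,
                         (lane_idx % access_width) * elements_per_access};
}
```

For example, thread 70 falls in warp 2 at lane 6; with four lanes per row and eight elements per access, lane 6 starts at row 1, column 16 of its warp's footprint.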
Static Public Attributes | |
static int const | kWarpSize = 32 |
static int const | kThreads = Threads |
static int const | kWarpCount = kThreads / kWarpSize |
static int const | kElementsPerAccess = ElementsPerAccess |
static int const | kElementSize = ElementSize |
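The relationships among these constants can be checked with a small standalone sketch. `ThreadMapConstantsSketch` is a hypothetical struct that mirrors only the attributes listed above. The `kAccessBytes` member and the `static_assert` are additions for illustration, and treating `ElementSize` as a size in bits is an assumption.

```cpp
#include <cassert>

// Hypothetical mirror of the static attributes listed above.
template <int Threads, int ElementsPerAccess, int ElementSize>
struct ThreadMapConstantsSketch {
  static constexpr int kWarpSize = 32;
  static constexpr int kThreads = Threads;
  // Integer division: a thread count that is not a multiple of 32
  // would silently drop the partial warp.
  static constexpr int kWarpCount = kThreads / kWarpSize;
  static constexpr int kElementsPerAccess = ElementsPerAccess;
  static constexpr int kElementSize = ElementSize;

  // Illustrative addition: bytes moved by one access, assuming
  // ElementSize is expressed in bits.
  static constexpr int kAccessBytes = kElementsPerAccess * kElementSize / 8;

  // Illustrative guard, not taken from the listing above.
  static_assert(kThreads % kWarpSize == 0,
                "Threads should be a whole number of warps");
};
```

Under these assumptions, 128 threads with eight 16-bit elements per access give four warps and 16 bytes per access.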
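Template metaprogram for partitioning a 4D space across warps to achieve several performance objectives:
- coalesced memory accesses in units of 128-byte lines
- minimal address arithmetic
- minimal predicate calculations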
using cutlass::epilogue::threadblock::OutputTileOptimalThreadMap< Shape_, Count_, Threads, ElementsPerAccess, ElementSize >::Count = Count_
using cutlass::epilogue::threadblock::OutputTileOptimalThreadMap< Shape_, Count_, Threads, ElementsPerAccess, ElementSize >::Delta = OutputTileShape< Detail::RowArrangement::kDeltaColumn, Detail::RowArrangement::kDeltaRow, Detail::kDeltaGroup, Detail::kDeltaCluster, 1 >
using cutlass::epilogue::threadblock::OutputTileOptimalThreadMap< Shape_, Count_, Threads, ElementsPerAccess, ElementSize >::Iterations = OutputTileShape< Detail::RowArrangement::kIterationsColumn, Detail::RowArrangement::kIterationsRow, Detail::kIterationsGroup, Detail::kIterationsCluster, 1 >
using cutlass::epilogue::threadblock::OutputTileOptimalThreadMap< Shape_, Count_, Threads, ElementsPerAccess, ElementSize >::Shape = Shape_