CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
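cutlass::transform::TransposePitchLinearThreadMap< ThreadMap_, WarpThreadArrangement_ > Struct Template Reference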
#include <pitch_linear_thread_map.h>
Classes
struct | Detail
Internal details made public to facilitate introspection.
Public Types
using | ThreadMap = ThreadMap_
Underlying ThreadMap.
using | TensorCoord = typename ThreadMap::TensorCoord
Tensor coordinate.
using | Shape = typename ThreadMap::Shape
Tile shape.
using | ThreadAccessShape = layout::PitchLinearShape< kElementsPerAccess, 1 >
Shape of the access made by each thread.
using | Iterations = layout::PitchLinearShape< ThreadMap::Iterations::kStrided, ThreadMap::Iterations::kContiguous >
Iterations along each dimension (concept: PitchLinearShape).
using | Delta = layout::PitchLinearShape< Detail::WarpThreadArrangement::kContiguous * kElementsPerAccess, Detail::WarpThreadArrangement::kStrided >
Delta between accesses (units of elements, concept: PitchLinearShape).
Static Public Member Functions
static CUTLASS_HOST_DEVICE TensorCoord | initial_offset (int thread_id)
Static Public Attributes
static int const | kThreads = ThreadMap::kThreads
Total number of threads.
static int const | kElementsPerAccess = ThreadMap::kElementsPerAccess
Vector length (elements per access), taken from the underlying ThreadMap.
Transposes an existing ThreadMap. For example, an interleaved layout behaves like a congruous layout in global memory but like a crosswise layout in shared memory, so the coordinates must be transposed between the two.
using cutlass::transform::TransposePitchLinearThreadMap< ThreadMap_, WarpThreadArrangement_ >::Delta = layout::PitchLinearShape<Detail::WarpThreadArrangement::kContiguous * kElementsPerAccess, Detail::WarpThreadArrangement::kStrided>
using cutlass::transform::TransposePitchLinearThreadMap< ThreadMap_, WarpThreadArrangement_ >::Iterations = layout::PitchLinearShape<ThreadMap::Iterations::kStrided, ThreadMap::Iterations::kContiguous>
using cutlass::transform::TransposePitchLinearThreadMap< ThreadMap_, WarpThreadArrangement_ >::Shape = typename ThreadMap::Shape
using cutlass::transform::TransposePitchLinearThreadMap< ThreadMap_, WarpThreadArrangement_ >::TensorCoord = typename ThreadMap::TensorCoord
using cutlass::transform::TransposePitchLinearThreadMap< ThreadMap_, WarpThreadArrangement_ >::ThreadAccessShape = layout::PitchLinearShape<kElementsPerAccess, 1>
using cutlass::transform::TransposePitchLinearThreadMap< ThreadMap_, WarpThreadArrangement_ >::ThreadMap = ThreadMap_
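static CUTLASS_HOST_DEVICE TensorCoord cutlass::transform::TransposePitchLinearThreadMap< ThreadMap_, WarpThreadArrangement_ >::initial_offset (int thread_id) [inline, static]
Maps a thread ID to a coordinate offset within the tensor's logical coordinate space. Note that this mapping differs slightly from the one used by PitchLinearWarpRakedThreadMap.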