CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
#include <pitch_linear_thread_map.h>
Classes
struct Detail
Internal implementation details.
Public Types
using TensorCoord = layout::PitchLinearCoord
Tensor coordinate.
using Shape = Shape_
Tile shape.
using ThreadAccessShape = cutlass::layout::PitchLinearShape< 4, 4 >
Access Shape of each thread.
using Iterations = typename platform::conditional< Threads >= Detail::ShapeVec::kContiguous, layout::PitchLinearShape< 1, (Threads >= Detail::ShapeVec::kContiguous ? Detail::ShapeVec::kStrided / (kThreads / Detail::ShapeVec::kContiguous) : 0) >, layout::PitchLinearShape< Detail::ShapeVec::kContiguous / kThreads, Detail::ShapeVec::kStrided > >::type
Number of iterations by each thread.
using Delta = typename platform::conditional< Threads >= Detail::ShapeVec::kContiguous, layout::PitchLinearShape< Shape::kContiguous, kThreads * ThreadAccessShape::kStrided / Detail::ShapeVec::kContiguous >, layout::PitchLinearShape< kThreads * ThreadAccessShape::kContiguous, 1 > >::type
Static Public Member Functions
static CUTLASS_HOST_DEVICE TensorCoord initial_offset (int thread_id)
Static Public Attributes
static int const kThreads = Threads
Number of threads total.
static int const kElementsPerAccess = ThreadAccessShape::kContiguous
Extract length of each access from Layout.
using cutlass::transform::PitchLinear2DThreadTileStripminedThreadMap< Shape_, Threads, cutlass::layout::PitchLinearShape< 4, 4 > >::Delta = typename platform::conditional< Threads >= Detail::ShapeVec::kContiguous, layout::PitchLinearShape< Shape::kContiguous, kThreads * ThreadAccessShape::kStrided / Detail::ShapeVec::kContiguous >, layout::PitchLinearShape< kThreads * ThreadAccessShape::kContiguous, 1 > >::type |
Interval between accesses along each dimension of the tensor's logical coordinate space (in units of Elements)
using cutlass::transform::PitchLinear2DThreadTileStripminedThreadMap< Shape_, Threads, cutlass::layout::PitchLinearShape< 4, 4 > >::Iterations = typename platform::conditional< Threads >= Detail::ShapeVec::kContiguous, layout::PitchLinearShape< 1, (Threads >= Detail::ShapeVec::kContiguous ? Detail::ShapeVec::kStrided / (kThreads / Detail::ShapeVec::kContiguous) : 0) >, layout::PitchLinearShape< Detail::ShapeVec::kContiguous / kThreads, Detail::ShapeVec::kStrided > >::type |
using cutlass::transform::PitchLinear2DThreadTileStripminedThreadMap< Shape_, Threads, cutlass::layout::PitchLinearShape< 4, 4 > >::Shape = Shape_ |
using cutlass::transform::PitchLinear2DThreadTileStripminedThreadMap< Shape_, Threads, cutlass::layout::PitchLinearShape< 4, 4 > >::TensorCoord = layout::PitchLinearCoord |
using cutlass::transform::PitchLinear2DThreadTileStripminedThreadMap< Shape_, Threads, cutlass::layout::PitchLinearShape< 4, 4 > >::ThreadAccessShape = cutlass::layout::PitchLinearShape<4, 4> |
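The Iterations and Delta selections above can be easier to follow with concrete numbers. The following is a minimal standalone sketch, not the CUTLASS source: `PitchLinearShape` here is a hypothetical stand-in for `cutlass::layout::PitchLinearShape`, `ThreadMapSketch` is an illustrative name, `std::conditional` replaces `platform::conditional`, and the definition of `ShapeVec` (the tile shape measured in units of ThreadAccessShape) is an assumption, since this page does not show the contents of Detail.

```cpp
#include <type_traits>

// Hypothetical stand-in for cutlass::layout::PitchLinearShape; only the two
// compile-time extents used below are modeled.
template <int Contiguous, int Strided>
struct PitchLinearShape {
  static constexpr int kContiguous = Contiguous;
  static constexpr int kStrided = Strided;
};

// Illustrative re-creation of the Iterations and Delta typedefs listed above.
template <typename Shape, int Threads>
struct ThreadMapSketch {
  static constexpr int kThreads = Threads;
  using ThreadAccessShape = PitchLinearShape<4, 4>;

  // ASSUMPTION: ShapeVec is the tile shape divided by the per-thread access
  // shape. The Detail struct is not expanded on this page.
  using ShapeVec =
      PitchLinearShape<Shape::kContiguous / ThreadAccessShape::kContiguous,
                       Shape::kStrided / ThreadAccessShape::kStrided>;

  // Same two-way selection as the Iterations typedef above.
  using Iterations = typename std::conditional<
      (Threads >= ShapeVec::kContiguous),
      PitchLinearShape<1, (Threads >= ShapeVec::kContiguous
                               ? ShapeVec::kStrided /
                                     (kThreads / ShapeVec::kContiguous)
                               : 0)>,
      PitchLinearShape<ShapeVec::kContiguous / kThreads,
                       ShapeVec::kStrided>>::type;

  // Same two-way selection as the Delta typedef above.
  using Delta = typename std::conditional<
      (Threads >= ShapeVec::kContiguous),
      PitchLinearShape<Shape::kContiguous,
                       kThreads * ThreadAccessShape::kStrided /
                           ShapeVec::kContiguous>,
      PitchLinearShape<kThreads * ThreadAccessShape::kContiguous, 1>>::type;
};

// Example: a 64 x 32 tile shared by 32 threads.
using ExampleMap = ThreadMapSketch<PitchLinearShape<64, 32>, 32>;
static_assert(ExampleMap::ShapeVec::kContiguous == 16, "64 / 4 accesses per row");
static_assert(ExampleMap::Iterations::kStrided == 4, "8 / (32 / 16) iterations");
static_assert(ExampleMap::Delta::kStrided == 8, "32 * 4 / 16 rows per step");
```

Under these assumptions, 32 threads each touch a 4 x 4 block four times, which accounts for all 64 * 32 = 2048 elements of the example tile. The inner ternary in Iterations guards the division `kThreads / ShapeVec::kContiguous`, which would be zero when there are fewer threads than accesses per row.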
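static CUTLASS_HOST_DEVICE TensorCoord cutlass::transform::PitchLinear2DThreadTileStripminedThreadMap< Shape_, Threads, cutlass::layout::PitchLinearShape< 4, 4 > >::initial_offset (int thread_id) [inline, static]
Maps thread ID to a coordinate offset within the tensor's logical coordinate space (in units of Elements)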
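This page does not show the body of initial_offset, so the following is only a plausible sketch of such a mapping, not the library's implementation. It assumes threads are laid out row by row over the tile, with each thread owning one ThreadAccessShape-sized (4 x 4) block. `Coord` is a hypothetical stand-in for `layout::PitchLinearCoord`, and `initial_offset_sketch` and its template parameters are illustrative names.

```cpp
// Hypothetical stand-in for cutlass::layout::PitchLinearCoord.
struct Coord {
  int contiguous;
  int strided;
};

// ASSUMED mapping: consecutive thread IDs advance along the contiguous
// dimension in steps of AccessContiguous elements, and wrap to the next band
// of AccessStrided rows once a full tile row has been covered.
// ShapeContiguous is the tile extent along the contiguous dimension.
template <int ShapeContiguous, int AccessContiguous = 4, int AccessStrided = 4>
constexpr Coord initial_offset_sketch(int thread_id) {
  return Coord{
      (thread_id % (ShapeContiguous / AccessContiguous)) * AccessContiguous,
      (thread_id / (ShapeContiguous / AccessContiguous)) * AccessStrided};
}

// For a tile that is 64 elements wide there are 16 accesses per row, so
// thread 17 starts at the second access of the second band of rows.
static_assert(initial_offset_sketch<64>(17).contiguous == 4, "second access");
static_assert(initial_offset_sketch<64>(17).strided == 4, "second band");
```

Both coordinates are in units of elements, matching the description above. With a 64-wide tile and 32 threads, the starting offsets under this assumed layout span rows 0 through 7.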