CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
#include <predicated_tile_iterator_2dthreadtile.h>
Classes | |
struct | AccessType |
class | Params |
Parameters object is precomputed state and is host-constructible. | |
Public Types | |
using | Shape = Shape_ |
using | Element = Element_ |
using | Layout = layout::PitchLinear |
using | ThreadMap = ThreadMap_ |
using | Index = typename Layout::Index |
using | LongIndex = typename Layout::LongIndex |
using | TensorRef = TensorRef< Element, Layout > |
using | TensorView = TensorView< Element, Layout > |
using | TensorCoord = typename Layout::TensorCoord |
using | Pointer = Element * |
using | NonConstPointer = typename platform::remove_const< Element >::type * |
using | Transform = thread::Transpose< ThreadMap::Iterations::kCount * ThreadMap::ThreadAccessShape::kCount, layout::PitchLinearShape< 4, 4 >, Element > |
Optionally, this fragment can be 4x4 transposed. | |
using | TileAccessIterator = PredicatedTileAccessIterator2dThreadTile< Shape, Element, Layout, kAdvanceRank, ThreadMap, AccessType > |
Underlying iterator to compute the addresses. | |
using | Fragment = cutlass::Array< Element, ThreadMap::Iterations::kCount * ThreadMap::ThreadAccessShape::kCount > |
Fragment object to be loaded or stored. | |
using | Mask = typename TileAccessIterator::Mask |
Predicate vector that stores the mask guarding accesses. | |
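To make the Fragment and Transform typedefs above more concrete, here is a minimal host-side sketch in plain C++. It is not the CUTLASS implementation: `kIterationsCount`, `kThreadAccessCount`, and `transpose_4x4_blocks` are hypothetical stand-ins, and the sketch assumes the fragment is treated as consecutive row-major 4x4 blocks of 16 elements each.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for ThreadMap::Iterations::kCount and
// ThreadMap::ThreadAccessShape::kCount; the values are illustrative only.
constexpr std::size_t kIterationsCount = 2;
constexpr std::size_t kThreadAccessCount = 16;  // one 4x4 thread tile

// Mirrors the Fragment typedef: element count is the product of the two.
constexpr std::size_t kFragmentElements = kIterationsCount * kThreadAccessCount;

// Transposes every consecutive 4x4 block of the fragment (assumed layout:
// each block is stored row-major in 16 contiguous elements).
template <typename Element, std::size_t N>
std::array<Element, N> transpose_4x4_blocks(const std::array<Element, N>& in) {
  static_assert(N % 16 == 0, "fragment must hold whole 4x4 blocks");
  std::array<Element, N> out{};
  for (std::size_t block = 0; block < N / 16; ++block) {
    for (std::size_t r = 0; r < 4; ++r) {
      for (std::size_t c = 0; c < 4; ++c) {
        out[block * 16 + c * 4 + r] = in[block * 16 + r * 4 + c];
      }
    }
  }
  return out;
}
```

Applying `transpose_4x4_blocks` twice returns the original fragment, which is a quick sanity check for any variant of this sketch.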
Public Member Functions | |
CUTLASS_HOST_DEVICE | PredicatedTileIterator2dThreadTile (Params const &params, Pointer pointer, TensorCoord extent, int thread_id, TensorCoord const &threadblock_offset) |
CUTLASS_HOST_DEVICE | PredicatedTileIterator2dThreadTile (Params const &params, Pointer pointer, TensorCoord extent, int thread_id) |
Constructs a PredicatedTileIterator2dThreadTile with zero threadblock offset. | |
CUTLASS_HOST_DEVICE void | add_pointer_offset (LongIndex pointer_offset) |
Adds a pointer offset in units of Element. | |
CUTLASS_HOST_DEVICE PredicatedTileIterator2dThreadTile & | operator++ () |
CUTLASS_HOST_DEVICE PredicatedTileIterator2dThreadTile | operator++ (int) |
CUTLASS_HOST_DEVICE void | clear_mask () |
Clears the predicate set efficiently. | |
CUTLASS_HOST_DEVICE void | enable_mask () |
Enables the predicate set efficiently. | |
CUTLASS_HOST_DEVICE void | set_mask (Mask const &mask) |
Sets the predicate mask, overriding the value stored in the predicate iterator. | |
CUTLASS_HOST_DEVICE void | get_mask (Mask &mask) |
Gets the mask. | |
CUTLASS_DEVICE void | load_with_pointer_offset (Fragment &frag, Index pointer_offset) |
Loads a fragment from memory. | |
CUTLASS_DEVICE void | load (Fragment &frag) |
Loads a fragment from memory. | |
CUTLASS_DEVICE void | store_with_pointer_offset (Fragment const &frag, Index pointer_offset) |
Stores a fragment to memory. | |
CUTLASS_DEVICE void | store (Fragment const &frag) |
Stores a fragment to memory. | |
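The interplay between the mask functions and load/store listed above can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the CUTLASS class: `MaskedTileModel` is a hypothetical name, the mask is modeled as one `bool` per element, and masked-off elements are simply skipped (the fragment or memory value is left unchanged).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a predicated tile of N elements.
template <std::size_t N>
struct MaskedTileModel {
  std::array<bool, N> mask{};  // true = access allowed
  float* pointer;              // start of the tile in memory
  std::size_t extent;          // number of valid elements behind pointer

  MaskedTileModel(float* ptr, std::size_t extent_)
      : pointer(ptr), extent(extent_) {
    // Initial predicates guard against reading or writing past the extent.
    for (std::size_t i = 0; i < N; ++i) mask[i] = (i < extent);
  }

  void clear_mask() { mask.fill(false); }   // disable every access
  void enable_mask() { mask.fill(true); }   // enable every access (no bounds check)
  void set_mask(const std::array<bool, N>& m) { mask = m; }
  void get_mask(std::array<bool, N>& m) const { m = mask; }

  // Offset is in units of elements, matching add_pointer_offset above.
  void add_pointer_offset(std::ptrdiff_t offset) { pointer += offset; }

  // Guarded load: masked-off fragment elements keep their previous value.
  void load(std::array<float, N>& frag) const {
    for (std::size_t i = 0; i < N; ++i) {
      if (mask[i]) frag[i] = pointer[i];
    }
  }

  // Guarded store: masked-off memory locations are not written.
  void store(const std::array<float, N>& frag) const {
    for (std::size_t i = 0; i < N; ++i) {
      if (mask[i]) pointer[i] = frag[i];
    }
  }
};
```

In this model, calling `clear_mask()` before `store()` turns the store into a no-op, which is one way such a mask can be used to disable a thread's accesses without branching around the call.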
Static Public Attributes | |
static int const | kAdvanceRank = AdvanceRank |
static bool const | transpose = Transpose_ |
Specialization of PredicatedTileIterator2dThreadTile for pitch-linear data.
Satisfies: ForwardTileIteratorConcept | ReadableContiguousTileIteratorConcept | WriteableContiguousTileIteratorConcept | MaskedTileIteratorConcept
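CUTLASS_HOST_DEVICE PredicatedTileIterator2dThreadTile (Params const &params, Pointer pointer, TensorCoord extent, int thread_id, TensorCoord const &threadblock_offset) | inline |
Constructs a TileIterator from its precomputed state, threadblock offset, and thread ID.
params | Precomputed parameters object |
pointer | Pointer to start of tensor |
extent | Extent of tensor |
thread_id | ID of each participating thread |
threadblock_offset | Initial offset of threadblock |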
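CUTLASS_HOST_DEVICE PredicatedTileIterator2dThreadTile (Params const &params, Pointer pointer, TensorCoord extent, int thread_id) | inline |
Constructs a PredicatedTileIterator2dThreadTile with zero threadblock offset.
params | Precomputed parameters object |
pointer | Pointer to start of tensor |
extent | Extent of tensor |
thread_id | ID of each participating thread |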
CUTLASS_HOST_DEVICE PredicatedTileIterator2dThreadTile & operator++ () | inline |
Advances to the next tile in memory.
The first time this method is called, the predicates are updated and the iterator's internal pointer is reverted to the first "steady state" tile. Subsequent calls are lightweight and only need to update the internal pointer.
CUTLASS_HOST_DEVICE PredicatedTileIterator2dThreadTile operator++ (int) | inline |
Advances to the next tile in memory.
The first time this method is called, the predicates are updated and the iterator's internal pointer is reverted to the first "steady state" tile. Subsequent calls are lightweight and only need to update the internal pointer.
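The "residue first, then steady state" behavior described for operator++ can be sketched with a one-dimensional model. This is an illustration only, not the CUTLASS implementation: `ResidueFirstModel` and `load_sum` are hypothetical, and the sketch assumes that the residue tile is the trailing, possibly partial tile, that it is visited first with bounds-checked accesses, and that `extent` is positive.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1-D model of an iterator that visits the residue tile first.
struct ResidueFirstModel {
  int extent;            // total number of elements
  int tile;              // elements per tile
  int offset;            // start of the current tile
  bool is_residue_tile;  // true until the first increment

  ResidueFirstModel(int extent_, int tile_)
      : extent(extent_),
        tile(tile_),
        offset(((extent_ - 1) / tile_) * tile_),  // start at the last tile
        is_residue_tile(true) {}

  // Pre-increment: the first call drops the guards and reverts to tile 0;
  // later calls only bump the offset.
  ResidueFirstModel& operator++() {
    if (is_residue_tile) {
      offset = 0;
      is_residue_tile = false;
    } else {
      offset += tile;
    }
    return *this;
  }

  // Post-increment: returns the state before advancing.
  ResidueFirstModel operator++(int) {
    ResidueFirstModel self(*this);
    operator++();
    return self;
  }

  // Sums the current tile; only the residue tile needs bounds checks.
  int load_sum(const std::vector<int>& data) const {
    int sum = 0;
    for (int i = 0; i < tile; ++i) {
      int idx = offset + i;
      bool guard = !is_residue_tile || idx < extent;
      if (guard) sum += data[idx];
    }
    return sum;
  }
};
```

With 10 elements and a tile of 4, the model visits offsets 8, 0, and 4 in that order, so every element is touched exactly once while only the first visit pays for predicate evaluation. The caller is responsible for stopping after the full tiles have been consumed.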