CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
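cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize > Class Template Reference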
#include <predicated_tile_iterator.h>
Classes | |
class | Params |
Parameters object is precomputed state and is host-constructible. More... | |
Public Types | |
using | Shape = Shape_ |
using | Element = Element_ |
using | Layout = layout::ColumnMajor |
using | ThreadMap = ThreadMap_ |
using | Index = typename Layout::Index |
using | LongIndex = typename Layout::LongIndex |
using | TensorRef = TensorRef< Element, Layout > |
using | TensorView = TensorView< Element, Layout > |
using | TensorCoord = typename Layout::TensorCoord |
using | Pointer = Element * |
using | NonConstPointer = typename platform::remove_const< Element >::type * |
using | UnderlyingIterator = PredicatedTileIterator< layout::PitchLinearShape< Shape::kRow, Shape::kColumn >, Element, layout::PitchLinear,(kAdvanceRank==0?0:1), ThreadMap, AccessSize > |
using | AccessType = typename UnderlyingIterator::AccessType |
using | Fragment = cutlass::Array< Element, ThreadMap::Iterations::kCount *ThreadMap::kElementsPerAccess > |
Fragment object to be loaded or stored. More... | |
using | Mask = typename UnderlyingIterator::Mask |
Predicate vector stores mask to guard accesses. More... | |
Public Member Functions | |
CUTLASS_HOST_DEVICE | PredicatedTileIterator (Params const &params, Pointer pointer, TensorCoord extent, int thread_id, TensorCoord const &threadblock_offset) |
Constructs a TileIterator from its precomputed state, threadblock offset, and thread ID. More... | |
CUTLASS_HOST_DEVICE | PredicatedTileIterator (Params const &params, Pointer pointer, TensorCoord extent, int thread_id) |
Construct a PredicatedTileIterator with zero threadblock offset. More... | |
CUTLASS_HOST_DEVICE void | add_pointer_offset (LongIndex pointer_offset) |
Adds a pointer offset in units of Element. More... | |
CUTLASS_HOST_DEVICE PredicatedTileIterator & | operator++ () |
CUTLASS_HOST_DEVICE PredicatedTileIterator | operator++ (int) |
CUTLASS_HOST_DEVICE void | clear_mask () |
Clears the predicate set efficiently. More... | |
CUTLASS_HOST_DEVICE void | enable_mask () |
Enables the predicate set efficiently. More... | |
CUTLASS_HOST_DEVICE void | set_mask (Mask const &mask) |
Sets the predicate mask, overriding value stored in predicate iterator. More... | |
CUTLASS_HOST_DEVICE void | get_mask (Mask &mask) |
Gets the mask. More... | |
CUTLASS_DEVICE void | load_with_pointer_offset (Fragment &frag, Index pointer_offset) |
Loads a fragment from memory. More... | |
CUTLASS_DEVICE void | load (Fragment &frag) |
Loads a fragment from memory. More... | |
CUTLASS_DEVICE void | store_with_pointer_offset (Fragment const &frag, Index pointer_offset) |
Store a fragment to memory. More... | |
CUTLASS_DEVICE void | store (Fragment const &frag) |
Store a fragment to memory. More... | |
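The mask members listed above (clear_mask, enable_mask, set_mask, get_mask) guard the accesses made by load and store. The following is a minimal host-side sketch of that idea in plain C++, not CUTLASS code: ToyMaskedLoader, its std::bitset mask, and its int elements are hypothetical stand-ins, and the assumption that a masked-off element leaves the fragment untouched is an illustration choice rather than something this page states.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical model for illustration only; this is NOT a CUTLASS type.
template <std::size_t N>
class ToyMaskedLoader {
 public:
  using Mask = std::bitset<N>;          // one predicate bit per element
  using Fragment = std::array<int, N>;  // per-thread register storage

  // Starts with every predicate enabled.
  explicit ToyMaskedLoader(const int* pointer) : pointer_(pointer) {
    mask_.set();
  }

  void clear_mask() { mask_.reset(); }                 // all accesses disabled
  void enable_mask() { mask_.set(); }                  // all accesses enabled
  void set_mask(const Mask& mask) { mask_ = mask; }    // override stored mask
  void get_mask(Mask& mask) const { mask = mask_; }    // read stored mask

  // Copies only the elements whose predicate bit is set. Elements that are
  // masked off keep whatever value the fragment already held (assumption).
  void load(Fragment& frag) const {
    for (std::size_t i = 0; i < N; ++i) {
      if (mask_[i]) {
        frag[i] = pointer_[i];
      }
    }
  }

 private:
  const int* pointer_;
  Mask mask_;
};
```

Because masked-off elements are skipped in this model, a caller would typically zero the fragment before loading so that guarded positions hold a defined value.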
Static Public Attributes | |
static int const | kAdvanceRank = AdvanceRank |
Specialization of PredicatedTileIterator for column-major data, implemented as a wrapper around an UnderlyingIterator over the equivalent pitch-linear layout.
Satisfies: ForwardTileIteratorConcept | ReadableContiguousTileIteratorConcept | WriteableContiguousTileIteratorConcept | MaskedTileIteratorConcept
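The UnderlyingIterator typedef suggests how a column-major tile is viewed as pitch-linear data: Shape::kRow becomes the contiguous dimension, Shape::kColumn becomes the strided dimension, and the advance rank is passed through as (kAdvanceRank == 0 ? 0 : 1). The sketch below illustrates that mapping in plain C++. MatrixCoord, PitchLinearCoord, to_pitch_linear, linear_offset, and underlying_advance_rank are hypothetical names introduced here, not the CUTLASS types or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are NOT CUTLASS types.
struct MatrixCoord {
  int row;
  int column;
};

struct PitchLinearCoord {
  int contiguous;
  int strided;
};

// In a column-major matrix, consecutive rows of one column are adjacent in
// memory, so the row index plays the role of the contiguous dimension.
inline PitchLinearCoord to_pitch_linear(MatrixCoord c) {
  return PitchLinearCoord{c.row, c.column};
}

// Linear element offset of a coordinate, given the leading dimension (the
// number of elements between the starts of two consecutive columns).
inline std::int64_t linear_offset(PitchLinearCoord c, std::int64_t ldm) {
  return c.contiguous + c.strided * ldm;
}

// Mirrors the (kAdvanceRank == 0 ? 0 : 1) expression in UnderlyingIterator:
// advancing along rows maps to the contiguous rank, and advancing along
// columns maps to the strided rank.
constexpr int underlying_advance_rank(int advance_rank) {
  return advance_rank == 0 ? 0 : 1;
}
```

For example, with a leading dimension of 8, the element at row 2, column 3 sits at offset 2 + 3 * 8 = 26 in this model.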
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::AccessType = typename UnderlyingIterator::AccessType |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::Element = Element_ |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::Fragment = cutlass::Array<Element, ThreadMap::Iterations::kCount * ThreadMap::kElementsPerAccess> |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::Index = typename Layout::Index |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::Layout = layout::ColumnMajor |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::LongIndex = typename Layout::LongIndex |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::Mask = typename UnderlyingIterator::Mask |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::NonConstPointer = typename platform::remove_const<Element>::type * |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::Pointer = Element * |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::Shape = Shape_ |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::TensorCoord = typename Layout::TensorCoord |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::TensorRef = TensorRef<Element, Layout> |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::TensorView = TensorView<Element, Layout> |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::ThreadMap = ThreadMap_ |
using cutlass::transform::threadblock::PredicatedTileIterator< Shape_, Element_, layout::ColumnMajor, AdvanceRank, ThreadMap_, AccessSize >::UnderlyingIterator = PredicatedTileIterator< layout::PitchLinearShape<Shape::kRow, Shape::kColumn>, Element, layout::PitchLinear, (kAdvanceRank == 0 ? 0 : 1), ThreadMap, AccessSize > |
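The Fragment typedef sizes each thread's storage as ThreadMap::Iterations::kCount * ThreadMap::kElementsPerAccess. As a small worked example of that arithmetic, the sketch below uses a hypothetical ExampleThreadMap with made-up values and std::array in place of cutlass::Array; none of these names or numbers come from CUTLASS.

```cpp
#include <array>
#include <cassert>

// Hypothetical thread map for illustration; the values are made up.
struct ExampleThreadMap {
  struct Iterations {
    static constexpr int kCount = 4;  // accesses each thread performs
  };
  static constexpr int kElementsPerAccess = 8;  // elements per access
};

// Number of elements one thread holds, following the Fragment formula.
template <typename ThreadMap>
constexpr int fragment_elements() {
  return ThreadMap::Iterations::kCount * ThreadMap::kElementsPerAccess;
}

// std::array stands in for cutlass::Array in this host-only sketch.
template <typename Element, typename ThreadMap>
using ExampleFragment = std::array<Element, fragment_elements<ThreadMap>()>;

static_assert(std::tuple_size<ExampleFragment<float, ExampleThreadMap>>::value == 32,
              "4 iterations of 8 elements give a 32-element fragment");
```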
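PredicatedTileIterator (Params const &params, Pointer pointer, TensorCoord extent, int thread_id, TensorCoord const &threadblock_offset) | inline |
Parameters
params | Precomputed parameters object |
pointer | Pointer to start of tensor |
extent | Extent of tensor |
thread_id | ID of each participating thread |
threadblock_offset | Initial offset of threadblock |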
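PredicatedTileIterator (Params const &params, Pointer pointer, TensorCoord extent, int thread_id) | inline |
Parameters
params | Precomputed parameters object |
pointer | Pointer to start of tensor |
extent | Extent of tensor |
thread_id | ID of each participating thread |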
operator++ () | inline |
Advances to the next tile in memory.
The first time this method is called, predicates are updated, and the iterator's internal pointer is reverted to the first "steady state" tile. Subsequent calls are lightweight and need only update the internal pointer.
operator++ (int) | inline |
Advances to the next tile in memory.
The first time this method is called, predicates are updated, and the iterator's internal pointer is reverted to the first "steady state" tile. Subsequent calls are lightweight and need only update the internal pointer.
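One way to picture the first-call behavior described above is a one-dimensional model in which the partial tile is visited first, so that every later tile is full and needs only a pointer bump. The sketch below is a host-side illustration under that assumption and is not the CUTLASS implementation: ToyTileIterator, load_sum, and the residue calculation are hypothetical, and the model assumes a non-empty extent.

```cpp
#include <cassert>

// Hypothetical 1-D model for illustration; NOT the CUTLASS implementation.
class ToyTileIterator {
 public:
  // Assumes extent > 0 and tile > 0.
  ToyTileIterator(const int* data, int extent, int tile)
      : data_(data), tile_(tile), position_(0), first_(true) {
    int residue = extent % tile;
    // The first tile covers the leftover elements, or a whole tile when the
    // extent divides evenly.
    valid_ = (residue == 0) ? tile : residue;
  }

  // Sums the current tile; the per-element guard plays the role of the
  // predicate and skips positions beyond the valid count.
  int load_sum() const {
    int sum = 0;
    for (int i = 0; i < tile_; ++i) {
      if (i < valid_) {
        sum += data_[position_ + i];
      }
    }
    return sum;
  }

  // Pre-increment: the first call does the one-time predicate update and
  // moves to the first full ("steady state") tile; later calls only move.
  ToyTileIterator& operator++() {
    if (first_) {
      position_ += valid_;
      valid_ = tile_;
      first_ = false;
    } else {
      position_ += tile_;
    }
    return *this;
  }

  // Post-increment: returns a copy taken before advancing.
  ToyTileIterator operator++(int) {
    ToyTileIterator self(*this);
    ++(*this);
    return self;
  }

 private:
  const int* data_;
  int tile_;
  int position_;
  int valid_;
  bool first_;
};
```

With ten elements and a tile of four, this model visits a two-element tile first and then two full tiles, so only the first advance has to touch the predicate state.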