CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
|
#include <predicated_tile_access_iterator.h>
Classes | |
class | Params |
Parameters object is precomputed state and is host-constructible. | |
Public Types | |
using | Shape = Shape_ |
using | Element = Element_ |
using | Layout = layout::ColumnMajorInterleaved< kInterleavedK > |
using | ThreadMap = ThreadMap_ |
using | AccessType = AccessType_ |
using | Index = typename Layout::Index |
using | LongIndex = typename Layout::LongIndex |
using | TensorRef = TensorRef< Element, Layout > |
using | TensorView = TensorView< Element, Layout > |
using | TensorCoord = typename Layout::TensorCoord |
using | Pointer = Element * |
using | NonConstPointer = typename platform::remove_const< Element >::type * |
using | UnderlyingIterator = PredicatedTileAccessIterator< layout::PitchLinearShape< Shape::kRow * kInterleavedK, Shape::kColumn / kInterleavedK >, Element, layout::PitchLinear, (kAdvanceRank == 0 ? 0 : 1), ThreadMap, AccessType > |
using | Mask = typename UnderlyingIterator::Mask |
Predicate vector stores mask to guard accesses. | |
Public Member Functions | |
CUTLASS_HOST_DEVICE | PredicatedTileAccessIterator (Params const &params, Pointer pointer, TensorCoord extent, int thread_id, TensorCoord const &threadblock_offset) |
CUTLASS_HOST_DEVICE | PredicatedTileAccessIterator (Params const &params, Pointer pointer, TensorCoord extent, int thread_id) |
Constructs a PredicatedTileAccessIterator with zero threadblock offset. | |
CUTLASS_HOST_DEVICE void | set_iteration_index (int index) |
Overrides the internal iteration index. | |
CUTLASS_HOST_DEVICE void | add_pointer_offset (LongIndex pointer_offset) |
Adds a pointer offset in units of Element. | |
CUTLASS_HOST_DEVICE void | add_tile_offset (TensorCoord const &tile_offset) |
CUTLASS_HOST_DEVICE AccessType * | get () const |
Returns a pointer. | |
CUTLASS_HOST_DEVICE PredicatedTileAccessIterator & | operator++ () |
CUTLASS_HOST_DEVICE PredicatedTileAccessIterator | operator++ (int) |
CUTLASS_HOST_DEVICE void | clear_mask () |
Clears the predicate set efficiently. | |
CUTLASS_HOST_DEVICE void | enable_mask () |
Enables the predicate set efficiently. | |
CUTLASS_HOST_DEVICE void | set_mask (Mask const &mask) |
Sets the predicate mask, overriding the value stored in the predicate iterator. | |
CUTLASS_HOST_DEVICE void | get_mask (Mask &mask) |
Gets the mask. | |
CUTLASS_HOST_DEVICE bool | valid () |
Returns whether the access is valid. | |
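The mask-related members above can be pictured with a small host-side model. The sketch below is not the CUTLASS implementation: `ToyPredicates`, its single-word `Mask`, and `kCount` are hypothetical stand-ins chosen for illustration (the real Mask type comes from UnderlyingIterator). It only shows the intended relationship between the mask, the iteration index, and valid().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model of a predicated access iterator. One bit per access;
// a set bit means the corresponding access is in bounds and may be issued.
class ToyPredicates {
 public:
  using Mask = std::uint32_t;       // assumption: a single 32-bit word
  static constexpr int kCount = 8;  // assumption: 8 accesses per tile

  // 'extent' is the number of accesses that fall inside the tensor.
  explicit ToyPredicates(int extent) : mask_(0), index_(0) {
    for (int i = 0; i < kCount; ++i) {
      if (i < extent) {
        mask_ |= (Mask(1) << i);
      }
    }
  }

  void set_iteration_index(int index) { index_ = index; }
  void clear_mask() { mask_ = 0; }                        // every access disabled
  void enable_mask() { mask_ = (Mask(1) << kCount) - 1; } // every access enabled
  void set_mask(Mask const &mask) { mask_ = mask; }
  void get_mask(Mask &mask) const { mask = mask_; }

  // True when the access selected by the iteration index is guarded "on".
  bool valid() const { return ((mask_ >> index_) & 1u) != 0; }

  // Advance to the next access, wrapping at the end of the tile.
  ToyPredicates &operator++() {
    index_ = (index_ + 1) % kCount;
    return *this;
  }

 private:
  Mask mask_;
  int index_;
};
```

In this model a caller would test valid() before dereferencing, and could save a mask with get_mask(), call clear_mask() to suppress all accesses, then restore the saved value with set_mask().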
Static Public Attributes | |
static int const | kInterleavedK = InterleavedK |
static int const | kAdvanceRank = AdvanceRank |
static int const | kAccessesPerVector = UnderlyingIterator::kAccessesPerVector |
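Specialization of PredicatedTileAccessIterator for column-major interleaved data (layout::ColumnMajorInterleaved< InterleavedK >, for example interleaved-32). It is mapped to the congruous layout through a pitch-linear UnderlyingIterator.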
Satisfies: ForwardTileIteratorConcept | ReadableContiguousTileIteratorConcept | WriteableContiguousTileIteratorConcept | MaskedTileIteratorConcept
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::AccessType = AccessType_ |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::Element = Element_ |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::Index = typename Layout::Index |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::Layout = layout::ColumnMajorInterleaved<kInterleavedK> |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::LongIndex = typename Layout::LongIndex |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::Mask = typename UnderlyingIterator::Mask |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::NonConstPointer = typename platform::remove_const<Element>::type * |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::Pointer = Element * |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::Shape = Shape_ |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::TensorCoord = typename Layout::TensorCoord |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::TensorRef = TensorRef<Element, Layout> |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::TensorView = TensorView<Element, Layout> |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::ThreadMap = ThreadMap_ |
using cutlass::transform::threadblock::PredicatedTileAccessIterator< Shape_, Element_, layout::ColumnMajorInterleaved< InterleavedK >, AdvanceRank, ThreadMap_, AccessType_ >::UnderlyingIterator = PredicatedTileAccessIterator< layout::PitchLinearShape<Shape::kRow * kInterleavedK, Shape::kColumn / kInterleavedK>, Element, layout::PitchLinear, (kAdvanceRank == 0 ? 0 : 1), ThreadMap, AccessType> |
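The shape arithmetic in the UnderlyingIterator typedef can be checked in isolation. The following is a minimal sketch using hypothetical stand-in types (`MatrixShape`, `PitchLinearCoord`, `InterleavedToPitchLinear`), none of which are CUTLASS classes. The tile-shape computation is taken directly from the typedef; the per-element coordinate mapping in `map()` is an assumption about how a column-major interleaved layout groups InterleavedK columns along the contiguous dimension.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the CUTLASS types.
struct PitchLinearCoord {
  int contiguous;
  int strided;
};

template <int Rows, int Columns>
struct MatrixShape {
  static constexpr int kRow = Rows;
  static constexpr int kColumn = Columns;
};

template <typename Shape, int InterleavedK>
struct InterleavedToPitchLinear {
  static_assert(Shape::kColumn % InterleavedK == 0,
                "tile columns must be a multiple of the interleave factor");

  // Same arithmetic as the typedef:
  // PitchLinearShape<Shape::kRow * kInterleavedK, Shape::kColumn / kInterleavedK>
  static constexpr int kContiguous = Shape::kRow * InterleavedK;
  static constexpr int kStrided = Shape::kColumn / InterleavedK;

  // Assumed element mapping: InterleavedK adjacent columns of one row are
  // stored next to each other along the contiguous dimension.
  static constexpr PitchLinearCoord map(int row, int column) {
    return PitchLinearCoord{row * InterleavedK + column % InterleavedK,
                            column / InterleavedK};
  }
};

// A 64x64 tile with 32-way interleaving becomes a 2048x2 pitch-linear tile.
using Tile = MatrixShape<64, 64>;
using Mapped = InterleavedToPitchLinear<Tile, 32>;
static_assert(Mapped::kContiguous == 2048, "contiguous extent");
static_assert(Mapped::kStrided == 2, "strided extent");
```

The element count is preserved (64 * 64 == 2048 * 2); only the way the tile is factored into contiguous and strided extents changes.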
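CUTLASS_HOST_DEVICE PredicatedTileAccessIterator (Params const &params, Pointer pointer, TensorCoord extent, int thread_id, TensorCoord const &threadblock_offset) | inline |
Constructs a TileIterator from its precomputed state, threadblock offset, and thread ID.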
params | Precomputed parameters object |
pointer | Pointer to start of tensor |
extent | Extent of tensor |
thread_id | ID of each participating thread |
threadblock_offset | Initial offset of threadblock |
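CUTLASS_HOST_DEVICE PredicatedTileAccessIterator (Params const &params, Pointer pointer, TensorCoord extent, int thread_id) | inline |
Constructs a PredicatedTileAccessIterator with zero threadblock offset.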
params | Precomputed parameters object |
pointer | Pointer to start of tensor |
extent | Extent of tensor |
thread_id | ID of each participating thread |
CUTLASS_HOST_DEVICE void add_tile_offset (TensorCoord const &tile_offset) | inline |
Advances an iterator along logical dimensions of matrix in units of whole tiles.
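To make "units of whole tiles" concrete, the sketch below models a cursor whose add_tile_offset() scales each component of the offset by the tile shape. `TileCursor`, `TileShape`, and `MatrixCoord` here are hypothetical illustrations, not the CUTLASS classes, and the model works in logical matrix coordinates only; it does not reproduce how the real iterator updates its pointer or predicates.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the CUTLASS types.
struct MatrixCoord {
  int row;
  int column;
};

template <int Rows, int Columns>
struct TileShape {
  static constexpr int kRow = Rows;
  static constexpr int kColumn = Columns;
};

// Tracks the logical coordinate of the tile's first element.
template <typename Shape>
class TileCursor {
 public:
  explicit TileCursor(MatrixCoord origin) : origin_(origin) {}

  // A tile offset of (r, c) moves the cursor by r * Shape::kRow rows and
  // c * Shape::kColumn columns, i.e. by whole tiles rather than elements.
  void add_tile_offset(MatrixCoord const &tile_offset) {
    origin_.row += tile_offset.row * Shape::kRow;
    origin_.column += tile_offset.column * Shape::kColumn;
  }

  MatrixCoord origin() const { return origin_; }

 private:
  MatrixCoord origin_;
};
```

For a 64x32 tile, for instance, a tile offset of (1, 2) moves the cursor's origin from (0, 0) to row 64, column 64.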
CUTLASS_HOST_DEVICE PredicatedTileAccessIterator & operator++ () | inline |
Advances to the next tile in memory.
The first time this method is called, predicates are updated and the iterator's internal pointer is reverted to the first "steady state" tile. Subsequent calls are lightweight and need only update the internal pointer.
CUTLASS_HOST_DEVICE PredicatedTileAccessIterator operator++ (int) | inline |
Advances to the next tile in memory.
The first time this method is called, predicates are updated and the iterator's internal pointer is reverted to the first "steady state" tile. Subsequent calls are lightweight and need only update the internal pointer.