CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Templates implementing loading of tiles from pitch-linear rank=2 tensors.
#include "cutlass/cutlass.h"
#include "cutlass/array.h"
#include "cutlass/matrix_coord.h"
#include "cutlass/tensor_ref.h"
#include "cutlass/layout/pitch_linear.h"
#include "cutlass/layout/tensor_op_multiplicand_sm70.h"
#include "cutlass/transform/threadblock/regular_tile_iterator.h"
Namespaces
cutlass
cutlass::transform
cutlass::transform::threadblock
This iterator uses masks to guard out-of-bounds accesses and visits the last "residue" tile first, with the objective of minimizing predicate mask updates during steady-state operation.
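The residue-first ordering can be illustrated with a minimal host-side sketch. This is not the CUTLASS implementation: `sum_residue_first`, `ResidueFirstResult`, and the one-dimensional layout are hypothetical simplifications chosen only to show why visiting the partial tile first leaves every later tile full, so the mask needs no further updates.

```cpp
#include <vector>

// Hypothetical result type: the accumulated sum and how often the mask was rewritten.
struct ResidueFirstResult {
    int sum;
    int mask_updates;
};

// Sums `data` by visiting tiles of `tile` elements, partial ("residue") tile first.
// Assumes tile > 0. The mask guards each access, as an out-of-bounds predicate would.
inline ResidueFirstResult sum_residue_first(const std::vector<int>& data, int tile) {
    ResidueFirstResult result{0, 0};
    const int extent = static_cast<int>(data.size());
    if (extent == 0) {
        return result;
    }

    // Size of the first tile: the remainder, or a full tile if extent divides evenly.
    int residue = extent % tile;
    if (residue == 0) {
        residue = tile;
    }

    // Mask update 1: enable only the elements that exist in the residue tile.
    std::vector<bool> mask(tile);
    for (int i = 0; i < tile; ++i) {
        mask[i] = (i < residue);
    }
    ++result.mask_updates;

    int offset = 0;
    for (int i = 0; i < tile; ++i) {
        if (mask[i]) {
            result.sum += data[offset + i];
        }
    }
    offset += residue;

    // Mask update 2: everything left is a whole number of tiles, so enable all
    // elements once and never touch the mask again in the steady-state loop.
    if (offset < extent) {
        mask.assign(tile, true);
        ++result.mask_updates;
    }
    for (; offset < extent; offset += tile) {
        for (int i = 0; i < tile; ++i) {
            if (mask[i]) {
                result.sum += data[offset + i];
            }
        }
    }
    return result;
}
```

In this sketch the mask is written at most twice regardless of how many tiles follow. Visiting the partial tile last would instead require a bounds check or mask rewrite at the end of the steady-state loop.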
A precomputed "Params" object minimizes the amount of state that must be stored in registers, and integer addition is used to advance the pointer through memory.
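The split between a precomputed parameter object and a lightweight iterator might look roughly like the following host-side sketch. The names `TileParams` and `TileCursor`, their members, and the row-major layout are assumptions made for illustration and do not reproduce the actual CUTLASS `Params` structure. The point is that all multiplications happen once at construction, and advancing is a single integer addition on a byte pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical parameter object: byte increments derived once from the layout.
struct TileParams {
    std::int64_t inc_row;   // bytes from one row to the next
    std::int64_t inc_tile;  // bytes from one tile to the next along the strided dimension

    TileParams(std::int64_t stride_elems, int tile_rows, std::size_t elem_bytes)
        : inc_row(stride_elems * static_cast<std::int64_t>(elem_bytes)),
          inc_tile(stride_elems * tile_rows * static_cast<std::int64_t>(elem_bytes)) {}
};

// Hypothetical iterator: its only per-iterator state is a byte pointer.
class TileCursor {
public:
    TileCursor(const TileParams& params, const void* base)
        : params_(params), ptr_(static_cast<const std::uint8_t*>(base)) {}

    // Advancing is a single addition of a precomputed increment.
    void next_row() { ptr_ += params_.inc_row; }
    void next_tile() { ptr_ += params_.inc_tile; }

    // Reads element `col` of the current row, interpreted as type T.
    template <typename T>
    const T& at(int col) const {
        return reinterpret_cast<const T*>(ptr_)[col];
    }

private:
    const TileParams& params_;  // shared, precomputed; must outlive the cursor
    const std::uint8_t* ptr_;   // current position in memory
};
```

A `TileParams` built once for a given stride can be shared by many cursors, so each cursor carries only its pointer.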