CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers

Templates for computing the addresses and predicates used to load tiles from pitch-linear rank=2 tensors.
#include "cutlass/array.h"
#include "cutlass/coord.h"
#include "cutlass/cutlass.h"
#include "cutlass/layout/matrix.h"
#include "cutlass/layout/pitch_linear.h"
#include "cutlass/matrix_shape.h"
#include "cutlass/predicate_vector.h"
#include "cutlass/tensor_ref.h"
#include "cutlass/tensor_view.h"
Namespaces  
cutlass  
cutlass::transform  
cutlass::transform::threadblock  
This iterator uses masks to guard out-of-bounds accesses and visits the last "residue" tile first, with the objective of minimizing predicate mask updates during steady-state operation.
A precomputed "Params" object minimizes the amount of state that must be stored in registers, and integer addition is used to advance the pointer through memory.
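The residue-first traversal described above can be illustrated with a minimal host-side C++ sketch. This is not the CUTLASS API: `Params`, `ResidueFirstIterator`, `sum_tiles`, and every member name are hypothetical, the tensor is reduced to one dimension, and a simple count of valid lanes stands in for the packed predicate mask that the real iterator would keep per thread. The sketch only shows why visiting the partial tile first leaves a single mask update, and how precomputed increments reduce pointer advancement to an integer addition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a precomputed "Params" object: the pointer steps
// are computed once, so advancing the iterator is a single integer addition.
struct Params {
  std::ptrdiff_t inc_residue;  // step taken after the first (residue) tile
  std::ptrdiff_t inc_tile;     // step taken after every full tile
};

// Hypothetical 1-D iterator that visits the partial "residue" tile first.
template <int kTile>
class ResidueFirstIterator {
 public:
  ResidueFirstIterator(const float* ptr, int extent) : ptr_(ptr) {
    assert(extent >= 0);
    int residue = extent % kTile;
    if (residue == 0 && extent > 0) residue = kTile;  // evenly divisible: first tile is full
    params_ = Params{residue, kTile};
    valid_ = residue;                                  // predicate: lane i is valid if i < valid_
    tiles_left_ = (extent + kTile - 1) / kTile;
  }

  bool has_tile() const { return tiles_left_ > 0; }

  // Guarded load: lanes whose predicate is false are zero-filled and never dereferenced.
  void load(float (&frag)[kTile]) const {
    for (int i = 0; i < kTile; ++i) {
      frag[i] = (i < valid_) ? ptr_[i] : 0.0f;
    }
  }

  void advance() {
    if (first_) {
      ptr_ += params_.inc_residue;
      valid_ = kTile;   // the only mask update: every later tile is full
      ++mask_updates_;
      first_ = false;
    } else {
      ptr_ += params_.inc_tile;  // steady state: pointer addition only
    }
    --tiles_left_;
  }

  int mask_updates() const { return mask_updates_; }

 private:
  const float* ptr_;
  Params params_{};
  int valid_ = 0;
  int tiles_left_ = 0;
  int mask_updates_ = 0;
  bool first_ = true;
};

// Sums all elements tile by tile; optionally reports how often the mask changed.
template <int kTile>
float sum_tiles(const std::vector<float>& data, int* mask_updates = nullptr) {
  ResidueFirstIterator<kTile> it(data.data(), static_cast<int>(data.size()));
  float total = 0.0f;
  float frag[kTile];
  while (it.has_tile()) {
    it.load(frag);
    for (float v : frag) total += v;
    it.advance();
  }
  if (mask_updates) *mask_updates = it.mask_updates();
  return total;
}
```

With an extent of 10 and a tile of 4, the sketch loads a 2-element residue tile followed by two full tiles, so the mask changes once regardless of how many full tiles follow. Visiting the residue tile last would instead require checking for the final tile on every iteration of the steady-state loop.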