CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
fragment_iterator_tensor_op.h File Reference

Defines a "fragment" iterator for visiting the fragments of an accumulator tile that participate in one warp-level store operation.

#include "cutlass/array.h"
#include "cutlass/layout/matrix.h"
#include "cutlass/epilogue/warp/tensor_op_policy.h"


Classes

class  cutlass::epilogue::warp::FragmentIteratorTensorOp< WarpShape, OperatorShape, OperatorElementC, OperatorFragmentC, Layout >
 
class  cutlass::epilogue::warp::FragmentIteratorTensorOp< WarpShape_, OperatorShape_, OperatorElementC_, OperatorFragmentC_, layout::RowMajor >
 Partial specialization for row-major shared memory.
 
class  cutlass::epilogue::warp::FragmentIteratorTensorOp< WarpShape_, OperatorShape_, OperatorElementC_, OperatorFragmentC_, layout::ColumnMajorInterleaved< InterleavedK > >
 Partial specialization dedicated to the column-major interleaved layout.
 

Namespaces

 cutlass
 
 cutlass::epilogue
 
 cutlass::epilogue::warp
 

Detailed Description

Typically, the accumulator tile is the largest single block of register-backed storage within the kernel. Storing it to memory is best accomplished by partitioning it into smaller tiles and storing these sequentially.

Round trips through shared memory during the Epilogue phase require partitioning, as shared memory capacity is typically insufficient for a threadblock's total accumulator size.
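
The partitioning described above can be illustrated with a small host-side sketch. This is not the CUTLASS implementation: `ToyFragmentIterator`, `store_in_pieces`, and `toy_epilogue_roundtrip` are hypothetical names, the accumulator tile is modeled as a flat `std::array`, and a fragment-sized staging buffer stands in for the limited shared memory. It only shows the general pattern of loading one fragment at a time and advancing the iterator between stores.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fragment iterator over an accumulator tile.
// The tile holds kIterations fragments of kFragmentSize elements each.
// None of these names are CUTLASS types; they exist only for illustration.
template <typename Element, std::size_t kFragmentSize, std::size_t kIterations>
class ToyFragmentIterator {
public:
  using Fragment = std::array<Element, kFragmentSize>;
  using AccumulatorTile = std::array<Element, kFragmentSize * kIterations>;

  explicit ToyFragmentIterator(AccumulatorTile const &accum)
      : accum_(&accum), index_(0) {}

  // Copies the fragment at the current position out of the accumulator tile.
  void load(Fragment &frag) const {
    assert(index_ < kIterations);
    for (std::size_t i = 0; i < kFragmentSize; ++i) {
      frag[i] = (*accum_)[index_ * kFragmentSize + i];
    }
  }

  // Advances to the next fragment.
  ToyFragmentIterator &operator++() {
    ++index_;
    return *this;
  }

private:
  AccumulatorTile const *accum_;
  std::size_t index_;
};

// Writes the whole tile to `out` through a staging buffer that holds only one
// fragment, mimicking sequential stores through limited shared memory.
template <typename Element, std::size_t kFragmentSize, std::size_t kIterations>
void store_in_pieces(
    std::array<Element, kFragmentSize * kIterations> const &accum,
    std::array<Element, kFragmentSize * kIterations> &out) {
  ToyFragmentIterator<Element, kFragmentSize, kIterations> it(accum);
  std::array<Element, kFragmentSize> staging{};

  for (std::size_t iter = 0; iter < kIterations; ++iter, ++it) {
    it.load(staging);  // one fragment per "store operation"
    for (std::size_t i = 0; i < kFragmentSize; ++i) {
      out[iter * kFragmentSize + i] = staging[i];
    }
  }
}

// Fills a 32-element tile, stores it as 8 fragments of 4 elements, and
// reports whether the output matches the original tile.
bool toy_epilogue_roundtrip() {
  std::array<float, 32> accum{};
  for (std::size_t i = 0; i < accum.size(); ++i) {
    accum[i] = static_cast<float>(i);
  }
  std::array<float, 32> out{};
  store_in_pieces<float, 4, 8>(accum, out);
  return out == accum;
}
```

Calling `toy_epilogue_roundtrip()` from a `main` function exercises the loop. The staging buffer is sized for a single fragment, which is the point of the sketch: the full tile never needs to fit in the intermediate storage at once.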