CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Launches a kernel calling a functor for each element along a tensor's diagonal.
#include <tensor_foreach.h>
Public Member Functions
TensorDiagonalForEach (Coord< Rank > size, Params params=Params(), int start=0, int end=-1, int block_size=128)
Constructor performs the operation. Declared inline.
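The following host-only C++ sketch illustrates the iteration pattern described above: a functor is invoked once for each coordinate (i, i, ..., i) on the diagonal. It is not the CUTLASS implementation and launches no kernel. `HostCoord`, `host_diagonal_for_each`, and `fill_diagonal` are hypothetical names introduced for illustration. The sketch assumes that a negative `end` means "stop at the smallest extent of `size`" and that `block_size` only affects the launch configuration, so it is omitted here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a coordinate: Rank integer indices.
template <int Rank>
using HostCoord = std::array<int, Rank>;

// Hypothetical helper (not part of CUTLASS): calls func once for every
// coordinate (i, i, ..., i) with start <= i < end.
// Assumption: a negative end means "up to the smallest extent of size".
template <int Rank, typename Func>
void host_diagonal_for_each(const HostCoord<Rank>& size, Func func,
                            int start = 0, int end = -1) {
  assert(start >= 0);
  if (end < 0) {
    end = *std::min_element(size.begin(), size.end());
  }
  for (int i = start; i < end; ++i) {
    HostCoord<Rank> coord;
    coord.fill(i);  // every index equals i, so the point lies on the diagonal
    func(coord);
  }
}

// Example use: set the diagonal of a row-major rows x cols matrix to value.
inline void fill_diagonal(std::vector<float>& matrix, int rows, int cols,
                          float value) {
  host_diagonal_for_each<2>({rows, cols}, [&](const HostCoord<2>& c) {
    matrix[static_cast<std::size_t>(c[0]) * cols + c[1]] = value;
  });
}
```

Calling `fill_diagonal(m, 3, 5, 1.0f)` on a zero-initialized 3 x 5 matrix writes to elements (0,0), (1,1), and (2,2) only. In the device version, each of these functor calls would presumably be handled by a separate thread, with `block_size` threads per block.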