CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Functions
template<typename Fragment>
CUTLASS_DEVICE void dump_fragment(Fragment const &frag, int N=0, int M=0, int S=1)
template<typename Element>
CUTLASS_DEVICE void dump_shmem(Element const *ptr, size_t size, int S=1)
CUTLASS_DEVICE void cutlass::debug::dump_fragment(Fragment const &frag, int N = 0, int M = 0, int S = 1)
Each of the first N threads dumps the first M elements of its fragment, stepping through them with a stride of S elements. If N is not specified (default 0), the data of all threads is dumped. If M is not specified (default 0), all elements of the fragment are dumped.
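The selection rule above can be illustrated with a small host-side model. This is only a sketch, not CUTLASS code: `selected_entries`, `total_threads`, and `fragment_size` are hypothetical names, and it assumes that "the first M elements with a stride of S" means element indices 0, S, 2S, ... below M.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Host-side model (not part of CUTLASS): lists the (thread, element) pairs
// that the documented N/M/S rule would select for dumping.
// total_threads and fragment_size are hypothetical stand-ins for the number
// of threads and the number of elements in each thread's fragment.
std::vector<std::pair<int, int>> selected_entries(int total_threads,
                                                  int fragment_size,
                                                  int N = 0, int M = 0,
                                                  int S = 1) {
  assert(S > 0);                         // a non-positive stride never advances
  assert(N >= 0 && N <= total_threads);  // cannot select more threads than exist
  assert(M >= 0 && M <= fragment_size);  // cannot select more elements than exist

  if (N == 0) N = total_threads;  // N not specified: all threads
  if (M == 0) M = fragment_size;  // M not specified: all elements

  std::vector<std::pair<int, int>> out;
  for (int t = 0; t < N; ++t) {
    // Assumption: the stride walks indices 0, S, 2S, ... within the first M.
    for (int i = 0; i < M; i += S) {
      out.emplace_back(t, i);
    }
  }
  return out;
}
```

Under these assumptions, `selected_entries(4, 8, 2, 4, 2)` selects elements 0 and 2 from threads 0 and 1, while `selected_entries(4, 8)` selects every element of every thread.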
CUTLASS_DEVICE void cutlass::debug::dump_shmem(Element const *ptr, size_t size, int S = 1)
Dumps the contents of shared memory. ptr is the starting address, size is the number of elements to dump, and S is the stride between dumped elements.
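To make the role of ptr, size, and S concrete, the following host-side sketch collects the elements a strided walk would visit. It is an illustration only, not the CUTLASS implementation: `strided_view` is a hypothetical helper, and it assumes that size bounds the range being walked, so indices 0, S, 2S, ... below size are visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model (not part of CUTLASS): returns the elements visited when
// walking the range [ptr, ptr + size) with a stride of S.
template <typename Element>
std::vector<Element> strided_view(Element const *ptr, std::size_t size,
                                  int S = 1) {
  assert(ptr != nullptr);  // nothing to read from a null pointer
  assert(S > 0);           // a non-positive stride never advances

  std::vector<Element> out;
  // Assumption: size is the extent of the range, not the count of visited
  // elements, so larger strides visit fewer elements.
  for (std::size_t i = 0; i < size; i += static_cast<std::size_t>(S)) {
    out.push_back(ptr[i]);
  }
  return out;
}
```

With a six-element buffer holding 0 through 5, `strided_view(buf, 6, 2)` yields the elements 0, 2, and 4 under this reading.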