CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
Directories | |
directory | arch |
directory | epilogue |
directory | gemm |
directory | layout |
directory | platform |
directory | reduction |
directory | thread |
directory | transform |
directory | util |
Files | |
file | aligned_buffer.h [code] |
AlignedBuffer is a container for trivially copyable elements suitable for use in unions and shared memory. | |
file | array.h [code] |
Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe to use in a union. | |
file | array_subbyte.h [code] |
Specialization of the statically sized array for element types smaller than one byte; like array.h, it is safe to use in a union. | |
file | complex.h [code] |
file | coord.h [code] |
A Coord is a coordinate of arbitrary rank into a tensor or matrix. | |
file | core_io.h [code] |
Helpers for printing cutlass/core objects. | |
file | cutlass.h [code] |
Basic include for CUTLASS. | |
file | device_kernel.h [code] |
Template for generic CUTLASS kernel. | |
file | fast_math.h [code] |
Math utilities. | |
file | functional.h [code] |
Define basic numeric operators with specializations for Array<T, N>. SIMD-ize where possible. | |
file | half.h [code] |
Defines a class for using IEEE half-precision floating-point types in host or device code. | |
file | integer_subbyte.h [code] |
Defines a class for using integer types smaller than one byte in host or device code. | |
file | kernel_launch.h [code] |
Defines structures and helpers to launch CUDA kernels within CUTLASS. | |
file | matrix_coord.h [code] |
Defines a canonical coordinate for rank=2 matrices offering named indices. | |
file | matrix_shape.h [code] |
Defines a Shape template for matrix tiles. | |
file | matrix_traits.h [code] |
Defines properties of matrices used to denote layout and operands to GEMM kernels. | |
file | numeric_conversion.h [code] |
Boost-like numeric conversion operator for CUTLASS numeric types. | |
file | numeric_types.h [code] |
Top-level include for all CUTLASS numeric types. | |
file | predicate_vector.h [code] |
Defines container classes and iterators for managing a statically sized vector of boolean predicates. | |
file | real.h [code] |
file | relatively_equal.h [code] |
file | semaphore.h [code] |
Implementation of a CTA-wide semaphore for inter-CTA synchronization. | |
file | subbyte_reference.h [code] |
Provides a mechanism for packing and unpacking elements smaller than one byte. | |
file | tensor_coord.h [code] |
Defines a canonical coordinate for rank=4 tensors offering named indices. | |
file | tensor_ref.h [code] |
Defines a structure containing strides and a pointer to tensor data. | |
file | tensor_view.h [code] |
Defines a structure containing strides, bounds, and a pointer to tensor data. | |
file | wmma_array.h [code] |
Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe to use in a union. | |
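
Several of the files listed above (integer_subbyte.h, array_subbyte.h, subbyte_reference.h) deal with elements smaller than one byte. The following is a minimal sketch of the general packing idea in plain C++, assuming unsigned 4-bit elements stored two per byte with the low nibble first. `PackedInt4Array` is a hypothetical name used only for illustration; it is not a CUTLASS type and does not reproduce the CUTLASS implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only: PackedInt4Array is NOT part of CUTLASS.
// It stores N unsigned 4-bit values, two per byte, low nibble first
// (an assumed layout chosen for this sketch).
template <std::size_t N>
struct PackedInt4Array {
  // One byte holds two elements; round up when N is odd.
  std::array<std::uint8_t, (N + 1) / 2> storage{};

  // Read element idx: pick its byte, shift its nibble down, mask to 4 bits.
  unsigned get(std::size_t idx) const {
    assert(idx < N);
    unsigned shift = static_cast<unsigned>(idx % 2) * 4u;
    return (storage[idx / 2] >> shift) & 0xFu;
  }

  // Write element idx with a read-modify-write so that the neighbouring
  // nibble in the same byte is preserved.
  void set(std::size_t idx, unsigned value) {
    assert(idx < N);
    unsigned shift = static_cast<unsigned>(idx % 2) * 4u;
    std::uint8_t mask = static_cast<std::uint8_t>(0xFu << shift);
    std::uint8_t bits = static_cast<std::uint8_t>((value & 0xFu) << shift);
    storage[idx / 2] =
        static_cast<std::uint8_t>((storage[idx / 2] & ~mask) | bits);
  }
};
```

The read-modify-write in `set` is the reason a proxy "reference" object is usually needed for sub-byte elements: a single element cannot be addressed on its own, so every write must go through the byte that contains it.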