CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
#include <conversion_op.h>
Classes | |
struct | Params |
Host-constructable parameters structure. More... | |
Public Types | |
using | ElementOutput = ElementOutput_ |
using | ElementAccumulator = ElementAccumulator_ |
using | ElementCompute = ElementAccumulator_ |
using | FragmentOutput = Array< ElementOutput, kCount > |
using | FragmentAccumulator = Array< ElementAccumulator, kCount > |
using | ComputeFragment = FragmentAccumulator |
Public Member Functions | |
CUTLASS_HOST_DEVICE | Convert (Params const &params=Params()) |
Constructs the function object, possibly loading from pointers in host memory. More... | |
CUTLASS_HOST_DEVICE constexpr bool | is_source_needed () const |
Returns true if source is needed based on state of runtime arguments. More... | |
CUTLASS_HOST_DEVICE constexpr bool | is_source_ever_needed () const |
CUTLASS_HOST_DEVICE FragmentOutput | operator() (FragmentAccumulator const &accumulator, FragmentOutput const &source, ElementCompute uniform=ElementCompute(0)) const |
Converts the accumulator fragment to the output element type, with no scaling applied. More... | |
Static Public Attributes | |
static int const | kCount = Count |
static FloatRoundStyle const | kRound = Round |
Converts the result to the output element type without performing any other operations.
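The behavior described above can be illustrated with a minimal, CUTLASS-free sketch. `ConvertSketch` and `demo_convert` are hypothetical names, `std::array` stands in for `cutlass::Array`, a plain `static_cast` stands in for the rounding selected by `kRound`, and both source queries are assumed to return false because a pure conversion never reads the source. It is not the library's implementation.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the Convert functor, written against std::array
// so that it compiles without CUTLASS. Member names mirror the listing above.
template <typename ElementOutput_, int Count, typename ElementAccumulator_ = ElementOutput_>
struct ConvertSketch {
  using ElementOutput = ElementOutput_;
  using ElementAccumulator = ElementAccumulator_;
  using ElementCompute = ElementAccumulator_;

  static constexpr int kCount = Count;

  using FragmentOutput = std::array<ElementOutput, kCount>;
  using FragmentAccumulator = std::array<ElementAccumulator, kCount>;

  // Empty parameters structure: a pure conversion has no runtime arguments.
  struct Params {};

  constexpr ConvertSketch(Params const & = Params()) {}

  // Assumption: a pure conversion never needs the source operand.
  constexpr bool is_source_needed() const { return false; }
  constexpr bool is_source_ever_needed() const { return false; }

  // Element-wise conversion; source and uniform are accepted but ignored.
  FragmentOutput operator()(FragmentAccumulator const &accumulator,
                            FragmentOutput const & /*source*/,
                            ElementCompute /*uniform*/ = ElementCompute(0)) const {
    FragmentOutput result{};
    for (int i = 0; i < kCount; ++i) {
      result[i] = static_cast<ElementOutput>(accumulator[i]);
    }
    return result;
  }
};

// Usage: convert four double accumulators to float.
inline std::array<float, 4> demo_convert() {
  ConvertSketch<float, 4, double> op;
  assert(!op.is_source_needed());
  std::array<double, 4> accumulator = {1.5, -2.25, 3.0, 0.125};
  std::array<float, 4> source = {100.0f, 100.0f, 100.0f, 100.0f};  // ignored
  return op(accumulator, source);
}
```

The source fragment is deliberately filled with values that would be visible in the result if it were read, which makes it easy to confirm that only the accumulator contributes to the output.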
using cutlass::epilogue::thread::Convert< ElementOutput_, Count, ElementAccumulator_, Round >::ComputeFragment = FragmentAccumulator |
using cutlass::epilogue::thread::Convert< ElementOutput_, Count, ElementAccumulator_, Round >::ElementAccumulator = ElementAccumulator_ |
using cutlass::epilogue::thread::Convert< ElementOutput_, Count, ElementAccumulator_, Round >::ElementCompute = ElementAccumulator_ |
using cutlass::epilogue::thread::Convert< ElementOutput_, Count, ElementAccumulator_, Round >::ElementOutput = ElementOutput_ |
using cutlass::epilogue::thread::Convert< ElementOutput_, Count, ElementAccumulator_, Round >::FragmentAccumulator = Array<ElementAccumulator, kCount> |
using cutlass::epilogue::thread::Convert< ElementOutput_, Count, ElementAccumulator_, Round >::FragmentOutput = Array<ElementOutput, kCount> |
is_source_ever_needed(): constexpr function that lets the compiler optimize away the source loading if it is never needed.
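A rough sketch of why a constexpr query helps: when the answer is a compile-time constant, the branch that loads the source can be folded away. The names `PureConvertOp`, `AddSourceOp`, and `run_epilogue` are hypothetical and scalar values replace fragments; this only illustrates the idea and is not how the CUTLASS epilogue is written.

```cpp
#include <cassert>

// Hypothetical output operator that never reads the source.
struct PureConvertOp {
  constexpr bool is_source_ever_needed() const { return false; }
  float operator()(double accumulator, float /*source*/) const {
    return static_cast<float>(accumulator);
  }
};

// Hypothetical output operator that always reads the source.
struct AddSourceOp {
  constexpr bool is_source_ever_needed() const { return true; }
  float operator()(double accumulator, float source) const {
    return static_cast<float>(accumulator) + source;
  }
};

// Simplified epilogue step. kNeedsSource is a compile-time constant, so the
// compiler can drop the load entirely when it is false. source_loads counts
// how many times the source was actually read.
template <typename OutputOp>
float run_epilogue(OutputOp const &op, double accumulator,
                   float const *source_ptr, int &source_loads) {
  constexpr bool kNeedsSource = OutputOp{}.is_source_ever_needed();
  float source = 0.0f;
  if (kNeedsSource) {
    source = *source_ptr;
    ++source_loads;
  }
  return op(accumulator, source);
}

// Usage: the pure conversion never touches the (null) source pointer.
inline int demo_loads_pure_convert() {
  int loads = 0;
  float out = run_epilogue(PureConvertOp{}, 2.5, nullptr, loads);
  assert(out == 2.5f);
  return loads;
}

// Usage: the source-adding operator performs exactly one source load.
inline int demo_loads_add_source() {
  int loads = 0;
  float source = 1.0f;
  float out = run_epilogue(AddSourceOp{}, 2.5, &source, loads);
  assert(out == 3.5f);
  return loads;
}
```

Because the pure-conversion path never dereferences the source pointer, passing a null pointer is safe in this sketch; an operator whose need for the source depends on runtime arguments would report true here and decide per call through `is_source_needed()`.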