CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
default_gemv.h
1 /***************************************************************************************************
2  * Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
3  *
4  * Redistribution and use in source and binary forms, with or without modification, are permitted
5  * provided that the following conditions are met:
6  * * Redistributions of source code must retain the above copyright notice, this list of
7  * conditions and the following disclaimer.
8  * * Redistributions in binary form must reproduce the above copyright notice, this list of
9  * conditions and the following disclaimer in the documentation and/or other materials
10  * provided with the distribution.
11  * * Neither the name of the NVIDIA CORPORATION nor the names of its contributors may be used
12  * to endorse or promote products derived from this software without specific prior written
13  * permission.
14  *
15  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
16  * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
17  * FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE
18  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
19  * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
20  * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
21  * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
22  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
23  *
24  **************************************************************************************************/
25 
26 #pragma once
27 
31 
32 namespace cutlass {
33 namespace gemm {
34 namespace kernel {
35 
37 
38 template <
40  typename ThreadBlockShape_,
42  typename ThreadShape_,
44  typename ElementA_,
46  typename LayoutA_,
48  typename ElementB_,
50  typename LayoutB_,
52  typename ElementCD_,
54  typename LayoutCD_,
56  typename ElementAccumulator_ = ElementCD_>
57 struct DefaultGemv {
58 
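59  /// Shape of threadblock-level matrix operation (concept: GemmShape)
60  using ThreadBlockShape = ThreadBlockShape_;
61 
62  /// Shape of per-thread matrix operation (concept: GemmShape)
63  using ThreadShape = ThreadShape_;
64 
65  /// Data type of multiplicand A
66  using ElementA = ElementA_;
67 
68  /// Layout of multiplicand A
69  using LayoutA = LayoutA_;
70 
71  /// Data type of multiplicand B
72  using ElementB = ElementB_;
73 
74  /// Layout of multiplicand B
75  using LayoutB = LayoutB_;
76 
77  /// Data type of accumulators
78  using ElementAccumulator = ElementAccumulator_;
79 
80  /// Layout of accumulators (same as C/D)
81  using LayoutAccumulator = LayoutCD_;
82 
83  /// Data type of input/output matrix C/D
84  using ElementCD = ElementCD_;
85 
86  /// Layout of input/output matrix C/D
87  using LayoutCD = LayoutCD_;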
88 
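89  // Define the core components
90  using Core = typename cutlass::gemm::threadblock::DefaultGemvCore<
91  ThreadBlockShape, ThreadShape, ElementA, LayoutA, ElementB, LayoutB,
92  ElementAccumulator, LayoutAccumulator>;
93 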
94  // Define the threadblock-scoped gemv
95  using ThreadBlockGemv = cutlass::gemm::threadblock::Gemv<Core>;
96 
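97  // Iterator for multiplicand A
98  using IteratorA = typename ThreadBlockGemv::IteratorA;
99 
100  // Iterator for multiplicand B
101  using IteratorB = typename ThreadBlockGemv::IteratorB;
102 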
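103  /// Policy for the iterator that reads/writes C/D
104  using IteratorPolicyCD = typename platform::conditional<
105  platform::is_same<LayoutCD, layout::RowMajor>::value,
106  cutlass::transform::PitchLinearTilePolicyStripminedThreadContiguous<
107  layout::PitchLinearShape<ThreadBlockShape::kN, ThreadBlockShape::kM>, Core::kThreadsPerN, ThreadShape::kN>,
108  cutlass::transform::PitchLinearTilePolicyStripminedThreadStrided<
109  layout::PitchLinearShape<ThreadBlockShape::kM, ThreadBlockShape::kN>, Core::kThreadsPerN, ThreadShape::kM>>::type;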
110 
114 
115  /// Fragment storage for C/D
116  using FragmentCD = typename IteratorCD::Fragment;
117 
118  // Define the threadblock swizzle
120 };
121 
123 
124 } // namespace kernel
125 } // namespace gemm
126 } // namespace cutlass
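The IteratorPolicyCD alias in the listing above chooses the tile policy for C/D at compile time from LayoutCD. The standalone sketch below reproduces only that selection pattern. It uses std::conditional and std::is_same in place of CUTLASS's platform::conditional and platform::is_same. Every other type in it (RowMajor, ColumnMajor, GemmShape, PitchLinearShape, ThreadContiguousPolicy, ThreadStridedPolicy) is a hypothetical stand-in that only records its template parameters. None of them is the real cutlass:: definition, and no iterator behavior is modeled.

```cpp
#include <type_traits>

// Hypothetical stand-ins, NOT the cutlass:: types: they only record parameters.
struct RowMajor {};
struct ColumnMajor {};

// Stand-in for a GEMM-style extent (M x N x K).
template <int M, int N, int K>
struct GemmShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Stand-in for a pitch-linear extent: contiguous dimension first.
template <int Contiguous, int Strided>
struct PitchLinearShape {
  static constexpr int kContiguous = Contiguous;
  static constexpr int kStrided = Strided;
};

// Hypothetical tags standing in for the two CUTLASS stripmined tile policies.
template <typename Shape, int Threads, int ElementsPerThread>
struct ThreadContiguousPolicy {
  using TileShape = Shape;
  static constexpr int kThreads = Threads;
  static constexpr int kElementsPerThread = ElementsPerThread;
};

template <typename Shape, int Threads, int ElementsPerThread>
struct ThreadStridedPolicy {
  using TileShape = Shape;
  static constexpr int kThreads = Threads;
  static constexpr int kElementsPerThread = ElementsPerThread;
};

// Same selection pattern as DefaultGemv::IteratorPolicyCD: the C/D layout
// decides both the policy kind and which tile extent is listed as contiguous.
template <typename LayoutCD, typename ThreadBlockShape, typename ThreadShape,
          int kThreadsPerN>
using IteratorPolicyCD = typename std::conditional<
    std::is_same<LayoutCD, RowMajor>::value,
    // Row-major C/D: n is contiguous, so the tile is (kN, kM).
    ThreadContiguousPolicy<
        PitchLinearShape<ThreadBlockShape::kN, ThreadBlockShape::kM>,
        kThreadsPerN, ThreadShape::kN>,
    // Any other layout (e.g. column-major): m is contiguous, tile is (kM, kN).
    ThreadStridedPolicy<
        PitchLinearShape<ThreadBlockShape::kM, ThreadBlockShape::kN>,
        kThreadsPerN, ThreadShape::kM>>::type;

// Compile-time check of the row-major branch with arbitrary example sizes.
static_assert(
    std::is_same<IteratorPolicyCD<RowMajor, GemmShape<4, 64, 1>,
                                  GemmShape<1, 2, 1>, 32>,
                 ThreadContiguousPolicy<PitchLinearShape<64, 4>, 32, 2>>::value,
    "row-major C/D should select the thread-contiguous policy over (kN, kM)");
```

In a row-major M×N matrix the column index n is the contiguous dimension. That is why the row-major branch lists the tile as (kN, kM) and takes its per-thread count from ThreadShape::kN. For column-major storage m is contiguous, so both the extent order and the per-thread dimension swap.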
Members of DefaultGemv
DefaultGemv. Definition: default_gemv.h:57
ThreadBlockShape_ ThreadBlockShape: Shape of threadblock-level matrix operation (concept: GemmShape). Definition: default_gemv.h:60
ThreadShape_ ThreadShape: Shape of per-thread matrix operation (concept: GemmShape). Definition: default_gemv.h:63
ElementA_ ElementA: Data type of multiplicand A. Definition: default_gemv.h:66
LayoutA_ LayoutA: Layout of multiplicand A. Definition: default_gemv.h:69
ElementB_ ElementB: Data type of multiplicand B. Definition: default_gemv.h:72
LayoutB_ LayoutB: Layout of multiplicand B. Definition: default_gemv.h:75
ElementAccumulator_ ElementAccumulator: Data type of accumulators. Definition: default_gemv.h:78
LayoutCD_ LayoutAccumulator: Layout of accumulators (same as C/D). Definition: default_gemv.h:81
ElementCD_ ElementCD: Data type of input/output matrix C/D. Definition: default_gemv.h:84
LayoutCD_ LayoutCD: Layout of input/output matrix C/D. Definition: default_gemv.h:87
typename cutlass::gemm::threadblock::DefaultGemvCore< ThreadBlockShape, ThreadShape, ElementA, LayoutA, ElementB, LayoutB, ElementAccumulator, LayoutAccumulator > Core. Definition: default_gemv.h:92
typename ThreadBlockGemv::IteratorA IteratorA. Definition: default_gemv.h:98
typename ThreadBlockGemv::IteratorB IteratorB. Definition: default_gemv.h:101
typename platform::conditional< platform::is_same< LayoutCD, layout::RowMajor >::value, cutlass::transform::PitchLinearTilePolicyStripminedThreadContiguous< layout::PitchLinearShape< ThreadBlockShape::kN, ThreadBlockShape::kM >, Core::kThreadsPerN, ThreadShape::kN >, cutlass::transform::PitchLinearTilePolicyStripminedThreadStrided< layout::PitchLinearShape< ThreadBlockShape::kM, ThreadBlockShape::kN >, Core::kThreadsPerN, ThreadShape::kM >>::type IteratorPolicyCD: Policy for the iterator that reads/writes C/D. Definition: default_gemv.h:109
typename IteratorCD::Fragment FragmentCD: Fragment storage for C/D. Definition: default_gemv.h:116

Related definitions
Structure to compute the matrix-vector product using SIMT math instructions. Definition: gemv.h:50
typename Core_::IteratorA IteratorA: Iterates over A in global memory. Definition: gemv.h:58
typename Core_::IteratorB IteratorB: Iterates over B in global memory. Definition: gemv.h:61
Template for a threadblock-scoped GEMV kernel.
Definition: default_gemv_core.h:68
Defines basic properties needed by CTA-level batched GEMV assuming expectations about data layout of ...
Threadblock swizzling function for batched GEMVs. Definition: gemm/threadblock/threadblock_swizzle.h:296
Implements several possible threadblock-swizzling functions mapping blockIdx to GEMM problems...
Describes the size of a matrix tile. Definition: matrix_shape.h:42
Template defining a shape used by pitch-linear operators. Definition: pitch_linear.h:43
Definition: transform/threadblock/predicated_tile_iterator.h:133
std::conditional (true specialization). Definition: platform.h:325
std::is_same (false specialization). Definition: platform.h:394
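DefaultGemv keeps the accumulator type separate from the C/D element type and defaults it to ElementCD_. When validating a GEMV kernel's output, it can help to have a plain host-side reference that makes the same distinction. The sketch below assumes a row-major M×K matrix A, a length-K vector x, and y = A·x. That operand orientation is an assumption for illustration and may not match how CUTLASS's batched GEMV arranges its operands. The functions reference_gemv_element and reference_gemv are hypothetical helpers, not part of CUTLASS, and the constexpr loop requires C++14 or later.

```cpp
#include <cstddef>

// Hypothetical host-side reference (NOT part of CUTLASS).
// Assumes A is row-major M x K and computes one element y[m] = sum_k A[m][k] * x[k].
// Products and the running sum use ElementAccumulator; the result is converted
// to ElementCD only once at the end, mirroring how DefaultGemv separates
// ElementAccumulator from ElementCD (and defaults the former to the latter).
template <typename ElementCD, typename ElementAccumulator = ElementCD,
          typename ElementA, typename ElementB>
constexpr ElementCD reference_gemv_element(const ElementA* A, const ElementB* x,
                                           std::size_t K, std::size_t m) {
  ElementAccumulator acc = ElementAccumulator(0);
  for (std::size_t k = 0; k < K; ++k) {
    acc += static_cast<ElementAccumulator>(A[m * K + k]) *
           static_cast<ElementAccumulator>(x[k]);
  }
  return static_cast<ElementCD>(acc);
}

// Fills y[0..M) by evaluating the element-wise reference for each row of A.
template <typename ElementCD, typename ElementAccumulator = ElementCD,
          typename ElementA, typename ElementB>
void reference_gemv(const ElementA* A, const ElementB* x, ElementCD* y,
                    std::size_t M, std::size_t K) {
  for (std::size_t m = 0; m < M; ++m) {
    y[m] = reference_gemv_element<ElementCD, ElementAccumulator>(A, x, K, m);
  }
}
```

Choosing an ElementAccumulator wider than the operand types, such as int for signed char inputs, keeps a partial sum like 100 + 100 = 200 from leaving the 8-bit range before the final conversion to ElementCD.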