Script.wgmma
Warp Group Matrix Multiply-Accumulate (WGMMA) instructions for Hopper GPUs.
WGMMA performs asynchronous matrix multiply-accumulate operations using a warp group
(4 consecutive warps, 128 threads). Operand a resides in shared memory or registers,
operand b resides in shared memory, and the accumulator d resides in registers.
WGMMA operations are asynchronous and must follow a strict execution protocol:
1. fence(): establish memory ordering so prior writes to operands are visible.
2. mma(): issue one or more async MMA operations (can be called multiple times).
3. commit_group(): group all pending MMAs into a commit group.
4. wait_group(n): wait until at most n commit groups remain pending.
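The ordering above can be illustrated with a small host-side model. This is only a sketch: `WgmmaModel` and its attributes are hypothetical names invented for illustration, not part of the library, and the model runs in plain Python rather than on a GPU. As a simplification, it requires a fence before the MMAs of each commit group, mirroring the order listed above.

```python
class WgmmaModel:
    """Toy model of the WGMMA protocol (hypothetical, not a library API)."""

    def __init__(self):
        self.fenced = False    # has fence() been called for the current group?
        self.uncommitted = []  # MMAs issued but not yet placed in a commit group
        self.pending = []      # committed groups still in flight, oldest first
        self.completed = []    # MMAs whose results are safe to read

    def fence(self):
        self.fenced = True

    def mma(self, name):
        if not self.fenced:
            raise RuntimeError("mma() issued before fence()")
        self.uncommitted.append(name)

    def commit_group(self):
        # Bundle every MMA issued since the last commit into one group.
        self.pending.append(self.uncommitted)
        self.uncommitted = []
        self.fenced = False

    def wait_group(self, n):
        # Retire the oldest groups until at most n remain pending.
        while len(self.pending) > n:
            self.completed.extend(self.pending.pop(0))


model = WgmmaModel()
model.fence()
model.mma("tile0")
model.mma("tile1")    # several MMAs may share one commit group
model.commit_group()
model.wait_group(0)   # wait until no commit group is pending
print(model.completed)  # ['tile0', 'tile1']
```

In this model the accumulator results only become readable after `wait_group` retires their group, which is the point of the final step in the protocol.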
Multiple commit groups can be in flight simultaneously for latency hiding in pipelined loops. For example, issue new MMAs while waiting for a previous group to complete.
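A pipelined loop of this kind can be sketched as an event log. The function `run_pipeline` and its parameter `max_in_flight` are hypothetical names used for illustration; the sketch models only the ordering of issue and completion, assuming groups complete oldest first, and does not call the library.

```python
from collections import deque


def run_pipeline(num_tiles, max_in_flight=1):
    """Toy model of a pipelined loop; returns (event, tile) pairs in order."""
    pending = deque()  # committed groups still in flight, oldest first
    log = []
    for k in range(num_tiles):
        # Stands in for fence() + mma() + commit_group() on tile k.
        log.append(("issue", k))
        pending.append(k)
        # Stands in for wait_group(max_in_flight).
        while len(pending) > max_in_flight:
            log.append(("done", pending.popleft()))
    # Stands in for a final wait_group(0) that drains the pipeline.
    while pending:
        log.append(("done", pending.popleft()))
    return log


for event in run_pipeline(3):
    print(event)
```

With `max_in_flight=1`, tile 1 is issued before tile 0 is reported done, so the wait for one group overlaps with the issue of the next. With `max_in_flight=0`, every issue is followed immediately by its own completion and nothing overlaps.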
All WGMMA instructions must be executed by a full warp group (4 warps). Use
self.warp_group() to create the appropriate thread group context.
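To make the warp group size concrete, the sketch below maps a linear thread index within a block to its warp group. It assumes 32 threads per warp and warp groups aligned to multiples of 4 warps; `locate` is a hypothetical helper written for illustration, not a library function.

```python
WARP_SIZE = 32                            # threads per warp (assumed)
WARPS_PER_GROUP = 4                       # a warp group is 4 consecutive warps
GROUP_SIZE = WARP_SIZE * WARPS_PER_GROUP  # 128 threads


def locate(thread_idx):
    """Return (warp_group, warp_in_group, lane) for a linear thread index."""
    warp = thread_idx // WARP_SIZE
    return (thread_idx // GROUP_SIZE, warp % WARPS_PER_GROUP, thread_idx % WARP_SIZE)


print(locate(0))    # (0, 0, 0)
print(locate(127))  # (0, 3, 31)
print(locate(130))  # (1, 0, 2)
```

A block with 256 threads therefore holds two full warp groups, while a block whose thread count is not a multiple of 128 leaves a partial group that cannot execute WGMMA instructions.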
Instructions
| Instruction | Description |
| --- | --- |
| fence() | Issue a warp group MMA fence. |
| commit_group() | Commit the previously issued warp group MMA operations. |
| wait_group(n) | Wait for warp group MMA commit groups to complete. |
| mma() | Perform warp group matrix multiply-accumulate (MMA) operation. |