Script.wgmma

Warp Group Matrix Multiply-Accumulate (WGMMA) instructions for Hopper GPUs.

WGMMA performs asynchronous matrix multiply-accumulate operations using a warp group (4 consecutive warps, 128 threads). Operand a resides in shared memory or registers, operand b resides in shared memory, and the accumulator d is held in registers.
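As a quick illustration of the warp group layout, the sketch below maps a linear thread index within a block to its warp group, its warp within that group, and its lane. It is plain host-side Python rather than part of this API. The helper name `locate` and the constants are hypothetical, and the sketch assumes 32-thread warps with warp groups starting at warp indices that are multiples of 4.

```python
WARP_SIZE = 32                             # threads per warp (assumed)
WARPS_PER_GROUP = 4                        # a warp group is 4 consecutive warps
GROUP_SIZE = WARP_SIZE * WARPS_PER_GROUP   # 128 threads per warp group


def locate(thread_idx):
    """Map a linear thread index to (warp group, warp in group, lane)."""
    warp_group = thread_idx // GROUP_SIZE
    warp_in_group = (thread_idx % GROUP_SIZE) // WARP_SIZE
    lane = thread_idx % WARP_SIZE
    return warp_group, warp_in_group, lane


first = locate(0)      # first thread of warp group 0
last = locate(127)     # last thread of warp group 0
nxt = locate(128)      # first thread of warp group 1
```

Under these assumptions, a block of 256 threads holds two warp groups, and threads 0 through 127 all belong to the first one.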

WGMMA operations are asynchronous and must follow a strict execution protocol:

  1. fence() — establish memory ordering so prior writes to operands are visible.

  2. mma() — issue one or more async MMA operations (can be called multiple times).

  3. commit_group() — group all pending MMAs into a commit group.

  4. wait_group(n) — wait until at most n commit groups remain pending.
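The four steps above can be sketched with a toy host-side model. This is not the real API or the hardware behaviour: the class `ToyWgmma`, its fields, and the helper `_accumulate` are hypothetical names introduced only to show the ordering. The model makes two simplifying assumptions: it requires a fence() before the first mma() of each group, and it defers all arithmetic until wait_group(n) retires a group.

```python
class ToyWgmma:
    """Toy model of the fence / mma / commit_group / wait_group ordering."""

    def __init__(self):
        self.fenced = False    # fence() seen since the last commit (model assumption)
        self.open_mmas = []    # MMAs issued but not yet committed
        self.pending = []      # committed groups not yet completed, oldest first

    def fence(self):
        self.fenced = True

    def mma(self, a, b, d):
        if not self.fenced:
            raise RuntimeError("mma() issued before fence()")
        self.open_mmas.append((a, b, d))

    def commit_group(self):
        # Bundle every MMA issued so far into one commit group.
        self.pending.append(self.open_mmas)
        self.open_mmas = []
        self.fenced = False

    def wait_group(self, n):
        # Retire the oldest groups until at most n remain pending.
        while len(self.pending) > n:
            for a, b, d in self.pending.pop(0):
                _accumulate(a, b, d)


def _accumulate(a, b, d):
    """d += a @ b for small matrices stored as nested lists."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            d[i][j] += sum(a[i][k] * b[k][j] for k in range(len(b)))


w = ToyWgmma()
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
d = [[0, 0], [0, 0]]

w.fence()                      # step 1
w.mma(a, b, d)                 # step 2
w.commit_group()               # step 3
before_wait = [row[:] for row in d]
w.wait_group(0)                # step 4: d is only safe to read after this
```

In this model d still holds zeros after commit_group() and only contains the product once wait_group(0) returns, which mirrors why the accumulator should not be read before the wait.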

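Multiple commit groups can be in flight simultaneously, which hides latency in pipelined loops. For example, a loop iteration can issue and commit the MMAs for the current tile and then call wait_group(1), which blocks only until the previous iteration's group has completed and leaves the newest group in flight.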
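The pipelining pattern can be illustrated by counting pending commit groups. The sketch below is a host-side Python toy, not a kernel. The function `simulate_pipeline` and its parameters are hypothetical; the only rule taken from the text is that wait_group(n) returns once at most n groups remain pending.

```python
def simulate_pipeline(num_tiles, max_in_flight):
    """Return (pending count after each iteration's wait, count after the final drain)."""
    pending = 0
    history = []
    for _ in range(num_tiles):
        pending += 1                            # fence(); mma(...); commit_group()
        pending = min(pending, max_in_flight)   # wait_group(max_in_flight)
        history.append(pending)
    pending = min(pending, 0)                   # final wait_group(0) drains everything
    return history, pending


history, remaining = simulate_pipeline(num_tiles=4, max_in_flight=1)
deeper, _ = simulate_pipeline(num_tiles=4, max_in_flight=2)
```

With max_in_flight=0 every iteration waits for its own group, so nothing overlaps. Larger values keep more groups pending at once, and the operands and accumulators of those groups must stay untouched until they retire.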

All WGMMA instructions must be executed by a full warp group (4 warps). Use self.warp_group() to create the appropriate thread group context.

Instructions

fence()

Issue a warp group MMA fence.

commit_group()

Commit the previously issued warp group MMA operations.

wait_group(n)

Wait until at most n warp group MMA commit groups remain pending. wait_group(0) waits for all committed groups to complete.

mma(a, b, d)

Issue an asynchronous warp group matrix multiply-accumulate (MMA) operation on operands a and b, accumulating the result into d.