Script.wgmma.mma

Script.wgmma.mma(a, b, d)

Perform a warp-group matrix multiply-accumulate (MMA) operation.

Computes d = a @ b + d, where a resides in shared memory or registers, b resides in shared memory, and d resides in registers (it serves as both the input accumulator and the output).

All tensors must be 2D with compatible shapes: a is [M, K], b is [K, N], and d is [M, N].
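To make the arithmetic and the shape contract concrete, here is a plain-Python reference sketch. It models only the math of d = a @ b + d on nested lists; `mma_reference` is a hypothetical helper, not part of the Script API, and it ignores the shared-memory and register layouts that the real instruction uses.

```python
# Hypothetical plain-Python reference for the arithmetic of wgmma.mma:
# d = a @ b + d. Matrices are lists of rows. This models only the math,
# not the hardware data layout or the Tilus tensor types.

def mma_reference(a, b, d):
    """Accumulate a @ b into d in place. a: [M, K], b: [K, N], d: [M, N]."""
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    if k != k_b:
        raise ValueError(f"inner dimensions differ: a is [{m}, {k}], b is [{k_b}, {n}]")
    if len(d) != m or any(len(row) != n for row in d):
        raise ValueError(f"d must be [{m}, {n}]")
    for i in range(m):
        for j in range(n):
            acc = d[i][j]  # start from the existing accumulator value
            for p in range(k):
                acc += a[i][p] * b[p][j]
            d[i][j] = acc
    # Like wgmma.mma, the result is written into d; nothing is returned.


a = [[1.0, 2.0], [3.0, 4.0]]  # [M=2, K=2]
b = [[5.0, 6.0], [7.0, 8.0]]  # [K=2, N=2]
d = [[1.0, 1.0], [1.0, 1.0]]  # [M=2, N=2], initial accumulator
mma_reference(a, b, d)
print(d)  # [[20.0, 23.0], [44.0, 51.0]]
```

Because d is read before it is written, the same call can be repeated over successive K slices of a and b to build up a larger product, which is the usual reason an MMA instruction accumulates instead of overwriting.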

Call wgmma.fence() before this instruction. Afterward, call wgmma.commit_group() followed by wgmma.wait_group() before reading the result in d.
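The ordering requirement can be modeled as a small state machine. The sketch below is a hypothetical, host-side illustration only: `WgmmaOrderModel` and its methods are invented names that mirror the documented calls and are not the Tilus API. It makes two simplifying assumptions: it conservatively requires a new fence after each commit, and its `wait_group` waits for every committed group, the way PTX `wgmma.wait_group 0` does.

```python
# Hypothetical toy model of the fence -> mma -> commit_group -> wait_group
# ordering. Method names mirror the documented calls but this is NOT the
# Tilus API; it only records and validates the call sequence.

class WgmmaOrderModel:
    def __init__(self):
        self.fenced = False        # fence() issued since the last commit
        self.pending_mma = 0       # mma() calls issued but not yet committed
        self.committed_groups = 0  # groups committed but not yet waited on

    def fence(self):
        self.fenced = True

    def mma(self):
        if not self.fenced:
            raise RuntimeError("wgmma.fence() must be called before wgmma.mma()")
        self.pending_mma += 1

    def commit_group(self):
        if self.pending_mma == 0:
            raise RuntimeError("commit_group() called with no preceding mma()")
        self.pending_mma = 0
        self.committed_groups += 1
        self.fenced = False  # assumption: require a fresh fence for the next round

    def wait_group(self):
        # Simplification: wait for all committed groups (like wgmma.wait_group 0).
        self.committed_groups = 0

    def results_ready(self):
        # d may be read only when nothing is uncommitted or still in flight.
        return self.pending_mma == 0 and self.committed_groups == 0


model = WgmmaOrderModel()
model.fence()
model.mma()           # stands in for wgmma.mma(a, b, d)
model.commit_group()
model.wait_group()
print(model.results_ready())  # True
```

Calling `mma()` on a fresh model raises `RuntimeError`, and `results_ready()` stays `False` between `commit_group()` and `wait_group()`, which reflects why d must not be read until the wait completes.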

Parameters:
  • a (SharedTensor | RegisterTensor) – The left-hand operand of the matrix multiplication. Shape [M, K].

  • b (SharedTensor) – The right-hand operand of the matrix multiplication. Shape [K, N].

  • d (RegisterTensor) – The accumulator tensor, used as both input and output. Shape [M, N].

Return type:

None

Notes

  • Thread group: Must be executed by a warp group (4 contiguous warps, 128 threads).

  • Hardware: Requires compute capability 9.0a+ (sm_90a).

  • PTX: wgmma.mma_async.sync.aligned