Script.wgmma.mma

Script.wgmma.mma(a, b, d)

Perform a warp-group matrix multiply-accumulate (MMA) operation.

Computes d = a @ b + d, where a resides in shared memory or registers, b resides in shared memory, and d resides in registers and serves as both the input accumulator and the output.

All tensors must be 2D with compatible shapes: a is [M, K], b is [K, N], and d is [M, N].
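The arithmetic and shape rules can be modeled on the host with a short pure-Python sketch. This is only an illustration of what d = a @ b + d means, not the library's implementation: the function name mma_reference is hypothetical, nested lists stand in for tensors, and nothing here models memory spaces, data types, or the asynchronous execution of the real instruction.

```python
def mma_reference(a, b, d):
    """Host-side model of d = a @ b + d; updates d in place and returns None.

    Hypothetical helper for illustration: a is [M, K], b is [K, N],
    d is [M, N], all given as nested lists.
    """
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    # The inner dimensions of a and b must agree.
    if k != k_b:
        raise ValueError(f"a is [{m}, {k}] but b is [{k_b}, {n}]")
    # The accumulator must match the [M, N] output shape.
    if len(d) != m or any(len(row) != n for row in d):
        raise ValueError(f"d must have shape [{m}, {n}]")
    for i in range(m):
        for j in range(n):
            acc = d[i][j]  # start from the existing accumulator value
            for p in range(k):
                acc += a[i][p] * b[p][j]
            d[i][j] = acc  # write the result back into the accumulator


a = [[1, 2, 3], [4, 5, 6]]      # [M=2, K=3]
b = [[1, 0], [0, 1], [1, 1]]    # [K=3, N=2]
d = [[10, 10], [10, 10]]        # [M=2, N=2]
mma_reference(a, b, d)
print(d)  # [[14, 15], [20, 21]]
```

As in the instruction itself, the accumulator is both read and overwritten, so a caller that needs a plain product a @ b would zero d beforehand.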

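wgmma.fence() must be called before this instruction, and wgmma.commit_group() followed by wgmma.wait_group() must be called after it. The operation is issued asynchronously (see the PTX instruction under Notes), so the result in d is only guaranteed to be complete once the committed group has been waited on.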

Parameters:

a (SharedTensor | RegisterTensor) – The left-hand operand of the matrix multiplication. Shape [M, K].
b (SharedTensor) – The right-hand operand of the matrix multiplication. Shape [K, N].
d (RegisterTensor) – The accumulator tensor, used as both input and output. Shape [M, N].

Return type:

None

Notes

Thread group: Must be executed by a warp group (4 warps).
Hardware: Requires compute capability 9.0a+ (sm_90a).
PTX: wgmma.mma_async.sync.aligned
