Script.mbarrier.arrive_and_expect_tx_multicast
- Script.mbarrier.arrive_and_expect_tx_multicast(barrier, transaction_bytes, multicast_mask, sem='release', scope='cluster')
Arrive at barriers across multiple CTAs with expected async transactions.
Unlike arrive() and arrive_and_expect_tx(), where every thread in the group arrives on the same barrier, this instruction elects one thread per target CTA in multicast_mask. Each elected thread arrives on the barrier at the same shared memory offset in its assigned CTA. The arrival count is 1 and the tx-count is increased by transaction_bytes on each signaled barrier.
- Parameters:
barrier (RegisterTensor) – A single-element uint32 register tensor with the barrier’s shared memory address in the current CTA. The same offset is used for peer CTAs.
transaction_bytes (Expr | int) – Expected async transfer size in bytes. Must evaluate to a non-negative int32.
multicast_mask (int) – Bitmask of CTAs to signal. Bit i corresponds to the CTA with rank i. E.g., 0b101 signals CTAs 0 and 2.
sem (str) – Memory ordering semantics. Candidates: 'relaxed', 'release'.
scope (str) – Synchronization scope. Candidates: 'cta', 'cluster'.
- Return type:
None
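The multicast behavior described above can be modeled in plain Python on the host. This is a toy sketch for building intuition, not the library's implementation: target_ranks, BarrierState, and model_multicast_arrive are hypothetical names introduced here, and the model assumes a barrier can be summarized by a pending arrival count and a tx-count.

```python
from dataclasses import dataclass


def target_ranks(multicast_mask: int) -> list[int]:
    """Return the CTA ranks whose bits are set in the mask (bit i -> rank i)."""
    return [i for i in range(multicast_mask.bit_length()) if (multicast_mask >> i) & 1]


@dataclass
class BarrierState:
    """Hypothetical, simplified view of the barrier held by one CTA."""
    pending_arrivals: int
    tx_count: int = 0


def model_multicast_arrive(barriers, transaction_bytes, multicast_mask):
    """Apply one multicast arrive to a list of barriers indexed by CTA rank."""
    for rank in target_ranks(multicast_mask):
        # One elected thread arrives on this CTA's barrier: arrival count is 1.
        barriers[rank].pending_arrivals -= 1
        # The same arrival also raises the expected async transfer size.
        barriers[rank].tx_count += transaction_bytes
    return barriers
```

With a mask of 0b101 in a four-CTA cluster, only the barriers of ranks 0 and 2 change; ranks 1 and 3 are left untouched.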
Notes
Thread group: Must be executed by a thread group with at least 16 threads.
Hardware: Requires compute capability 9.0+ (sm_90).
PTX: mbarrier.arrive.expect_tx.shared::cluster.b64 with mapa.shared::cluster
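The documented constraints can be collected into a small pre-flight check. The sketch below is a hypothetical helper, not part of the library: check_multicast_args and its num_threads parameter are assumptions made for illustration, it handles only the plain int case of transaction_bytes (not Expr), and the rejection of negative masks is an added assumption rather than a documented rule.

```python
INT32_MAX = 2**31 - 1
VALID_SEM = ("relaxed", "release")
VALID_SCOPE = ("cta", "cluster")
MIN_THREADS = 16  # minimum thread group size stated in the notes


def check_multicast_args(transaction_bytes, multicast_mask,
                         sem="release", scope="cluster",
                         num_threads=MIN_THREADS):
    """Raise ValueError if an argument violates a documented constraint."""
    if not 0 <= transaction_bytes <= INT32_MAX:
        raise ValueError("transaction_bytes must be a non-negative int32")
    if multicast_mask < 0:
        # Assumption: a negative mask has no meaningful bit-per-rank reading.
        raise ValueError("multicast_mask must be a non-negative bitmask")
    if sem not in VALID_SEM:
        raise ValueError(f"sem must be one of {VALID_SEM}")
    if scope not in VALID_SCOPE:
        raise ValueError(f"scope must be one of {VALID_SCOPE}")
    if num_threads < MIN_THREADS:
        raise ValueError(f"thread group needs at least {MIN_THREADS} threads")
```

A check like this only catches argument mistakes early on the host; it says nothing about whether the target device meets the compute capability requirement.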