Script.mbarrier.arrive_and_expect_tx
- Script.mbarrier.arrive_and_expect_tx(barrier, transaction_bytes, sem='release', scope='cta')
Arrive at a barrier and declare expected asynchronous transaction bytes.
Each thread in the current thread group performs two updates on the barrier:
- Decrements the pending arrival count by 1.
- Increases the pending tx-count by transaction_bytes.
The tx-count tracks asynchronous data transfers (e.g., TMA copies). When an async operation tied to this barrier completes, the hardware automatically decrements the tx-count by the number of bytes transferred. The phase completes only when both pending arrivals and tx-count reach zero.
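The completion rule above can be illustrated with a small host-side model. This is a minimal sketch for intuition only: `MBarrierPhaseModel` and its methods are hypothetical names, not part of the Script API. It assumes, following the PTX semantics, that each thread applies its tx-count increase before its arrival, and that the phase flips only once both counters reach zero.

```python
class MBarrierPhaseModel:
    """Toy model of one mbarrier phase (hypothetical; not the Script API)."""

    def __init__(self, expected_arrivals: int) -> None:
        self.expected_arrivals = expected_arrivals
        self.pending_arrivals = expected_arrivals
        self.tx_count = 0
        self.phase = 0

    def arrive_and_expect_tx(self, transaction_bytes: int) -> None:
        # One thread's update: raise the tx-count, then count the arrival.
        self.tx_count += transaction_bytes
        self.pending_arrivals -= 1
        self._maybe_complete()

    def complete_tx(self, nbytes: int) -> None:
        # Stands in for the hardware decrementing tx-count as async bytes land.
        self.tx_count -= nbytes
        self._maybe_complete()

    def _maybe_complete(self) -> None:
        # The phase completes only when both counters are zero;
        # the arrival count is then re-armed for the next phase.
        if self.pending_arrivals == 0 and self.tx_count == 0:
            self.phase ^= 1
            self.pending_arrivals = self.expected_arrivals


bar = MBarrierPhaseModel(expected_arrivals=1)
bar.arrive_and_expect_tx(8192)  # one thread expects an 8 KiB async copy
assert bar.phase == 0           # arrival recorded, but bytes still pending
bar.complete_tx(4096)
assert bar.phase == 0           # half of the bytes have landed
bar.complete_tx(4096)           # all expected bytes have landed
assert bar.phase == 1           # both counters hit zero, so the phase flipped
```

The model shows why an arrival alone is not enough: until the transferred bytes cancel the declared expectation, waiters on the barrier keep blocking.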
Typically used with single_thread() so that only one thread sets the tx-count expectation (every executing thread adds transaction_bytes), while the TMA engine performs the actual transfer.
- Parameters:
barrier (RegisterTensor) – A single-element uint32 register tensor holding the barrier’s shared memory address.
transaction_bytes (Expr | int) – The number of bytes expected from async transactions (e.g., TMA copies). The barrier’s tx-count is increased by this value; it is automatically decreased by the hardware as the transactions complete. Must evaluate to a non-negative int32.
sem (str) – Memory ordering semantics. Candidates: 'relaxed', 'release'.
scope (str) – Synchronization scope. Candidates: 'cta', 'cluster'.
- Return type:
None
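Choosing transaction_bytes is plain arithmetic: it should equal the total bytes the tied async copies will deliver. The sketch below computes that value for a dense tile and shows how the declared tx-count scales with the number of executing threads, which is why single_thread() is the usual companion. Both helpers are hypothetical illustrations, not Script functions, and the 64x64 fp16 tile is an assumed example.

```python
import math


def tile_transaction_bytes(shape: tuple, dtype_bytes: int) -> int:
    """Bytes moved by one copy of a dense tile (hypothetical helper)."""
    nbytes = math.prod(shape) * dtype_bytes
    # transaction_bytes must evaluate to a non-negative int32.
    if not 0 <= nbytes <= 2**31 - 1:
        raise ValueError(f"transaction_bytes out of int32 range: {nbytes}")
    return nbytes


def expected_tx_count(transaction_bytes: int, executing_threads: int) -> int:
    """tx-count added when each executing thread calls arrive_and_expect_tx."""
    return transaction_bytes * executing_threads


a_tile = tile_transaction_bytes((64, 64), dtype_bytes=2)  # assumed fp16 tile
print(a_tile)                          # 8192
print(expected_tx_count(a_tile, 1))    # 8192: matches one 8 KiB copy
print(expected_tx_count(a_tile, 128))  # 1048576: 128x more than one copy delivers
```

If all 128 threads of a group made the call while only one 8 KiB copy is issued, the tx-count would never drain to zero and the phase would not complete.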
Notes
Thread group: Can be executed by any sized thread group.
Hardware: Requires compute capability 9.0+ (sm_90).
PTX:
mbarrier.arrive.expect_tx.shared::cta.b64
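The sem and scope arguments plausibly correspond to the optional .sem and .scope qualifiers of this PTX instruction, whose defaults (.release, .cta) match the Python defaults above. The sketch below assumes that mapping and the qualifier order used in the PTX ISA syntax; `ptx_arrive_expect_tx` is a hypothetical helper for illustration, not how Script actually emits code.

```python
SEMS = ("relaxed", "release")
SCOPES = ("cta", "cluster")


def ptx_arrive_expect_tx(sem: str = "release", scope: str = "cta") -> str:
    """Build the assumed PTX opcode for the given sem/scope (hypothetical helper)."""
    if sem not in SEMS:
        raise ValueError(f"sem must be one of {SEMS}, got {sem!r}")
    if scope not in SCOPES:
        raise ValueError(f"scope must be one of {SCOPES}, got {scope!r}")
    # Assumed order: .sem, then .scope, then the state space and type.
    return f"mbarrier.arrive.expect_tx.{sem}.{scope}.shared::cta.b64"


print(ptx_arrive_expect_tx())
print(ptx_arrive_expect_tx("relaxed", "cluster"))
```

Validating the candidates up front mirrors the restricted sets listed under Parameters.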