Script.mbarrier.arrive_and_expect_tx

Script.mbarrier.arrive_and_expect_tx(barrier, transaction_bytes, sem='release', scope='cta')

Arrive at a barrier and declare expected asynchronous transaction bytes.

Each thread in the current thread group performs two updates on the barrier:

  1. Decrements the pending arrival count by 1.

  2. Increases the pending tx-count by transaction_bytes.

The tx-count tracks asynchronous data transfers (e.g., TMA copies). When an async operation tied to this barrier completes, the hardware automatically decrements the tx-count by the number of bytes transferred. The phase completes only when both pending arrivals and tx-count reach zero.
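The completion rule above can be sketched with a small host-side model. This is only an illustration of the bookkeeping: `ToyMBarrier`, `complete_tx`, and `phase_complete` are hypothetical names, not part of the Tilus API, and the model ignores phase parity and the reset that real hardware performs when a phase completes.

```python
from dataclasses import dataclass


@dataclass
class ToyMBarrier:
    """Toy model of one barrier phase (hypothetical, not the Tilus or PTX API)."""

    pending_arrivals: int       # arrivals still required in this phase
    pending_tx_bytes: int = 0   # async bytes still outstanding

    def arrive_and_expect_tx(self, transaction_bytes: int) -> None:
        # One thread's two updates: arrive once, then raise the tx-count.
        self.pending_arrivals -= 1
        self.pending_tx_bytes += transaction_bytes

    def complete_tx(self, nbytes: int) -> None:
        # Stands in for the hardware decrement when an async copy finishes.
        self.pending_tx_bytes -= nbytes

    def phase_complete(self) -> bool:
        # Both counters must reach zero before the phase completes.
        return self.pending_arrivals == 0 and self.pending_tx_bytes == 0


bar = ToyMBarrier(pending_arrivals=1)
bar.arrive_and_expect_tx(4096)
after_arrive = bar.phase_complete()  # arrivals are 0, but 4096 bytes are outstanding
bar.complete_tx(4096)
after_copy = bar.phase_complete()    # both counters are now 0
```

In this model the single arrival alone does not complete the phase; completion happens only once the simulated copy has delivered all 4096 bytes.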

Typically used with single_thread() so that only one thread sets the tx-count expectation, while the TMA engine performs the actual transfer.
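Since each executing thread applies both updates, the size of the calling thread group scales the totals. The arithmetic below is a minimal sketch of that effect; `group_updates` is a hypothetical helper written for illustration, not a Tilus function.

```python
def group_updates(num_threads, transaction_bytes):
    """Return (arrivals, tx_bytes_added) when every thread in the group executes the call."""
    arrivals = num_threads * 1                       # each thread arrives once
    tx_bytes_added = num_threads * transaction_bytes  # each thread adds the full byte count
    return arrivals, tx_bytes_added


TILE_BYTES = 8192  # assumed size of a single async copy

one = group_updates(1, TILE_BYTES)    # a single thread
warp = group_updates(32, TILE_BYTES)  # a full warp of 32 threads
```

With one thread the barrier expects exactly the 8192 bytes that the copy delivers. With 32 threads it would expect 32 times that amount, so the tx-count would not return to zero after one 8192-byte copy.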

Parameters:
  • barrier (RegisterTensor) – A single-element uint32 register tensor holding the barrier’s shared memory address.

  • transaction_bytes (Expr | int) – The number of bytes expected from async transactions (e.g., TMA copies). The barrier’s tx-count is increased by this value; it is automatically decreased by the hardware as the transactions complete. Must evaluate to a non-negative int32.

  • sem (str) – Memory ordering semantics. Candidates: 'relaxed', 'release'.

  • scope (str) – Synchronization scope. Candidates: 'cta', 'cluster'.
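One plausible way to obtain transaction_bytes is to sum the sizes of all async copies that will signal the same barrier. The sketch below assumes two float16 tiles with made-up shapes; `tile_bytes` is a hypothetical helper, not part of Tilus.

```python
import math


def tile_bytes(shape, itemsize):
    """Bytes occupied by a dense tile with the given shape and element size."""
    return math.prod(shape) * itemsize


FP16_BYTES = 2  # size of one float16 element

a_bytes = tile_bytes((128, 64), FP16_BYTES)  # assumed tile of operand A
b_bytes = tile_bytes((64, 128), FP16_BYTES)  # assumed tile of operand B

# Both copies signal the same barrier, so the expectation is their sum.
transaction_bytes = a_bytes + b_bytes

# The parameter must be a non-negative int32.
fits_int32 = 0 <= transaction_bytes < 2**31
```

If the declared value does not match the bytes actually transferred, the tx-count does not reach zero and the phase does not complete, so computing it from the same shapes used for the copies helps keep the two in sync.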

Return type:

None

Notes

  • Thread group: Can be executed by a thread group of any size.

  • Hardware: Requires compute capability 9.0+ (sm_90).

  • PTX: mbarrier.arrive.expect_tx.shared::cta.b64