Script.mbarrier.arrive_and_expect_tx_remote
- Script.mbarrier.arrive_and_expect_tx_remote(barrier, transaction_bytes, target_rank, sem='release', scope='cluster')
Arrive at a peer CTA’s barrier with expected async transactions.
Each thread in the current thread group arrives on the barrier in the remote CTA specified by
target_rank, using the same shared memory offset as the local barrier. Each thread decrements the remote barrier’s pending arrival count by 1 and increases its tx-count by transaction_bytes.
This is used in cluster-wide pipelines where one CTA needs to signal another CTA’s barrier (e.g., to indicate that data has been loaded into the remote CTA’s shared memory).
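The per-thread accounting described above can be illustrated with a small pure-Python model. This is only a sketch of the bookkeeping: `ToyBarrier`, `toy_remote_arrive`, the offset `0x40`, and the thread count are hypothetical names and values chosen for illustration, not part of the library or the hardware, and the model does not attempt to represent phase completion or memory ordering.

```python
from dataclasses import dataclass


@dataclass
class ToyBarrier:
    """Toy stand-in for one barrier: a pending-arrival count and a tx-count."""
    pending_arrivals: int
    tx_count: int = 0


def toy_remote_arrive(cluster, target_rank, offset, transaction_bytes, num_threads):
    """Apply the documented per-thread effect to the barrier at `offset` in CTA `target_rank`."""
    barrier = cluster[target_rank][offset]
    for _ in range(num_threads):
        # Each arriving thread decrements the pending count by 1 ...
        barrier.pending_arrivals -= 1
        # ... and increases the tx-count by transaction_bytes.
        barrier.tx_count += transaction_bytes
    return barrier


# Two CTAs, each holding a barrier at the same (hypothetical) shared memory offset.
cluster = [{0x40: ToyBarrier(pending_arrivals=4)} for _ in range(2)]

# A 4-thread group in CTA 0 signals the barrier at the same offset in CTA 1.
remote = toy_remote_arrive(cluster, target_rank=1, offset=0x40,
                           transaction_bytes=256, num_threads=4)
print(remote.pending_arrivals, remote.tx_count)
```

In this model the remote tx-count grows by `num_threads * transaction_bytes`, because every thread in the group contributes its own increment, while the local CTA's barrier at the same offset is left untouched.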
- Parameters:
barrier (RegisterTensor) – A single-element uint32 register tensor with the barrier’s shared memory address in the current CTA. The barrier at the same offset in the target CTA is signaled.
transaction_bytes (Expr | int) – Expected async transfer size in bytes. The remote barrier’s tx-count is increased by this value. Must evaluate to a non-negative int32.
target_rank (int) – Rank of the target CTA in the cluster. Must be in [0, clusterSize).
sem (str) – Memory ordering semantics. Candidates: 'relaxed', 'release'.
scope (str) – Synchronization scope. Candidates: 'cta', 'cluster'.
- Return type:
None
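The parameter constraints listed above can be summarized as a small checker. This is a hedged sketch, not the library's own validation: `find_arg_problems` and its `cluster_size` argument are hypothetical, and it assumes `transaction_bytes` is a plain Python int rather than an `Expr`.

```python
INT32_MAX = 2**31 - 1


def find_arg_problems(transaction_bytes, target_rank, cluster_size,
                      sem="release", scope="cluster"):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    # transaction_bytes must be a non-negative int32.
    if not 0 <= transaction_bytes <= INT32_MAX:
        problems.append("transaction_bytes must be a non-negative int32")
    # target_rank must lie in [0, clusterSize).
    if not 0 <= target_rank < cluster_size:
        problems.append("target_rank must be in [0, cluster_size)")
    # Documented candidates for the ordering semantics.
    if sem not in ("relaxed", "release"):
        problems.append("sem must be 'relaxed' or 'release'")
    # Documented candidates for the synchronization scope.
    if scope not in ("cta", "cluster"):
        problems.append("scope must be 'cta' or 'cluster'")
    return problems


ok = find_arg_problems(256, target_rank=1, cluster_size=2)
bad = find_arg_problems(-1, target_rank=2, cluster_size=2, sem="acquire")
print(ok)
print(bad)
```

Returning a list instead of raising keeps the sketch easy to extend, for example to report every violation at once before launching a kernel.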
Notes
Thread group: Can be executed by a thread group of any size.
Hardware: Requires compute capability 9.0+ (sm_90).
PTX:
mbarrier.arrive.expect_tx.shared::cluster.b64 with mapa.shared::cluster