Script.mbarrier.arrive_and_expect_tx_remote

Script.mbarrier.arrive_and_expect_tx_remote(barrier, transaction_bytes, target_rank, sem='release', scope='cluster')

Arrive at a peer CTA’s barrier with expected async transactions.

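Each thread in the current thread group arrives on the barrier in the remote CTA specified by target_rank, using the same shared memory offset as the local barrier. Each thread decrements the remote barrier’s pending arrival count by 1 and increases its tx-count by transaction_bytes. Because every thread arrives separately, a thread group of N threads lowers the pending arrival count by N and raises the tx-count by N × transaction_bytes in total.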
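The per-thread accounting described above can be sketched with a small host-side model. This is plain Python, not the Script API: ToyBarrier, remote_arrive, and demo are hypothetical names introduced only for illustration, and the model assumes the usual mbarrier rule that a phase completes once both the pending arrival count and the tx-count reach zero.

```python
from dataclasses import dataclass


@dataclass
class ToyBarrier:
    """Toy model of one CTA's mbarrier state (hypothetical, not the hardware layout)."""
    expected_arrivals: int  # arrival count the barrier was initialized with
    pending: int = 0        # arrivals still outstanding in the current phase
    tx_count: int = 0       # bytes of async transfers still outstanding
    phase: int = 0          # flips each time a phase completes

    def __post_init__(self) -> None:
        self.pending = self.expected_arrivals

    def _maybe_complete(self) -> None:
        # Assumed rule: a phase completes when nothing is outstanding.
        if self.pending == 0 and self.tx_count == 0:
            self.phase ^= 1
            self.pending = self.expected_arrivals

    def arrive_and_expect_tx(self, transaction_bytes: int) -> None:
        # One thread's arrival: raise the tx-count, then count the arrival.
        self.tx_count += transaction_bytes
        self.pending -= 1
        self._maybe_complete()

    def complete_tx(self, nbytes: int) -> None:
        # An async copy reporting that nbytes have landed.
        self.tx_count -= nbytes
        self._maybe_complete()


def remote_arrive(cluster, target_rank, transaction_bytes, num_threads):
    """Every thread in the group arrives on the barrier owned by target_rank."""
    barrier = cluster[target_rank]
    for _ in range(num_threads):
        barrier.arrive_and_expect_tx(transaction_bytes)


def demo():
    # Two CTAs in the cluster, each with a barrier expecting 4 arrivals.
    cluster = [ToyBarrier(expected_arrivals=4) for _ in range(2)]
    remote_arrive(cluster, target_rank=1, transaction_bytes=128, num_threads=4)
    b = cluster[1]
    after_arrive = (b.pending, b.tx_count, b.phase)
    b.complete_tx(512)  # the expected bytes arrive in the remote CTA
    after_copy = (b.pending, b.tx_count, b.phase)
    return after_arrive, after_copy
```

In this model, four threads each passing transaction_bytes=128 leave the remote barrier with no pending arrivals but 512 bytes still expected, so the phase only flips after complete_tx(512). If the intent is to expect 128 bytes in total, a single thread should perform the arrival, or the barrier’s arrival count and byte total should be planned around the group size.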

This is used in cluster-wide pipelines where one CTA needs to signal another CTA’s barrier (e.g., to indicate that data has been loaded into the remote CTA’s shared memory).

Parameters:
  • barrier (RegisterTensor) – A single-element uint32 register tensor holding the barrier’s shared memory address in the current CTA. The barrier at the same offset in the target CTA is the one that is signaled.

  • transaction_bytes (Expr | int) – Expected async transfer size in bytes. The remote barrier’s tx-count is increased by this value. Must evaluate to a non-negative int32.

  • target_rank (int) – Rank of the target CTA in the cluster. Must be in [0, clusterSize).

  • sem (str) – Memory ordering semantics. One of 'relaxed' or 'release'. Defaults to 'release'.

  • scope (str) – Synchronization scope. One of 'cta' or 'cluster'. Defaults to 'cluster'.
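The parameter constraints listed above can be checked on the host before a kernel is built. The helper below is a minimal sketch under stated assumptions: check_remote_arrive_args is a hypothetical function, not part of the library, it only handles a plain int for transaction_bytes (not an Expr), and cluster_size is assumed to be known to the caller.

```python
INT32_MAX = 2**31 - 1
VALID_SEMS = ("relaxed", "release")
VALID_SCOPES = ("cta", "cluster")


def check_remote_arrive_args(transaction_bytes, target_rank, cluster_size,
                             sem="release", scope="cluster"):
    """Hypothetical pre-flight check mirroring the documented constraints."""
    # transaction_bytes must be a non-negative value that fits in int32.
    if not 0 <= transaction_bytes <= INT32_MAX:
        raise ValueError(f"transaction_bytes out of range: {transaction_bytes}")
    # target_rank must lie in [0, cluster_size).
    if not 0 <= target_rank < cluster_size:
        raise ValueError(f"target_rank {target_rank} not in [0, {cluster_size})")
    if sem not in VALID_SEMS:
        raise ValueError(f"sem must be one of {VALID_SEMS}, got {sem!r}")
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}, got {scope!r}")
```

For example, check_remote_arrive_args(128, 1, cluster_size=2) passes silently, while a target_rank of 2 in a two-CTA cluster raises ValueError.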

Return type:

None

Notes

  • Thread group: Can be executed by a thread group of any size.

  • Hardware: Requires compute capability 9.0+ (sm_90).

  • PTX: mbarrier.arrive.expect_tx.shared::cluster.b64, with mapa.shared::cluster used to map the local barrier address to the corresponding address in the target CTA.