Script.mbarrier.arrive

Script.mbarrier.arrive(barrier, count=1, sem='release', scope='cta')

Arrive at a barrier.

Each thread in the current thread group decrements the barrier’s pending arrival count by count. When the pending arrival count and the tx-count both reach zero, the hardware flips the barrier’s phase and resets the counters for the next phase.
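The counter behavior described above can be illustrated with a small host-side model. This is only a sketch in plain Python, not the Tilus API and not GPU code: the names ToyMBarrier, expected_arrivals, and group_arrive are hypothetical, the tx-count is held at zero, and rejecting an over-arrival with an exception is a choice made for this model rather than documented hardware behavior.

```python
class ToyMBarrier:
    """Host-side toy model of an mbarrier's counters (hypothetical, not the Tilus API)."""

    def __init__(self, expected_arrivals: int) -> None:
        if expected_arrivals <= 0:
            raise ValueError("expected_arrivals must be positive")
        self.expected_arrivals = expected_arrivals
        self.pending = expected_arrivals  # pending arrival count for the current phase
        self.tx_count = 0                 # pending transaction count (kept at zero in this sketch)
        self.phase = 0                    # flips between 0 and 1

    def arrive(self, count: int = 1) -> None:
        """Model one thread arriving and contributing `count` arrivals."""
        if count <= 0:
            raise ValueError("count must be positive")
        if count > self.pending:
            # Assumption of this model only: over-arrival is rejected.
            raise ValueError("count exceeds pending arrivals")
        self.pending -= count
        if self.pending == 0 and self.tx_count == 0:
            # Both counters are zero: flip the phase and reset for the next phase.
            self.phase ^= 1
            self.pending = self.expected_arrivals


def group_arrive(barrier: ToyMBarrier, num_threads: int, count: int = 1) -> None:
    """Model a thread group: every thread in the group arrives with the same count."""
    for _ in range(num_threads):
        barrier.arrive(count)


barrier = ToyMBarrier(expected_arrivals=8)
group_arrive(barrier, num_threads=4, count=1)  # 4 arrivals: pending drops from 8 to 4
group_arrive(barrier, num_threads=2, count=2)  # 4 more arrivals: phase flips, pending resets
```

In this model a group of num_threads threads contributes num_threads * count arrivals in total, so the expected arrival count has to account for both the group size and the per-thread count.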

With sem='release', all prior memory writes by the arriving thread are guaranteed visible to any thread that later completes a successful wait with acquire semantics on this barrier.

Parameters:
  • barrier (RegisterTensor) – A single-element uint32 register tensor holding the barrier’s shared memory address.

  • count (Expr | int) – The number of arrivals contributed by each thread. Must evaluate to a positive int32. Default is 1.

  • sem (str) – Memory ordering semantics. 'release' ensures prior writes are visible to waiters; 'relaxed' provides no ordering guarantees. Candidates: 'relaxed', 'release'.

  • scope (str) – Synchronization scope. Candidates: 'cta', 'cluster'.

Return type:

None

Notes

  • Thread group: Can be executed by a thread group of any size.

  • Hardware: Requires compute capability 8.0+ (sm_80).

  • PTX: mbarrier.arrive.shared::cta.b64