Script.mbarrier.wait
- Script.mbarrier.wait(barrier, phase, sem='acquire', scope='cta')
Wait for a barrier phase to complete.
All threads in the current thread group block until the barrier’s current phase differs from phase. At that point the phase has completed: all pending arrivals and the tx-count have reached zero, the hardware has flipped the phase bit, and it is safe to read data produced in that phase.
If the barrier’s current phase already differs from phase (i.e., the phase has already completed), the threads proceed immediately without blocking.
With sem='acquire', all writes made visible by arrive operations (with release semantics) on this barrier in the completed phase are guaranteed to be visible to the waiting threads.
- Parameters:
barrier (RegisterTensor) – A single-element uint32 register tensor holding the barrier’s shared memory address.
phase (Expr | RegisterTensor | int) – The phase whose completion to wait for. Must be 0 or 1. May be given as a single-element register tensor.
sem (str) – Memory ordering semantics. 'acquire' ensures writes from the completed phase are visible; 'relaxed' provides no ordering. Candidates: 'acquire', 'relaxed'.
scope (str) – Synchronization scope. Candidates: 'cta', 'cluster'.
- Return type:
None
Notes
Thread group: Can be executed by any sized thread group.
Hardware: Requires compute capability 9.0+ (sm_90).
PTX: mbarrier.try_wait.parity.shared::cta.b64
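The phase rule described above (wait returns once the barrier's current phase differs from the phase passed in) can be illustrated with a small host-side model. This is only a sketch in plain Python, not Tilus or GPU code: the ToyMbarrier class and its methods (expect_tx, complete_tx, arrive, try_wait) are hypothetical names invented for this illustration, and the model assumes that any expected bytes are registered before the last arrival of a phase.

```python
class ToyMbarrier:
    """Hypothetical host-side model of mbarrier phase bookkeeping (not a Tilus API)."""

    def __init__(self, expected_arrivals):
        self.expected_arrivals = expected_arrivals
        self.pending = expected_arrivals  # arrivals still outstanding in this phase
        self.tx_count = 0                 # bytes still outstanding in this phase
        self.phase = 0                    # current phase bit, 0 or 1

    def expect_tx(self, nbytes):
        # Register bytes that must be delivered before the phase can complete.
        self.tx_count += nbytes

    def complete_tx(self, nbytes):
        # Model an asynchronous copy reporting that nbytes have landed.
        self.tx_count -= nbytes
        self._maybe_flip()

    def arrive(self):
        # Model one thread arriving at the barrier.
        self.pending -= 1
        self._maybe_flip()

    def _maybe_flip(self):
        # The phase completes only when both counters have reached zero.
        if self.pending == 0 and self.tx_count == 0:
            self.phase ^= 1
            self.pending = self.expected_arrivals

    def try_wait(self, phase):
        # True means a wait on `phase` would return without blocking.
        return self.phase != phase


def demo():
    bar = ToyMbarrier(expected_arrivals=2)
    results = [bar.try_wait(0)]       # phase 0 has not completed yet
    bar.expect_tx(128)
    bar.arrive()
    bar.arrive()
    results.append(bar.try_wait(0))   # arrivals done, but 128 bytes still pending
    bar.complete_tx(128)
    results.append(bar.try_wait(0))   # phase 0 completed, the phase bit flipped
    results.append(bar.try_wait(1))   # phase 1 is now the phase in progress
    return results


print(demo())
```

In this model a caller that reuses the same barrier across iterations would flip its own phase argument after each successful wait (for example phase ^= 1), so that the next wait targets the phase that is newly in progress.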