Script.atomic.shared_cas
- Script.atomic.shared_cas(dst, compare, values, *, sem='relaxed', scope='cta', output=None)
Element-wise compare-and-swap on shared memory.
Per element:
old = dst[i]; if (old == compare[i]) dst[i] = values[i], performed atomically. The returned output (if bound) holds old, which the caller typically inspects to decide whether the swap succeeded (it succeeded wherever old equals compare[i]).
- Parameters:
dst (SharedTensor) – Destination tile in shared memory.
compare (RegisterTensor) – Expected-old-value tile; same shape and dtype as dst.
values (RegisterTensor) – Tile of replacement values; same shape and dtype as dst.
sem (str) – See shared_add().
scope (str) – See shared_add().
output (RegisterTensor | None) – See shared_add().
- Returns:
Pre-CAS value at each element when output is consumed; None otherwise. Note that, unlike the arithmetic ops, CAS has no red.* form, so an unused output still costs a register allocation at the PTX level.
- Return type:
RegisterTensor or None
Notes
Thread group: Can be executed by a thread group of any size.
Hardware: Requires compute capability 7.0+ (sm_70).
PTX: atom.{sem}.{scope}.shared.cas.b32.
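Examples
The per-element rule above can be illustrated with a small reference model. This is only a sketch in plain Python, not the library's implementation: the function name shared_cas_model is hypothetical, lists stand in for the shared-memory tile and the register tiles, and the loop runs sequentially, whereas the real operation performs each element's compare-and-swap atomically on the device.

```python
# Reference model only: plain Python lists stand in for the shared-memory
# tile (dst) and the register tiles (compare, values). This is NOT the
# library API; shared_cas_model is a hypothetical name for illustration.

def shared_cas_model(dst, compare, values):
    """Emulate element-wise compare-and-swap; return the pre-CAS values."""
    if not (len(dst) == len(compare) == len(values)):
        raise ValueError("dst, compare and values must have the same shape")
    old_values = []
    for i in range(len(dst)):
        old = dst[i]              # value observed before the operation
        if old == compare[i]:     # swap only when the expectation holds
            dst[i] = values[i]
        old_values.append(old)    # old is reported whether or not we swapped
    return old_values


dst = [0, 5, 0, 7]        # hypothetical tile contents
compare = [0, 0, 0, 7]    # expected old values
values = [1, 1, 1, 9]     # replacement values

old = shared_cas_model(dst, compare, values)

# The swap succeeded exactly where the returned old value equals compare.
succeeded = [o == c for o, c in zip(old, compare)]

print(old)        # [0, 5, 0, 7]
print(succeeded)  # [True, False, True, True]
print(dst)        # [1, 5, 1, 9]
```

Element 1 is left untouched because it held 5 where 0 was expected. Comparing the returned values against compare is how a caller would detect that failure and, for example, retry.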