Script.atomic.shared_cas

Script.atomic.shared_cas(dst, compare, values, *, sem='relaxed', scope='cta', output=None)

Element-wise compare-and-swap on shared memory.

For each element i, the operation atomically performs: old = dst[i]; if (old == compare[i]) dst[i] = values[i]. The returned output (if bound) holds old. The swap at element i succeeded exactly when old equals compare[i], so the caller typically compares the two to decide how to proceed.
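The per-element rule can be modelled on the host in plain Python. This is only a sequential sketch of the semantics described above, not the library's implementation: the helper name shared_cas_reference is hypothetical, ordinary lists stand in for the shared and register tensors, and a single-threaded loop cannot capture contention between threads.

```python
def shared_cas_reference(dst, compare, values):
    """Hypothetical sequential model of element-wise CAS.

    Mutates `dst` in place and returns the pre-CAS values,
    i.e. what a bound `output` would hold.
    """
    if not (len(dst) == len(compare) == len(values)):
        raise ValueError("dst, compare and values must have the same length")
    old = list(dst)  # snapshot taken before any element is swapped
    for i in range(len(dst)):
        if old[i] == compare[i]:  # swap only where the expected value matches
            dst[i] = values[i]
    return old


dst = [0, 5, 0, 7]
compare = [0, 0, 0, 7]
values = [1, 1, 1, 9]

old = shared_cas_reference(dst, compare, values)

# The swap succeeded at element i exactly when old[i] == compare[i].
succeeded = [o == c for o, c in zip(old, compare)]
```

Here element 1 is left untouched because dst[1] was 5 rather than the expected 0; inspecting old against compare is how a caller would detect that and, for example, retry.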

Parameters:
Returns:

Pre-CAS value at each element when output is consumed; None otherwise. Note that, unlike the arithmetic ops, CAS has no red.* form, so an unused output still costs a register allocation at the PTX level.

Return type:

RegisterTensor or None

Notes

  • Thread group: Can be executed by a thread group of any size.

  • Hardware: Requires compute capability 7.0+ (sm_70).

  • PTX: atom.{sem}.{scope}.shared.cas.b32.