Script.atomic.shared_sub

Script.atomic.shared_sub(dst, values, *, sem='relaxed', scope='cta', output=None)

Atomically computes dst[i] = dst[i] - values[i] element-wise, on shared memory.

PTX has no native atom.sub; the codegen lowers this to atom.add with the negated operand. See shared_add() for the full parameter description.
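The lowering described above can be illustrated with a small pure-Python model. This is only a sketch of the arithmetic, assuming signed 32-bit two's-complement wraparound (matching the .s32 type); it does not model atomicity, memory ordering, or the value returned by the real instruction. The names wrap_s32 and shared_sub_model are hypothetical and not part of the library.

```python
def wrap_s32(x: int) -> int:
    """Wrap a Python int into the signed 32-bit range (two's complement)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def shared_sub_model(dst: list[int], values: list[int]) -> None:
    """Hypothetical model: subtract by adding the negated operand, in place."""
    for i, v in enumerate(values):
        negated = wrap_s32(-v)               # negate the input first
        dst[i] = wrap_s32(dst[i] + negated)  # then perform an ordinary add


dst = [10, 0, -2147483648]
shared_sub_model(dst, [3, 5, 1])
print(dst)
```

Because both the negation and the addition wrap modulo 2^32, adding the negated operand gives the same 32-bit result as a direct subtraction, including at the edges of the s32 range.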

Notes

  • Thread group: Can be executed by a thread group of any size.

  • Hardware: Requires compute capability 7.0+ (sm_70).

  • PTX: atom.{sem}.{scope}.shared.add.s32 with a negated input.

Parameters:

See shared_add().

Return type:

RegisterTensor | None