Script.atomic.shared_sub


Script.atomic.shared_sub(dst, values, *, sem='relaxed', scope='cta', output=None)

Atomically subtracts values from dst element-wise in shared memory: dst[i] = dst[i] - values[i] for each index i.
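The element-wise atomic semantics can be modelled on the host as below. This is a minimal sketch, not the library's implementation: a Python `threading.Lock` stands in for the hardware atomic, plain lists stand in for the shared and register tensors, and Python integers do not wrap at 32 bits. The names `reference_shared_sub` and `run_demo` are hypothetical.

```python
import threading


def reference_shared_sub(dst: list, values: list, lock: threading.Lock) -> None:
    """Hypothetical host-side model: dst[i] = dst[i] - values[i], one indivisible update per element."""
    for i, v in enumerate(values):
        # The lock plays the role of the hardware atomic: the read, subtract,
        # and write of dst[i] cannot interleave with another thread's update.
        with lock:
            dst[i] = dst[i] - v


def run_demo(num_threads: int = 8) -> list:
    dst = [100, 100, 100, 100]  # stands in for the SharedTensor
    values = [1, 2, 3, 4]       # every thread subtracts the same operands
    lock = threading.Lock()
    threads = [
        threading.Thread(target=reference_shared_sub, args=(dst, values, lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dst


if __name__ == "__main__":
    print(run_demo())  # [92, 84, 76, 68]
```

Because every update is indivisible, the final contents depend only on how many threads subtracted, not on the order in which they ran.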

PTX has no native atom.sub; the codegen lowers this to atom.add with the negated operand. See shared_add() for the full parameter description.
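The lowering relies on the identity a - b == a + (-b) holding in 32-bit two's-complement arithmetic, including the edge case b = INT32_MIN, whose negation wraps back to itself. The sketch below checks this in plain Python; `wrap_s32`, `sub_s32`, and `add_negated_s32` are illustrative names, not part of the library.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def wrap_s32(x: int) -> int:
    """Reduce an arbitrary Python int to a signed 32-bit two's-complement value."""
    return (x + 2**31) % 2**32 - 2**31


def sub_s32(a: int, b: int) -> int:
    """What shared_sub computes per element: a - b with 32-bit wraparound."""
    return wrap_s32(a - b)


def add_negated_s32(a: int, b: int) -> int:
    """What the lowered code computes: negate b as a 32-bit value, then add."""
    return wrap_s32(a + wrap_s32(-b))


samples = [INT32_MIN, INT32_MIN + 1, -7, -1, 0, 1, 7, INT32_MAX - 1, INT32_MAX]
for a in samples:
    for b in samples:
        assert sub_s32(a, b) == add_negated_s32(a, b)

print(wrap_s32(-INT32_MIN) == INT32_MIN)  # True: negating INT32_MIN wraps to itself
```

Since both sides are computed modulo 2**32, the negated add gives the same bit pattern as a true subtraction for every pair of 32-bit operands.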

Notes

Thread group: Can be executed by a thread group of any size.
Hardware: Requires compute capability 7.0+ (sm_70).
PTX: atom.{sem}.{scope}.shared.add.s32 with a negated input.
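As a rough illustration of the instruction template in these notes, the hypothetical helper below renders it for given sem and scope values, preceded by a negation of the operand. The register names are made up, the use of a separate `neg.s32` instruction is an assumption (the codegen may negate the values some other way), and the qualifier sets are the ones the PTX ISA lists for `atom`; the values this method actually accepts may be narrower.

```python
# Qualifier values the PTX ISA lists for atom; this Script method may accept fewer.
PTX_ATOM_SEMS = {"relaxed", "acquire", "release", "acq_rel"}
PTX_ATOM_SCOPES = {"cta", "cluster", "gpu", "sys"}


def render_shared_sub_ptx(sem: str, scope: str, dst_reg: str, addr_reg: str,
                          val_reg: str, tmp_reg: str) -> list:
    """Hypothetical rendering of shared_sub as a negation followed by atom.add on shared memory."""
    if sem not in PTX_ATOM_SEMS:
        raise ValueError(f"unsupported sem: {sem!r}")
    if scope not in PTX_ATOM_SCOPES:
        raise ValueError(f"unsupported scope: {scope!r}")
    return [
        # tmp = -val (32-bit two's-complement negation; assumed, see lead-in)
        f"neg.s32 {tmp_reg}, {val_reg};",
        # dst_reg receives the old value at [addr_reg]; memory becomes old + tmp
        f"atom.{sem}.{scope}.shared.add.s32 {dst_reg}, [{addr_reg}], {tmp_reg};",
    ]


for line in render_shared_sub_ptx("relaxed", "cta", "%r1", "%r2", "%r3", "%r4"):
    print(line)
```

With the defaults sem='relaxed' and scope='cta', the atomic line comes out as `atom.relaxed.cta.shared.add.s32`.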

Parameters:

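dst (SharedTensor): Shared-memory tensor updated in place.
values (RegisterTensor): Amounts to subtract from dst, element-wise.
sem (str): Memory-ordering semantics, emitted as the {sem} qualifier of the atom instruction. Defaults to 'relaxed'.
scope (str): Scope of the atomic operation, emitted as the {scope} qualifier. Defaults to 'cta'.
output (RegisterTensor | None): See shared_add().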

Return type:

RegisterTensor | None