Script.atomic.global_add

Script.atomic.global_add(dst, values, *, sem='relaxed', scope='gpu', output=None)

Atomically performs the element-wise update dst[i] = dst[i] + values[i] on global memory.

See shared_add() for the full parameter description; the only difference is that dst is a GlobalTensor and the default scope is 'gpu' rather than 'cta'.
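The element-wise semantics can be illustrated with a small host-side model in plain Python. This is only a sketch, not the library's implementation: reference_global_add and want_output are hypothetical names, a lock stands in for the hardware's atomicity guarantee, and it assumes (as with the PTX atom instruction) that the output holds the values read before the addition.

```python
import threading

# Stands in for the hardware guarantee that each read-modify-write is indivisible.
_lock = threading.Lock()

def reference_global_add(dst, values, want_output=False):
    """Hypothetical model of dst[i] = dst[i] + values[i] (not the library's code)."""
    if len(dst) != len(values):
        raise ValueError("dst and values must have the same length")
    old = []
    for i, v in enumerate(values):
        with _lock:
            old.append(dst[i])  # value observed before the add (assumed output)
            dst[i] = dst[i] + v
    return old if want_output else None

# Uncontended case: one caller updates three elements.
dst = [1, 2, 3]
old = reference_global_add(dst, [10, 20, 30], want_output=True)

# Contended case: 8 threads each add 1 to the same element 1000 times.
counter = [0]

def worker():
    for _ in range(1000):
        reference_global_add(counter, [1])

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update is indivisible, no increment is lost in the contended case; a plain, non-atomic read-modify-write offers no such guarantee when several threads target the same element.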

Notes

Thread group: Can be executed by a thread group of any size.
Hardware: Requires compute capability 7.0+ (sm_70).
PTX: atom.{sem}.{scope}.global.add.s32 (or red.* when the output is unused).
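The PTX template in the notes above can be expanded by hand. The helper below is a hypothetical illustration (ptx_mnemonic and output_used are not part of the library); it only substitutes sem and scope into the template and chooses atom or red depending on whether the output is used, without validating the arguments.

```python
def ptx_mnemonic(sem="relaxed", scope="gpu", output_used=False):
    """Hypothetical helper: fill in the PTX template from the notes above."""
    # atom returns a value; red is the reduction form used when the output is unused.
    opcode = "atom" if output_used else "red"
    return f"{opcode}.{sem}.{scope}.global.add.s32"

default_form = ptx_mnemonic()
with_output = ptx_mnemonic(output_used=True)
print(default_form)
print(with_output)
```

With the default arguments (sem='relaxed', scope='gpu', no output), this yields red.relaxed.gpu.global.add.s32; requesting an output switches the opcode to atom.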

Parameters:

dst (GlobalTensor)
values (RegisterTensor)
sem (str)
scope (str)
output (RegisterTensor | None)

Return type:

RegisterTensor | None