Script.atomic.global_add

Script.atomic.global_add(dst, values, *, sem='relaxed', scope='gpu', output=None)

Atomically performs the element-wise update dst[i] = dst[i] + values[i] on global memory.

See shared_add() for the full parameter description; the only difference is that dst is a GlobalTensor and the default scope is 'gpu' rather than 'cta'.
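The element-wise update can be pictured with a small pure-Python model. This is only a sketch of the semantics, not the library's implementation: reference_global_add and its return_old flag are hypothetical names, plain lists stand in for tensors, and the idea that the returned tensor holds the values from before the add is an assumption based on the PTX atom instruction, which returns the old memory value.

```python
def reference_global_add(dst, values, return_old=False):
    """Hypothetical reference model of the element-wise add (not a library API).

    Updates dst in place. When return_old is True, returns the values that
    dst held before the update (assumed to mirror PTX atom); otherwise
    returns None, like the result-less red form.
    """
    if len(dst) != len(values):
        raise ValueError("dst and values must have the same length")
    # Snapshot the old contents only if the caller wants them back.
    old = list(dst) if return_old else None
    for i, v in enumerate(values):
        dst[i] = dst[i] + v
    return old


counters = [10, 20, 30]
previous = reference_global_add(counters, [1, 2, 3], return_old=True)
print(counters)  # [11, 22, 33]
print(previous)  # [10, 20, 30]
```

On the GPU each per-element update is a single indivisible read-modify-write, so concurrent adds from different thread blocks to the same element are all accounted for; the sequential loop above models only the final arithmetic result.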

Notes

  • Thread group: Can be executed by a thread group of any size.

  • Hardware: Requires compute capability 7.0+ (sm_70).

  • PTX: atom.{sem}.{scope}.global.add.s32 (or red.* when the output is unused).
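To make the PTX template in the last note concrete, the following sketch fills in the {sem} and {scope} placeholders. The helper ptx_mnemonic and its result_used flag are hypothetical and exist only for illustration; the defaults are taken from the signature above, and the red form is assumed to use the same qualifier order as the atom form.

```python
def ptx_mnemonic(sem="relaxed", scope="gpu", result_used=True):
    """Hypothetical helper: format the PTX mnemonic named in the notes."""
    # atom returns the previous value; red is the reduction form with no result.
    opcode = "atom" if result_used else "red"
    return f"{opcode}.{sem}.{scope}.global.add.s32"


print(ptx_mnemonic())                   # atom.relaxed.gpu.global.add.s32
print(ptx_mnemonic(result_used=False))  # red.relaxed.gpu.global.add.s32
print(ptx_mnemonic(scope="cta"))        # atom.relaxed.cta.global.add.s32
```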

Return type: RegisterTensor | None