Script.atomic.shared_scatter_add
- Script.atomic.shared_scatter_add(dst, *, dim, indices, values, sem='relaxed', scope='cta', output=None)
Scatter-add into a shared tile along dim. For each tile element k, atomically performs

dst[..., indices[k], ...] = dst[..., indices[k], ...] + values[k]

where indices picks positions along dim and the non-scatter axes come from the lane's own tile position. indices.shape == values.shape must hold strictly (identical RegisterLayout), and dst's non-dim axes must match indices exactly. Out-of-range index values are undefined; there is no runtime bounds check.
- Parameters:
- dst (SharedTensor) – Destination tile in shared memory.
- dim (int) – Compile-time scatter axis into dst.
- indices (RegisterTensor) – Per-lane integer indices along dim.
- values (RegisterTensor) – Per-lane contributions; same shape and layout as indices.
- sem (str) – PTX memory-ordering qualifier. See AtomicInstructionGroup for the accepted values.
- scope (str) – PTX sync scope. See AtomicInstructionGroup.
- output (RegisterTensor, optional) – If provided, receives the per-element pre-RMW value at each scattered location (same shape as indices).
- Returns:
Pre-RMW values when output is consumed downstream; None when unused (the DCE pass rewrites the instruction to the cheaper red.* form).
- Return type:
RegisterTensor or None
Notes
Thread group: Can be executed by a thread group of any size.
Hardware: Requires compute capability 7.0+ (sm_70).
PTX:
atom.{sem}.{scope}.shared.add.s32 (or red.* when the output is unused).
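The semantics above can be sketched with a sequential Python model. This is an illustration only, not the Script API: the helper name, the 2D specialization, and the per-lane coordinate argument are invented for the sketch. The real instruction performs the updates atomically in an unspecified lane order; for integer adds the final tile contents are order-independent, but the pre-RMW (output) values are not, so this model simply fixes one order.

```python
def shared_scatter_add_ref(dst, dim, fixed_coords, indices, values):
    """Sequential model of an atomic scatter-add into a 2D shared tile.

    dst          : 2D list-of-lists tile (mutated in place).
    dim          : scatter axis, 0 or 1.
    fixed_coords : per-lane coordinate on the non-scatter axis
                   (the lane's own tile position).
    indices      : per-lane index along dim (no bounds check, as in
                   the real op out-of-range behavior is undefined).
    values       : per-lane contributions; same length as indices.

    Returns the per-lane pre-RMW values (what the `output` operand
    would receive), in this model's sequential order.
    """
    # Mirrors the strict shape requirement: indices.shape == values.shape.
    assert len(indices) == len(values) == len(fixed_coords)
    pre = []
    for fixed, idx, val in zip(fixed_coords, indices, values):
        r, c = (idx, fixed) if dim == 0 else (fixed, idx)
        pre.append(dst[r][c])   # value observed before the read-modify-write
        dst[r][c] += val        # the add; duplicate indices accumulate
    return pre

tile = [[0, 0, 0], [0, 0, 0]]
# Three lanes on row 0; two of them hit column 1, so their adds accumulate.
pre = shared_scatter_add_ref(tile, dim=1, fixed_coords=[0, 0, 0],
                             indices=[1, 1, 2], values=[5, 7, 9])
# tile is now [[0, 12, 9], [0, 0, 0]]; pre == [0, 5, 0] under this order
```

Note that pre would be [0, 7, 0] if the two lanes hitting column 1 ran in the other order, while the final tile would be unchanged; that is exactly why only the accumulated result, not the pre-RMW output, is order-independent.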