cuda::atomic_ref

Defined in header <cuda/atomic>:

template <typename T, cuda::thread_scope Scope = cuda::thread_scope_system>
class cuda::atomic_ref;

The class template cuda::atomic_ref is an extended form of cuda::std::atomic_ref that takes an additional cuda::thread_scope argument, defaulted to cuda::std::thread_scope_system.

It has the same interface and semantics as cuda::std::atomic_ref, with the following additional operations. This class additionally deviates from the standard by being backported to C++11.

cuda::atomic_ref::fetch_min

Atomically find the minimum of the stored value and a provided value.

cuda::atomic_ref::fetch_max

Atomically find the maximum of the stored value and a provided value.

Limitations

cuda::atomic_ref<T> and cuda::std::atomic_ref<T> may only be instantiated with a T that are either 4 or 8 bytes.

No object or subobject of an object referenced by an atomic_­ref shall be concurrently referenced by any other atomic_­ref that has a different Scope.

For cuda::atomic_ref<T> and cuda::std::atomic_ref<T> the type T must satisfy the following:
  • 4 <= sizeof(T) <= 8.

  • T must not have “padding bits”, i.e., T’s object representation must not have bits that do not participate in it’s value representation.

Concurrency Restrictions

See memory model documentation for general restrictions on atomicity.

With CUDA Compute Capability 6 (Pascal), an object of type atomic_ref may not be used:
  • with a reference to an object with a automatic storage duration in a GPU thread, or

  • if is_always_lock_free() is false.

For CUDA Compute Capability prior to 6 (Pascal), objects of type cuda::atomic_ref or cuda::std::atomic_ref may not be used.

Implementation-Defined Behavior

For each type T and cuda::thread_scope S, the value of cuda::atomic_ref<T, S>::is_always_lock_free() and cuda::std::atomic_ref<T>::is_always_lock_free() is as follows:

Type T

cuda::thread_scope S

cuda::atomic_ref<T, S>::is_always_lock_free()

Any valid type

Any thread scope

sizeof(T) <= 8

Example

#include <cuda/atomic>

__global__ void example_kernel(int *gmem, int *pinned_mem) {
  // This atomic is suitable for all threads in the system.
  cuda::atomic_ref<int, cuda::thread_scope_system> a(*pinned_mem);

  // This atomic has the same type as the previous one (`a`).
  cuda::atomic_ref<int> b(*pinned_mem);

  // This atomic is suitable for all threads on the current processor (e.g. GPU).
  cuda::atomic_ref<int, cuda::thread_scope_device> c(*gmem);

  __shared__ int shared_v;
  // This atomic is suitable for threads in the same thread block.
  cuda::atomic_ref<int, cuda::thread_scope_block> d(shared_v);
}

See it on Godbolt