cuda::aligned_size_t

Defined in headers <cuda/barrier> and <cuda/pipeline>:

template <cuda::std::size_t Alignment>
struct cuda::aligned_size_t {
  static constexpr cuda::std::size_t align = Alignment;
  cuda::std::size_t value;
  __host__ __device__ explicit constexpr aligned_size_t(cuda::std::size_t size);
  __host__ __device__ constexpr operator cuda::std::size_t();
};

The class template cuda::aligned_size_t is a shape that represents an extent of bytes whose address alignment and size alignment are both fixed statically by Alignment.

Preconditions:

  • The address of the extent of bytes must be aligned to an Alignment-byte boundary.

  • The size of the extent of bytes must be a multiple of Alignment.
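
Both preconditions can be checked on the host before a size is wrapped in cuda::aligned_size_t. The following is a minimal sketch in plain C++ rather than library code: extent_satisfies_alignment and checked_extent_size are hypothetical helpers that are not part of libcu++, and the sketch assumes Alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of libcu++): returns true when both
// preconditions hold for the byte extent [ptr, ptr + size).
template <std::size_t Alignment>
bool extent_satisfies_alignment(const void* ptr, std::size_t size) {
  static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                "this sketch assumes Alignment is a power of two");
  const auto address = reinterpret_cast<std::uintptr_t>(ptr);
  const bool address_ok = (address % Alignment) == 0;  // address lies on an Alignment boundary
  const bool size_ok = (size % Alignment) == 0;        // size is a multiple of Alignment
  return address_ok && size_ok;
}

// Hypothetical debug-build guard: asserts the preconditions and returns
// the size unchanged. With NDEBUG defined, the check compiles away.
template <std::size_t Alignment>
std::size_t checked_extent_size(const void* ptr, std::size_t size) {
  assert(extent_satisfies_alignment<Alignment>(ptr, size));
  return size;
}
```

In a debug build, a caller could pass the result of checked_extent_size<16>(dst, size) to the cuda::aligned_size_t<16> constructor, after applying the same check to src. Because the address precondition covers every buffer described by the extent, each pointer needs its own check.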

Template Parameters

Alignment

The address and size alignment of the byte extent.

Data Members

align

The alignment of the byte extent.

value

The size of the byte extent.

Member Functions

(constructor)

Constructs an aligned size. If the size is not a multiple of Alignment, the behavior is undefined.

(destructor)

Trivial implicit destructor.

operator=

Trivial implicit copy/move assignment.

operator cuda::std::size_t

Implicit conversion to cuda::std::size_t.
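
The combination of an explicit constructor and an implicit conversion operator can be illustrated without the CUDA toolkit. The sketch below uses aligned_size_sketch, a hypothetical host-only stand-in that mirrors the synopsis above. It is not the libcu++ type. As a further assumption, its conversion operator is marked const so that a constexpr object can be converted.

```cpp
#include <cstddef>
#include <type_traits>

// Host-only stand-in mirroring the synopsis; NOT the libcu++ type.
template <std::size_t Alignment>
struct aligned_size_sketch {
  static constexpr std::size_t align = Alignment;
  std::size_t value;
  explicit constexpr aligned_size_sketch(std::size_t size) : value(size) {}
  // Marked const here (an assumption of this sketch) so constexpr objects convert.
  constexpr operator std::size_t() const { return value; }
};

// Construction must be spelled out because the constructor is explicit...
constexpr aligned_size_sketch<16> extent(64);
// ...while reading the size back needs no cast.
constexpr std::size_t bytes = extent;

static_assert(bytes == 64, "conversion yields the stored size");
static_assert(aligned_size_sketch<16>::align == 16, "align reflects the template argument");
static_assert(!std::is_convertible<std::size_t, aligned_size_sketch<16>>::value,
              "a plain size_t does not silently become an aligned size");
```

With this design, the alignment guarantee is something the caller has to state deliberately at the call site. A plain size is never promoted to an aligned size by accident.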

Notes

If Alignment is not a valid alignment, the behavior is undefined.
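
This page does not spell out what makes an alignment valid. Assuming it follows the C++ rule that every alignment is a power of two, a compile-time guard could look like the sketch below. The helper is_power_of_two is hypothetical and not part of libcu++. Whether a particular power of two is actually supported on a given target is a separate question that this check does not answer.

```cpp
#include <cstddef>

// Hypothetical helper (not part of libcu++): true when n is a nonzero
// power of two, i.e. exactly one bit of n is set.
constexpr bool is_power_of_two(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

static_assert(is_power_of_two(16), "16 passes the power-of-two check");
static_assert(!is_power_of_two(24), "24 fails the power-of-two check");
```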

Example

#include <cuda/barrier>

__global__ void example_kernel(void* dst, void* src, size_t size) {
  cuda::barrier<cuda::thread_scope_system> bar;
  init(&bar, 1);

  // Implementation cannot make assumptions about alignment.
  cuda::memcpy_async(dst, src, size, bar);

  // Implementation can assume that dst and src are 16-byte aligned
  // and that size is a multiple of 16, and may optimize accordingly.
  cuda::memcpy_async(dst, src, cuda::aligned_size_t<16>(size), bar);

  bar.arrive_and_wait();
}
