`cuda::aligned_size_t`#

Defined in headers <cuda/memory>, <cuda/barrier> and <cuda/pipeline>:

template <cuda::std::size_t Alignment>
struct cuda::aligned_size_t {
  static constexpr cuda::std::size_t align = Align;
  cuda::std::size_t value;
  __host__ __device__ explicit constexpr aligned_size(cuda::std::size_t size);
  __host__ __device__ constexpr operator cuda::std::size_t();
};

The class template cuda::aligned_size_t is a shape representing an extent of bytes with a statically defined (address and size) alignment.

Preconditions:

The address of the extent of bytes must be aligned to an Alignment alignment boundary.
The size of the extent of bytes must be a multiple of the Alignment.

Template Parameters#

Alignment

The address and size alignment of the byte extent.

Data Members#

`align`	The alignment of the byte extent.
`value`	The size of the byte extent.

Member Functions#

`(constructor)`	Constructs an aligned size. If the `size` is not a multiple of `Alignment` the behavior is undefined.
`(destructor)`	Trivial implicit destructor.
`operator=`	Trivial implicit copy/move.
`operator cuda::std::size_t`	Implicit conversion to cuda::std::size_t.

Notes#

If Alignment is not a valid alignment, the behavior is undefined.

Example#

#include <cuda/memory>

__global__ void example_kernel(void* dst, void* src, size_t size) {
  cuda::barrier<cuda::thread_scope_system> bar;
  init(&bar, 1);

  // Implementation cannot make assumptions about alignment.
  cuda::memcpy_async(dst, src, size, bar);

  // Implementation can assume that dst and src are 16-bytes aligned,
  // and that size is a multiple of 16, and may optimize accordingly.
  cuda::memcpy_async(dst, src, cuda::aligned_size_t<16>(size), bar);

  bar.arrive_and_wait();
}

See it on Godbolt

cuda::aligned_size_t#