cuda::align_up
template <typename T>
[[nodiscard]] __host__ __device__ inline
T* align_up(T* ptr, size_t alignment) noexcept
The function returns the original pointer if it is already aligned, or otherwise the closest pointer larger than ptr
that is aligned to the specified alignment: \(\left\lceil \frac{ptr}{alignment} \right\rceil \cdot alignment\).
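The rounding formula above can be illustrated on plain integer addresses. The following is a minimal host-side sketch, not the library's implementation; the helper name round_up_address is hypothetical, and it assumes alignment is a nonzero power of two so that the ceiling can be computed with a bit mask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of libcu++): rounds an address up to the
// next multiple of `alignment`, i.e. ceil(addr / alignment) * alignment.
// Assumes `alignment` is a nonzero power of two.
inline std::uintptr_t round_up_address(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // For a power of two, alignment - 1 has all the low "remainder" bits set.
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    // Adding the mask carries past the boundary unless addr is already
    // aligned; clearing the low bits then lands exactly on the boundary.
    return (addr + mask) & ~mask;
}
```

With this sketch, round_up_address(0x1001, 16) evaluates to 0x1010, while an already aligned address such as 0x1010 is returned unchanged.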
Parameters
ptr: The pointer.
alignment: The alignment.
Return value
The original pointer or closest pointer larger than ptr that is aligned to the specified alignment.
Constraints
alignment must be a power of two.
alignment >= alignof(T).
ptr is aligned to alignof(T).
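The three constraints can be written as a single predicate, which may be handy in a debug-build check before calling the function. This is only a sketch: the helper name satisfies_align_up_constraints is hypothetical and not part of the library, and how the library itself reacts to a violated constraint is not specified here.

```cpp
#include <cassert>  // for callers that assert on the result in debug builds
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of libcu++): returns true when the three
// documented constraints hold for the given pointer and alignment.
template <typename T>
bool satisfies_align_up_constraints(const T* ptr, std::size_t alignment) {
    // 1. alignment must be a power of two (zero is rejected as well).
    const bool is_power_of_two = alignment != 0 && (alignment & (alignment - 1)) == 0;
    // 2. alignment >= alignof(T).
    const bool covers_type = alignment >= alignof(T);
    // 3. ptr is aligned to alignof(T).
    const bool ptr_is_aligned = reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0;
    return is_power_of_two && covers_type && ptr_is_aligned;
}
```

A caller could, for example, write assert(satisfies_align_up_constraints(ptr, 16)) in host code ahead of the call.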
Performance considerations
The function is optimized for compile-time values of alignment.
The returned pointer is decorated with __builtin_assume_aligned to help the compiler generate better code.
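To show what such a decoration means in general, here is a small host-side sketch of __builtin_assume_aligned used directly. It is an illustration only, not the library's code; the function name sum4_aligned is hypothetical, and the builtin is assumed to be available (it is provided by GCC and Clang, but not by every compiler).

```cpp
#include <cassert>  // for callers that check the result

// Hypothetical example (not part of libcu++): the builtin returns its first
// argument and promises the compiler that the address is a multiple of 16,
// which may allow aligned vector loads. Passing a pointer that is not
// actually 16-byte aligned is undefined behavior.
inline float sum4_aligned(const float* data) {
    const float* p = static_cast<const float*>(__builtin_assume_aligned(data, 16));
    return p[0] + p[1] + p[2] + p[3];
}
```

The alignment here is a literal constant, which matches the compile-time case that the function is optimized for.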
Example
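#include <cuda/memory>

// The kernel writes through the aligned pointer, so ptr must not be const.
__global__ void kernel(int* ptr) {
    // Round ptr up to the next 16-byte boundary (unchanged if already aligned).
    auto ptr_align16 = cuda::align_up(ptr, 16);
    reinterpret_cast<int4*>(ptr_align16)[0] = int4{1, 2, 3, 4};
}

int main() {
    int* ptr;
    cudaMalloc(&ptr, 100 * sizeof(int));
    kernel<<<1, 1>>>(ptr);
    cudaDeviceSynchronize();
    cudaFree(ptr);
    return 0;
}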