cuda::bitmask#
Defined in the <cuda/bit> header.
namespace cuda {
template <typename T = uint32_t>
[[nodiscard]] __host__ __device__ constexpr
T bitmask(int start, int width) noexcept;
} // namespace cuda
The function generates a bitmask of size width starting at position start.
Parameters
start: starting position of the bitmask.width: width of the bitmask.
Return value
Bitmask of size
widthstarting at position``start``.
Constraints
Tis an unsigned integral type.
Preconditions
start >= 0 && start <= num_bits(T).width >= 0 && width <= num_bits(T).start + width <= num_bits(T).
Performance considerations
The function performs the following operations in device code:
uint8_t,uint16_t,uint32_t:BMSK.uint64_t:SHLx4,ADDx2.
Note
When the input values are run-time values that the compiler can resolve at compile-time, e.g. an index of a loop with a fixed number of iterations, using the function could not be optimal.
Note
GCC <= 8 uses a slow path with more instructions even in CUDA.
Example#
#include <cuda/bit>
#include <cuda/std/cassert>
#include <cuda/std/cstdint>
__global__ void bitmask_kernel() {
assert(cuda::bitmask(2, 4) == 0b111100u);
assert(cuda::bitmask<uint64_t>(1, 3) == uint64_t{0b1110});
}
int main() {
bitmask_kernel<<<1, 1>>>();
cudaDeviceSynchronize();
return 0;
}