cuda::fast_mod_div#
Defined in <cuda/cmath> header.
template <typename T, bool DivisorIsNeverOne = false>
class fast_mod_div {
public:
fast_mod_div() = delete;
__host__ __device__
explicit fast_mod_div(T divisor) noexcept;
template <typename U>
[[nodiscard]] __host__ __device__ friend
cuda:::std::common_type_t<T, U> operator/(U dividend, fast_mod_div<T> divisor) noexcept;
template <typename U>
[[nodiscard]] __host__ __device__ friend
cuda:::std::common_type_t<T, U> operator%(U dividend, fast_mod_div<T> divisor) noexcept;
[[nodiscard]] __host__ __device__
operator T() const noexcept;
};
template <typename T, typename U>
[[nodiscard]] __host__ __device__
cuda::std::pair<T, U> div(T dividend, fast_mod_div<U> divisor) noexcept;
The class fast_mod_div is used to pre-compute the modulo and division of an integer value, to be used in a second stage for efficiency: \(floor\left(\frac{dividend}{divisor}\right)\).
Parameters
divisor: The divisor.dividend: The dividend.DivisorIsNeverOne: Indicates thatdivisor != 1and skips one comparison in the second stage.
Constraints
TandUare integer types of size up to 64-bits.max_value(dividend type) <= max_value(divisor type).
Preconditions
divisor > 0.dividend >= 0.divisor > 1ifDivisorIsNeverOne == true.
Performance considerations
fast_mod_divneeds to be initialized on the host and executed on the device for optimal performance.Tsigned type ensures the best performance.T == inttranslates toSEL,IMAD, and x2SHFinstructions.64-bit types are in general slower than 32-bit types.
DivisorIsNeverOne == truecan be used to skip one comparison.__builtin_assume(dividend != cuda::std::numeric_limits<U>::max())can be used to skip one comparison with unsigned values.
Example#
#include <cuda/cmath>
#include <cuda/std/cassert>
__global__ void div_kernel(cuda::fast_mod_div<int> divisor) {
assert(45 / divisor == 2);
assert(45 % divisor == 5);
assert((cuda::div(45, divisor) == cuda::std::pair{2, 5}));
}
int main() {
cuda::fast_mod_div<int> divisor(20);
div_kernel<<<1, 1>>>(divisor);
cudaDeviceSynchronize();
return 0;
}