cuda::fast_mod_div
Defined in <cuda/cmath>
header.
template <typename T, bool DivisorIsNeverOne = false>
class fast_mod_div {
public:
fast_mod_div() = delete;
__host__ __device__
explicit fast_mod_div(T divisor) noexcept;
template <typename U>
[[nodiscard]] __host__ __device__ friend
cuda:::std::common_type_t<T, U> operator/(U dividend, fast_mod_div<T> divisor) noexcept;
template <typename U>
[[nodiscard]] __host__ __device__ friend
cuda:::std::common_type_t<T, U> operator%(U dividend, fast_mod_div<T> divisor) noexcept;
[[nodiscard]] __host__ __device__
operator T() const noexcept;
};
template <typename T, typename U>
[[nodiscard]] __host__ __device__
cuda::std::pair<T, U> div(T dividend, fast_mod_div<U> divisor) noexcept;
The class fast_mod_div
is used to pre-compute the modulo and division of an integer value, to be used in a second stage for efficiency: \(floor\left(\frac{dividend}{divisor}\right)\).
Parameters
divisor
: The divisor.dividend
: The dividend.DivisorIsNeverOne
: Indicates thatdivisor != 1
and skips one comparison in the second stage.
Constraints
T
andU
are integer types of size up to 64-bits.max_value(dividend type) <= max_value(divisor type)
.
Preconditions
divisor > 0
.dividend >= 0
.divisor > 1
ifDivisorIsNeverOne == true
.
Performance considerations
fast_mod_div
needs to be initialized on the host and executed on the device for optimal performance.T
signed type ensures the best performance.T == int
translates toSEL
,IMAD
, and x2SHF
instructions.64-bit types are in general slower than 32-bit types.
DivisorIsNeverOne == true
can be used to skip one comparison.__builtin_assume(dividend != cuda::std::numeric_limits<U>::max())
can be used to skip one comparison with unsigned values.
Example
#include <cuda/cmath>
#include <cuda/std/cassert>
__global__ void div_kernel(cuda::fast_mod_div<int> divisor) {
assert(45 / divisor == 2);
assert(45 % divisor == 5);
assert((cuda::div(45, divisor) == cuda::std::pair{2, 5}));
}
int main() {
cuda::fast_mod_div<int> divisor(20);
div_kernel<<<1, 1>>>(divisor);
cudaDeviceSynchronize();
return 0;
}