CUDA-Specific Types#
Note
This page is about types specific to CUDA targets. Many other types are also available in the CUDA target - see Built-in types.
Vector Types#
CUDA Vector Types are usable in kernels. There are two important distinctions from vector types in CUDA C/C++:
First, the recommended name for a vector type in Numba CUDA is formatted as ``<base_type>x<N>``, where ``base_type`` is the base type of the vector and ``N`` is the number of elements in the vector. Examples include ``int64x3``, ``uint16x4``, ``float32x4``, etc. For new Numba CUDA kernels, this is the recommended way to instantiate vector types.
For convenience, users adapting existing kernels from CUDA C/C++ to Python may use aliases consistent with the C/C++ naming. For example, ``float3`` aliases ``float32x3``, ``long3`` aliases ``int32x3`` or ``int64x3`` (depending on the platform), etc.
Second, unlike CUDA C/C++ where factory functions are used, vector types are constructed directly with their constructor. For example, to construct a ``float32x3``:
from numba.cuda import float32x3
# In kernel
f3 = float32x3(0.0, -1.0, 1.0)
Additionally, vector types can be constructed from a combination of vector and primitive types, as long as the total number of components matches the result vector type. For example, all of the following constructions are valid:
zero = uint32(0)
u2 = uint32x2(1, 2)
# Construct a 3-component vector with primitive type and a 2-component vector
u3 = uint32x3(zero, u2)
# Construct a 4-component vector with 2 2-component vectors
u4 = uint32x4(u2, u2)
The 1st, 2nd, 3rd and 4th components of the vector type can be accessed through the fields ``x``, ``y``, ``z``, and ``w`` respectively. The components are immutable after construction in the present version of Numba; it is expected that support for mutating vector components will be added in a future release.
v1 = float32x2(1.0, 1.0)
v2 = float32x2(1.0, -1.0)
dotprod = v1.x * v2.x + v1.y * v2.y
Narrow Data Types#
Bfloat16#
Note
Bfloat16 requires CUDA version 12.0+ and a device with compute capability 8.0 or above.
To determine whether ``bfloat16`` is supported in the current configuration, use:
- numba.cuda.is_bfloat16_supported()#
Returns ``True`` if the current device and toolkit support bfloat16, ``False`` otherwise.
Data Movement and Casts#
Construction of a single instance of a ``bfloat16`` object:
- numba.cuda.types.bfloat16(b)#
Constructs a ``bfloat16`` from an existing device scalar. Supported scalar types: ``float64``, ``float32``, ``float16``, ``int64``, ``int32``, ``uint64``, ``uint32``.
Conversely, ``bfloat16`` data can be cast back to a native data type via ``dtype(b)``, where ``dtype`` is one of the data types above (except ``float16``) and ``b`` is a ``bfloat16`` object.
Arithmetic#
Supported arithmetic operations on ``bfloat16`` operands are:
- Arithmetic operators (``+``, ``-``, ``*``, ``/``)
- Arithmetic assignment operators (``+=``, ``-=``, ``*=``, ``/=``)
- Comparison operators (``==``, ``!=``, ``>``, ``<``, ``>=``, ``<=``)
- Unary arithmetic operators (``+``, ``-``)
Math Intrinsics#
A number of math intrinsics that utilize the device's native ``bfloat16`` computing features are provided:
- numba.cuda.bf16.htrunc(b)#
Round ``b`` to the nearest integer value that does not exceed ``b`` in magnitude.
- numba.cuda.bf16.hceil(b)#
Compute the smallest integer value not less than ``b``.
- numba.cuda.bf16.hfloor(b)#
Calculate the largest integer value which is less than or equal to ``b``.
- numba.cuda.bf16.hrint(b)#
Round ``b`` to the nearest integer value in nv_bfloat16 floating-point format, with halfway cases rounded to the nearest even integer value.
- numba.cuda.bf16.hsqrt(b)#
Calculates bfloat16 square root of input ``b`` in round-to-nearest-even mode.
- numba.cuda.bf16.hrsqrt(b)#
Calculates bfloat16 reciprocal square root of input ``b`` in round-to-nearest-even mode.
- numba.cuda.bf16.hrcp(b)#
Calculates bfloat16 reciprocal of input ``b`` in round-to-nearest-even mode.
- numba.cuda.bf16.hlog(b)#
Calculates bfloat16 natural logarithm of input ``b`` in round-to-nearest-even mode.
- numba.cuda.bf16.hlog2(b)#
Calculates bfloat16 binary logarithm (base-2) of input ``b`` in round-to-nearest-even mode.
- numba.cuda.bf16.hlog10(b)#
Calculates bfloat16 common logarithm (base-10) of input ``b`` in round-to-nearest-even mode.
- numba.cuda.bf16.hcos(b)#
Calculates bfloat16 cosine of input ``b`` in round-to-nearest-even mode.
Note
This function's implementation calls the ``cosf(float)`` function and is subject to compiler optimizations. Specifically, ``use_fast_math`` mode changes ``cosf(float)`` into the intrinsic ``__cosf(float)``, which has less accurate numeric behavior.
- numba.cuda.bf16.hsin(b)#
Calculates bfloat16 sine of input ``b`` in round-to-nearest-even mode.
Note
This function's implementation calls the ``sinf(float)`` function and is subject to compiler optimizations. Specifically, the ``use_fast_math`` flag changes ``sinf(float)`` into the intrinsic ``__sinf(float)``, which has less accurate numeric behavior.
- numba.cuda.bf16.htanh(b)#
Calculates bfloat16 hyperbolic tangent function ``tanh(b)`` in round-to-nearest-even mode.
- numba.cuda.bf16.htanh_approx(b)#
Calculates approximate bfloat16 hyperbolic tangent function ``tanh(b)``. This operation uses hardware acceleration on devices of compute capability 9.x and higher.
Note
``tanh_approx(0)`` returns 0; ``tanh_approx(inf)`` returns 1; ``tanh_approx(nan)`` returns NaN.
- numba.cuda.bf16.hexp(b)#
Calculates bfloat16 natural exponential function of input ``b`` in round-to-nearest-even mode.
- numba.cuda.bf16.hexp2(b)#
Calculates bfloat16 binary exponential function of input ``b`` in round-to-nearest-even mode.
- numba.cuda.bf16.hexp10(b)#
Calculates bfloat16 decimal exponential function of input ``b`` in round-to-nearest-even mode.
Arithmetic Intrinsics#
The following low-level arithmetic intrinsics are available under ``numba.cuda.bf16`` and map to CUDA bfloat16 arithmetic functions. Unless otherwise noted, operations are performed in round-to-nearest-even mode.
- numba.cuda.bf16.habs(a)#
Calculates the absolute value of input ``a`` (bfloat16) and returns the result.
- numba.cuda.bf16.hneg(a)#
Negates input ``a`` (bfloat16) and returns the result.
- numba.cuda.bf16.hadd(a, b)#
Adds ``a`` and ``b`` (bfloat16) in round-to-nearest-even mode.
- numba.cuda.bf16.hadd_rn(a, b)#
Adds ``a`` and ``b`` (bfloat16) in round-to-nearest-even mode. Prevents contraction of separate operations into a fused multiply-add.
- numba.cuda.bf16.hadd_sat(a, b)#
Adds ``a`` and ``b`` (bfloat16) in round-to-nearest-even mode, with saturation to the range ``[0.0, 1.0]``. NaN results are flushed to ``+0.0``.
- numba.cuda.bf16.hsub(a, b)#
Subtracts ``b`` from ``a`` (bfloat16) in round-to-nearest-even mode.
- numba.cuda.bf16.hsub_rn(a, b)#
Subtracts ``b`` from ``a`` (bfloat16) in round-to-nearest-even mode. Prevents contraction of separate operations into a fused multiply-add.
- numba.cuda.bf16.hsub_sat(a, b)#
Subtracts ``b`` from ``a`` (bfloat16) in round-to-nearest-even mode, with saturation to the range ``[0.0, 1.0]``. NaN results are flushed to ``+0.0``.
- numba.cuda.bf16.hmul(a, b)#
Multiplies ``a`` and ``b`` (bfloat16) in round-to-nearest-even mode.
- numba.cuda.bf16.hmul_rn(a, b)#
Multiplies ``a`` and ``b`` (bfloat16) in round-to-nearest-even mode. Prevents contraction of separate operations into a fused multiply-add.
- numba.cuda.bf16.hmul_sat(a, b)#
Multiplies ``a`` and ``b`` (bfloat16) in round-to-nearest-even mode, with saturation to the range ``[0.0, 1.0]``. NaN results are flushed to ``+0.0``.
- numba.cuda.bf16.hdiv(a, b)#
Divides ``a`` by ``b`` (bfloat16) in round-to-nearest-even mode.
- numba.cuda.bf16.hfma(a, b, c)#
Computes a fused multiply-add of ``a`` and ``b`` plus ``c`` (bfloat16) in round-to-nearest-even mode; i.e. returns ``a * b + c``.
- numba.cuda.bf16.hfma_sat(a, b, c)#
Fused multiply-add in round-to-nearest-even mode with saturation to the range ``[0.0, 1.0]``. NaN results are flushed to ``+0.0``.
- numba.cuda.bf16.hfma_relu(a, b, c)#
Fused multiply-add in round-to-nearest-even mode with ReLU saturation; i.e. returns ``max(0, a * b + c)``.
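All of the ``_sat`` variants above share one post-processing rule: clamp to ``[0.0, 1.0]`` and flush NaN to ``+0.0``. A host-side sketch of that rule (``saturate`` is a hypothetical helper for illustration, not part of the API):

```python
import math

def saturate(x):
    # Emulates the _sat post-processing documented above: clamp the result
    # to [0.0, 1.0], flushing NaN results to +0.0.
    if math.isnan(x):
        return 0.0
    return min(max(x, 0.0), 1.0)

saturate(0.25)      # 0.25 (in range, unchanged)
saturate(1.7)       # 1.0  (clamped from above)
saturate(-0.5)      # 0.0  (clamped from below)
saturate(math.nan)  # 0.0  (NaN flushed to +0.0)
```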
Comparison Intrinsics#
Device-level comparison intrinsics operating on ``bfloat16`` values are available under ``numba.cuda.bf16``. Unless stated otherwise, the ordered comparisons return ``False`` if either input is NaN, following IEEE semantics.
- numba.cuda.bf16.heq(a, b)#
Ordered equality. Returns ``True`` iff ``a == b``. NaN inputs yield ``False``.
- numba.cuda.bf16.hne(a, b)#
Ordered inequality. Returns ``True`` iff ``a != b`` and neither input is NaN. NaN inputs yield ``False``.
- numba.cuda.bf16.hge(a, b)#
Ordered greater-or-equal. NaN inputs yield ``False``.
- numba.cuda.bf16.hgt(a, b)#
Ordered greater-than. NaN inputs yield ``False``.
- numba.cuda.bf16.hle(a, b)#
Ordered less-or-equal. NaN inputs yield ``False``.
- numba.cuda.bf16.hlt(a, b)#
Ordered less-than. NaN inputs yield ``False``.
The unordered comparison variants return ``True`` when either input is NaN:
- numba.cuda.bf16.hequ(a, b)#
Unordered equality. Returns ``True`` if ``a`` or ``b`` is NaN, or if ``a == b``.
- numba.cuda.bf16.hneu(a, b)#
Unordered inequality. Returns ``True`` if ``a`` or ``b`` is NaN, or if ``a != b``.
- numba.cuda.bf16.hgeu(a, b)#
Unordered greater-or-equal. Returns ``True`` if ``a`` or ``b`` is NaN, or if ``a >= b``.
- numba.cuda.bf16.hgtu(a, b)#
Unordered greater-than. Returns ``True`` if ``a`` or ``b`` is NaN, or if ``a > b``.
- numba.cuda.bf16.hleu(a, b)#
Unordered less-or-equal. Returns ``True`` if ``a`` or ``b`` is NaN, or if ``a <= b``.
- numba.cuda.bf16.hltu(a, b)#
Unordered less-than. Returns ``True`` if ``a`` or ``b`` is NaN, or if ``a < b``.
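The ordered/unordered split mirrors IEEE-754 comparison predicates. A host-side sketch of the relation between the two families (``heq_like`` and ``hequ_like`` are hypothetical helpers, not the device intrinsics):

```python
import math

def heq_like(a, b):
    # Ordered equality: NaN inputs yield False. Python's == on floats
    # already has this IEEE behavior.
    return a == b

def hequ_like(a, b):
    # Unordered equality: True if either input is NaN, or if a == b.
    return math.isnan(a) or math.isnan(b) or a == b

nan = math.nan
heq_like(nan, 1.0)   # False: ordered comparisons reject NaN
hequ_like(nan, 1.0)  # True: unordered comparisons accept NaN
hequ_like(1.0, 1.0)  # True: agrees with ordered equality on non-NaN inputs
```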
Min/Max operations follow CUDA semantics for zeros and NaNs:
- numba.cuda.bf16.hmax(a, b)#
Returns ``max(a, b)`` with the following behavior: if either input is NaN, the other input is returned; if both are NaN, the canonical NaN is returned. If both inputs are zeros, ``+0.0 > -0.0``.
- numba.cuda.bf16.hmin(a, b)#
Returns ``min(a, b)`` with the following behavior: if either input is NaN, the other input is returned; if both are NaN, the canonical NaN is returned. If both inputs are zeros, ``-0.0 < +0.0``.
- numba.cuda.bf16.hmax_nan(a, b)#
Returns ``max(a, b)`` where NaNs pass through: if either input is NaN, the canonical NaN is returned.
- numba.cuda.bf16.hmin_nan(a, b)#
Returns ``min(a, b)`` where NaNs pass through: if either input is NaN, the canonical NaN is returned.
Special value predicates:
- numba.cuda.bf16.hisnan(a)#
Returns ``True`` if ``a`` is a NaN, ``False`` otherwise.
- numba.cuda.bf16.hisinf(a)#
Returns a nonzero integer if ``a`` is infinite, otherwise ``0``.
Note
Python comparison operators on ``bfloat16`` values in device code map to the ordered comparisons above. For more details on the CUDA bfloat16 comparison semantics, see NVIDIA CUDA Math API: Bfloat16 Comparison Functions.
Precision Conversion and Data Movement#
The following conversion intrinsics convert between ``bfloat16`` and other scalar types. Rounding-mode suffixes:
- ``_rn``: round-to-nearest-even
- ``_rz``: round-towards-zero
- ``_rd``: round-down (towards −∞)
- ``_ru``: round-up (towards +∞)
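bfloat16 keeps float32's sign and 8-bit exponent and truncates the mantissa to 7 bits, so a conversion is a rounding of the low 16 bits of the float32 pattern. A host-side sketch of the ``_rn`` (round-to-nearest-even) case, emulated with bit manipulation (``float32_to_bfloat16_rn_emulated`` is a hypothetical helper, not the device intrinsic; NaN handling is omitted for brevity):

```python
import struct

def float32_to_bfloat16_rn_emulated(x):
    # Reinterpret the float32 value as its 32-bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 low bits about to be dropped:
    # add 0x7FFF, plus 1 more when the kept LSB is set (ties-to-even).
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) >> 16
    return bits & 0xFFFF  # the 16-bit bfloat16 pattern

float32_to_bfloat16_rn_emulated(1.0)   # 0x3F80: bfloat16 encoding of 1.0
float32_to_bfloat16_rn_emulated(-2.0)  # 0xC000: bfloat16 encoding of -2.0
```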
Floating-point conversions#
- numba.cuda.bf16.float32_to_bfloat16(x)#
Convert a ``float32`` to ``bfloat16`` (default rounding is round-to-nearest-even).
- numba.cuda.bf16.float64_to_bfloat16(x)#
Convert a ``float64`` to ``bfloat16`` (default rounding is round-to-nearest-even).
- numba.cuda.bf16.bfloat16_to_float32(x)#
Convert a ``bfloat16`` to ``float32``.
- numba.cuda.bf16.float32_to_bfloat16_rn(x)#
- numba.cuda.bf16.float32_to_bfloat16_rz(x)#
- numba.cuda.bf16.float32_to_bfloat16_rd(x)#
- numba.cuda.bf16.float32_to_bfloat16_ru(x)#
Convert a ``float32`` to ``bfloat16`` using the specified rounding mode.
Integer conversions#
Representative APIs for each integer width are listed below. All have rounding-mode variants ``_rn``, ``_rz``, ``_rd``, ``_ru``.
int16 (signed 16-bit)#
- numba.cuda.bf16.int16_to_bfloat16_rn(x)#
- numba.cuda.bf16.int16_to_bfloat16_rz(x)#
- numba.cuda.bf16.int16_to_bfloat16_rd(x)#
- numba.cuda.bf16.int16_to_bfloat16_ru(x)#
Convert an ``int16`` to ``bfloat16`` with the selected rounding mode.
- numba.cuda.bf16.bfloat16_to_int16_rn(x)#
- numba.cuda.bf16.bfloat16_to_int16_rz(x)#
- numba.cuda.bf16.bfloat16_to_int16_rd(x)#
- numba.cuda.bf16.bfloat16_to_int16_ru(x)#
Convert a ``bfloat16`` to ``int16`` with the selected rounding mode.
uint16 (unsigned 16-bit)#
- numba.cuda.bf16.uint16_to_bfloat16_rn(x)#
- numba.cuda.bf16.uint16_to_bfloat16_rz(x)#
- numba.cuda.bf16.uint16_to_bfloat16_rd(x)#
- numba.cuda.bf16.uint16_to_bfloat16_ru(x)#
Convert a ``uint16`` to ``bfloat16`` with the selected rounding mode.
- numba.cuda.bf16.bfloat16_to_uint16_rn(x)#
- numba.cuda.bf16.bfloat16_to_uint16_rz(x)#
- numba.cuda.bf16.bfloat16_to_uint16_rd(x)#
- numba.cuda.bf16.bfloat16_to_uint16_ru(x)#
Convert a ``bfloat16`` to ``uint16`` with the selected rounding mode.
int32 (signed 32-bit)#
- numba.cuda.bf16.int32_to_bfloat16_rn(x)#
- numba.cuda.bf16.int32_to_bfloat16_rz(x)#
- numba.cuda.bf16.int32_to_bfloat16_rd(x)#
- numba.cuda.bf16.int32_to_bfloat16_ru(x)#
Convert an ``int32`` to ``bfloat16`` with the selected rounding mode.
- numba.cuda.bf16.bfloat16_to_int32_rn(x)#
- numba.cuda.bf16.bfloat16_to_int32_rz(x)#
- numba.cuda.bf16.bfloat16_to_int32_rd(x)#
- numba.cuda.bf16.bfloat16_to_int32_ru(x)#
Convert a ``bfloat16`` to ``int32`` with the selected rounding mode.
uint32 (unsigned 32-bit)#
- numba.cuda.bf16.uint32_to_bfloat16_rn(x)#
- numba.cuda.bf16.uint32_to_bfloat16_rz(x)#
- numba.cuda.bf16.uint32_to_bfloat16_rd(x)#
- numba.cuda.bf16.uint32_to_bfloat16_ru(x)#
Convert a ``uint32`` to ``bfloat16`` with the selected rounding mode.
- numba.cuda.bf16.bfloat16_to_uint32_rn(x)#
- numba.cuda.bf16.bfloat16_to_uint32_rz(x)#
- numba.cuda.bf16.bfloat16_to_uint32_rd(x)#
- numba.cuda.bf16.bfloat16_to_uint32_ru(x)#
Convert a ``bfloat16`` to ``uint32`` with the selected rounding mode.
int64 (signed 64-bit)#
- numba.cuda.bf16.int64_to_bfloat16_rn(x)#
- numba.cuda.bf16.int64_to_bfloat16_rz(x)#
- numba.cuda.bf16.int64_to_bfloat16_rd(x)#
- numba.cuda.bf16.int64_to_bfloat16_ru(x)#
Convert an ``int64`` to ``bfloat16`` with the selected rounding mode.
- numba.cuda.bf16.bfloat16_to_int64_rn(x)#
- numba.cuda.bf16.bfloat16_to_int64_rz(x)#
- numba.cuda.bf16.bfloat16_to_int64_rd(x)#
- numba.cuda.bf16.bfloat16_to_int64_ru(x)#
Convert a ``bfloat16`` to ``int64`` with the selected rounding mode.
uint64 (unsigned 64-bit)#
- numba.cuda.bf16.uint64_to_bfloat16_rn(x)#
- numba.cuda.bf16.uint64_to_bfloat16_rz(x)#
- numba.cuda.bf16.uint64_to_bfloat16_rd(x)#
- numba.cuda.bf16.uint64_to_bfloat16_ru(x)#
Convert a ``uint64`` to ``bfloat16`` with the selected rounding mode.
- numba.cuda.bf16.bfloat16_to_uint64_rn(x)#
- numba.cuda.bf16.bfloat16_to_uint64_rz(x)#
- numba.cuda.bf16.bfloat16_to_uint64_rd(x)#
- numba.cuda.bf16.bfloat16_to_uint64_ru(x)#
Convert a ``bfloat16`` to ``uint64`` with the selected rounding mode.
8-bit conversions#
- numba.cuda.bf16.bfloat16_to_int8_rz(x)#
Convert a ``bfloat16`` to ``int8`` with round-towards-zero.
- numba.cuda.bf16.bfloat16_to_uint8_rz(x)#
Convert a ``bfloat16`` to ``uint8`` with round-towards-zero.
Bit Reinterpret Casts#
These APIs reinterpret bits without numeric conversion:
- numba.cuda.bf16.bfloat16_as_int16(x)#
Reinterpret the bits of a ``bfloat16`` as an ``int16``.
- numba.cuda.bf16.bfloat16_as_uint16(x)#
Reinterpret the bits of a ``bfloat16`` as a ``uint16``.
- numba.cuda.bf16.int16_as_bfloat16(x)#
Reinterpret the bits of an ``int16`` as a ``bfloat16``.
- numba.cuda.bf16.uint16_as_bfloat16(x)#
Reinterpret the bits of a ``uint16`` as a ``bfloat16``.
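Reinterpretation preserves the bit pattern while changing only the type under which it is viewed. A host-side illustration of the signed/unsigned view of the same 16 bits (``uint16_as_int16`` is a hypothetical helper mirroring what ``bfloat16_as_int16`` does to a bfloat16 pattern; it is not one of the device intrinsics):

```python
import struct

def uint16_as_int16(u):
    # View the same 16-bit pattern as signed instead of unsigned;
    # no numeric conversion takes place, only the interpretation changes.
    return struct.unpack("<h", struct.pack("<H", u))[0]

bf16_one = 0x3F80          # bit pattern of bfloat16 1.0
uint16_as_int16(bf16_one)  # 16256: same bits viewed as int16

bf16_neg_one = 0xBF80          # bit pattern of bfloat16 -1.0
uint16_as_int16(bf16_neg_one)  # -16512: the set sign bit makes the int16 negative
```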