CUDA-Specific Types ==================== .. note:: This page is about types specific to CUDA targets. Many other types are also available in the CUDA target - see :ref:`cuda-built-in-types`. Vector Types ~~~~~~~~~~~~ `CUDA Vector Types `_ are usable in kernels. There are two important distinctions from vector types in CUDA C/C++: First, the recommended names for vector types in Numba CUDA is formatted as ``x``, where ``base_type`` is the base type of the vector, and ``N`` is the number of elements in the vector. Examples include ``int64x3``, ``uint16x4``, ``float32x4``, etc. For new Numba CUDA kernels, this is the recommended way to instantiate vector types. For convenience, users adapting existing kernels from CUDA C/C++ to Python may use aliases consistent with the C/C++ namings. For example, ``float3`` aliases ``float32x3``, ``long3`` aliases ``int32x3`` or ``int64x3`` (depending on the platform), etc. Second, unlike CUDA C/C++ where factory functions are used, vector types are constructed directly with their constructor. For example, to construct a ``float32x3``: .. code-block:: python3 from numba.cuda import float32x3 # In kernel f3 = float32x3(0.0, -1.0, 1.0) Additionally, vector types can be constructed from a combination of vector and primitive types, as long as the total number of components matches the result vector type. For example, all of the following constructions are valid: .. code-block:: python3 zero = uint32(0) u2 = uint32x2(1, 2) # Construct a 3-component vector with primitive type and a 2-component vector u3 = uint32x3(zero, u2) # Construct a 4-component vector with 2 2-component vectors u4 = uint32x4(u2, u2) The 1st, 2nd, 3rd and 4th component of the vector type can be accessed through fields ``x``, ``y``, ``z``, and ``w`` respectively. The components are immutable after construction in the present version of Numba; it is expected that support for mutating vector components will be added in a future release. .. code-block:: python3 v1 = float32x2(1.0, 1.0) v2 = float32x2(1.0, -1.0) dotprod = v1.x * v2.x + v1.y * v2.y