CUDA Tile support#
CUDA Tile introduces a new way to program GPUs at a higher level than SIMT.
We generally support most features in tile mode such as
cuda::std::array
cuda::std::expected
cuda::std::initializer_list
cuda::std::optional
cuda::std::pair
cuda::std::span
cuda::std::tuple
cuda::std::variant
Restrictions#
With the compiler taking more control over memory and threading there are a number of restrictions in a tile program:
C++ Concurrency support#
Currently the use of inline ptx / assembly is not allowed in a tile program. All of our threading features rely on inline assembly in some capacity. Consequently, the following headers are not supported in tile mode:
<cuda/atomic>
<cuda/barrier>
<cuda/latch>
<cuda/pipeline>
<cuda/semaphore>
<cuda/std/atomic>
<cuda/std/barrier>
<cuda/std/execution>
<cuda/std/latch>
<cuda/std/semaphore>
This also affects
<cuda/cmath>
<cuda/discard_memory>
<cuda/ptx>
C++ mathematical operations#
We rely heavily on compiler builtins or cuda runtime functions to implement C++ standard math functions such
as cuda::std::exp. Those compiler builtins are not currently supported in tile mode, so the following headers
are mostly unsupported:
<cuda/std/cmath>
<cuda/std/complex>
C++ customization point objects#
The standard library uses __ Customization Point Objects __ to enable user-customization of the behavior of many algorithms and ranges. We rely heavily on those for most of our iterator machinery such as e.g cuda::std::begin.
Those CPOs are currently not accessible in a tile program.
CUDA device intrinsics#
In tile mode the compiler handles threads, warps and blocks. Consequently, the access of CUDA device intrinsics such as
threadIdx is currently not allowed in a tile program. Therefore the following headers are not supported in tile mode:
<cuda/access_property>
<cuda/annotated_ptr>
<cuda/discard_memory>
<cuda/hierarchy>
<cuda/ptx>
CUDA extended floating point types#
Tile programs treat the CUDA extended floating point types as compiler builtin types.
This disallows accessing their internals which we require internally.
Support for extended floating point types such as __half, __nv_bfloat16 is disabled in tile mode.
Taking the address of a function#
It is currently not supported to take the address of a function in a tile program. This affects our memory resource machinery, so the following headers are unsupported in tile mode:
<cuda/memory>
<cuda/memory_resource>