PTX
The cuda::ptx
namespace contains functions that map one-to-one to
PTX instructions.
These can be used for maximal control of the generated code, or to
experiment with new hardware features before a high-level C++ API is
available.
Versions and compatibility
The cuda/ptx
header is intended to present a stable API (not ABI)
within one major version of the CTK on a best effort basis. This means
that:
All functions are marked static inline.
The type of a function parameter can be changed to be more generic if that means that code that called the original version can still be compiled.
Good exposure of the PTX should be high priority. If, at a new major version, we face a difficult choice between breaking backward-compatibility and an improvement of the PTX exposure, we will tend to the latter option more easily than in other parts of libcu++.
API stability is not taken to the extreme. Call functions like below to ensure forward-compatibility:
// Use arguments to drive overload resolution:
cuda::ptx::mbarrier_arrive_expect_tx(cuda::ptx::sem_release, cuda::ptx::scope_cta, cuda::ptx::space_shared, &bar, 1);
// Specifying templates directly is not forward-compatible, as order and number
// of template parameters may change in a minor release:
cuda::ptx::mbarrier_arrive_expect_tx<cuda::ptx::sem_release_t>(
cuda::ptx::sem_release, cuda::ptx::scope_cta, cuda::ptx::space_shared, &bar, 1
);
PTX ISA version and compute capability. Each binding notes under which PTX ISA version and SM version it may be used. Example:
// mbarrier.arrive.shared::cta.b64 state, [addr]; // 1. PTX ISA 70, SM_80
__device__ inline uint64_t mbarrier_arrive(
cuda::ptx::sem_release_t sem,
cuda::ptx::scope_cta_t scope,
cuda::ptx::space_shared_t space,
uint64_t* addr);
To check if the current compiler is recent enough, use:
#if __cccl_ptx_isa >= 700
cuda::ptx::mbarrier_arrive(cuda::ptx::sem_release, cuda::ptx::scope_cta, cuda::ptx::space_shared, &bar, 1);
#endif
Ensure that you only call the function when compiling for a recent enough compute capability (SM version), like this:
NV_IF_TARGET(NV_PROVIDES_SM_80,(
cuda::ptx::mbarrier_arrive(cuda::ptx::sem_release, cuda::ptx::scope_cta, cuda::ptx::space_shared, &bar, 1);
));
For more information on which compilers correspond to which PTX ISA, see the PTX ISA release notes.
Instructions by section
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
CCCL 2.3.0 / CUDA 12.4 |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
CCCL 2.4.0 / CUDA 12.5 |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
CCCL 2.4.0 / CUDA 12.5 |
|
CCCL 2.4.0 / CUDA 12.5 |
|
No |
|
CCCL 2.4.0 / CUDA 12.5 |
|
CCCL 2.4.0 / CUDA 12.5 |
|
No |
|
CCCL 2.4.0 / CUDA 12.5 |
|
CCCL 2.4.0 / CUDA 12.5 |
|
CCCL 2.4.0 / CUDA 12.5 |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
CCCL 2.4.0 / CUDA 12.5 |
|
No |
|
CCCL 2.4.0 / CUDA 12.5 |
|
No |
|
No |
|
CCCL 2.3.0 / CUDA 12.4 |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
CCCL 2.5.0 / CUDA Future |
|
No |
|
No |
|
No |
|
CCCL 2.3.0 / CUDA 12.4 |
|
No |
|
No |
|
CCCL 2.3.0 / CUDA 12.4 |
|
CCCL 2.3.0 / CUDA 12.4 |
|
No |
|
CCCL 2.4.0 / CUDA 12.5 |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
Instruction |
Available in libcu++ |
---|---|
No |
|
No |
|
No |
|
No |
|
No |
Instruction |
PTX ISA |
SM Version |
Available in libcu++ |
---|---|---|---|
20 |
All |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
All |
CCCL 2.4.0 / CUDA 12.5 |
|
13 |
All |
CCCL 2.4.0 / CUDA 12.5 |
|
13 |
All |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
All |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
All |
CCCL 2.4.0 / CUDA 12.5 |
|
13 |
All |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
30 |
30 |
CCCL 2.4.0 / CUDA 12.5 |
|
78 |
90 |
CCCL 2.4.0 / CUDA 12.5 |
|
78 |
90 |
CCCL 2.4.0 / CUDA 12.5 |
|
78 |
90 |
CCCL 2.4.0 / CUDA 12.5 |
|
78 |
90 |
CCCL 2.4.0 / CUDA 12.5 |
|
78 |
90 |
CCCL 2.4.0 / CUDA 12.5 |
|
78 |
90 |
CCCL 2.4.0 / CUDA 12.5 |
|
78 |
90 |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
10 |
All |
CCCL 2.4.0 / CUDA 12.5 |
|
20 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
No |
|||
No |
|||
No |
|||
31 |
31 |
CCCL 2.4.0 / CUDA 12.5 |
|
No |
|||
41 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
81 |
90 |
CCCL 2.4.0 / CUDA 12.5 |
|
41 |
20 |
CCCL 2.4.0 / CUDA 12.5 |
|
80 |
50 |
CCCL 2.4.0 / CUDA 12.5 |