Extended API
Fundamentals
Thread Scopes | Defines the kind of threads that can synchronize using a primitive. (enum) 1.0.0 / CUDA 10.2 |
Thread Groups | Concepts for groups of cooperating threads. (concept) 1.2.0 / CUDA 11.1 |
Shapes
cuda::std::size_t | Defines an extent of bytes. (typedef) 1.0.0 / CUDA 10.2 |
cuda::aligned_size_t | Defines an extent of bytes with a statically defined alignment. (class template) 1.2.0 / CUDA 11.1 |
Synchronization Primitives
Atomics
cuda::atomic | System-wide cuda::std::atomic objects and operations. (class template) 1.0.0 / CUDA 10.2 |
cuda::atomic_ref | System-wide cuda::std::atomic_ref objects and operations. (class template) 1.7.0 / CUDA 11.6 |
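
As an illustration of how a thread scope combines with an atomic, the following minimal sketch increments a plain `int` in global memory through a device-scope `cuda::atomic_ref`. The kernel name `count_threads` and the host helper `run_count_threads` are hypothetical names introduced for this example, and the sketch assumes a toolkit recent enough to provide `cuda::atomic_ref`.

```cuda
#include <cuda/atomic>
#include <cuda_runtime.h>

// Every thread adds 1 to the same counter through a device-scope atomic_ref.
__global__ void count_threads(int* counter) {
    cuda::atomic_ref<int, cuda::thread_scope_device> ref(*counter);
    ref.fetch_add(1, cuda::std::memory_order_relaxed);
}

// Hypothetical host helper: returns the final count, or -1 on a CUDA error.
int run_count_threads(int blocks, int threads_per_block) {
    int* counter = nullptr;
    if (cudaMalloc(&counter, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(counter, 0, sizeof(int));
    count_threads<<<blocks, threads_per_block>>>(counter);
    int result = -1;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(counter);
    return err == cudaSuccess ? result : -1;
}
```

Here `cuda::thread_scope_device` is enough because only threads on one GPU touch the counter; a wider scope such as `cuda::thread_scope_system` would be the choice if host threads or other devices also participated.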
Latches
cuda::latch | System-wide cuda::std::latch single-phase asynchronous thread coordination mechanism. (class template) 1.1.0 / CUDA 11.0 |
Barriers
cuda::barrier | System-wide cuda::std::barrier multi-phase asynchronous thread coordination mechanism. (class template) 1.1.0 / CUDA 11.0 |
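
A block-scope `cuda::barrier` can be sketched as follows: each thread writes one element into shared memory, all threads synchronize on the barrier, and each thread then reads an element written by a different thread. The names `reverse_block`, `run_reverse_block`, and `kReverseBlock` are hypothetical, and the `#pragma` line assumes a recent nvcc (it only silences a warning about the barrier's constructor).

```cuda
#include <cuda/barrier>
#include <cuda_runtime.h>

// Silences the warning about dynamic initialization of a __shared__ barrier.
#pragma nv_diag_suppress static_var_with_dynamic_init

constexpr int kReverseBlock = 128;  // hypothetical block size for this sketch

// Reverses one block-sized array, using the barrier between write and read.
__global__ void reverse_block(const int* in, int* out) {
    __shared__ int tile[kReverseBlock];
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;
    if (threadIdx.x == 0) init(&bar, blockDim.x);  // one arrival per thread
    __syncthreads();                               // make the initialized barrier visible

    tile[threadIdx.x] = in[threadIdx.x];
    bar.arrive_and_wait();                         // all writes to tile are complete
    out[threadIdx.x] = tile[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper: returns true if the output is the reversed input.
bool run_reverse_block() {
    int h_in[kReverseBlock], h_out[kReverseBlock];
    for (int i = 0; i < kReverseBlock; ++i) h_in[i] = i;
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    reverse_block<<<1, kReverseBlock>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < kReverseBlock; ++i)
        if (h_out[i] != kReverseBlock - 1 - i) return false;
    return true;
}
```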
Semaphores
cuda::counting_semaphore | System-wide cuda::std::counting_semaphore primitive for constraining concurrent access. (class template) 1.1.0 / CUDA 11.0 |
cuda::binary_semaphore | System-wide cuda::std::binary_semaphore primitive for mutual exclusion. (class template) 1.1.0 / CUDA 11.0 |
Pipelines
The pipeline library is included in the CUDA Toolkit but is not part of the open-source libcu++ distribution.
cuda::pipeline | Coordination mechanism for sequencing asynchronous operations. (class template) CUDA 11.1 |
cuda::pipeline_shared_state | cuda::pipeline shared state object. (class template) CUDA 11.1 |
cuda::pipeline_role | Defines producer/consumer role for a thread participating in a pipeline. (enum) CUDA 11.1 |
cuda::make_pipeline | Creates a cuda::pipeline. (function template) CUDA 11.1 |
cuda::pipeline_consumer_wait_prior | Blocks the current thread until all operations committed up to a prior pipeline stage complete. (function template) CUDA 11.1 |
cuda::pipeline_producer_commit | Binds operations previously issued by the current thread to a cuda::barrier. (function template) CUDA 11.1 |
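
The pipeline entries above can be combined roughly as in the sketch below, which uses a per-thread pipeline: each thread acquires a stage, issues one asynchronous copy into shared memory, commits it, waits for it, and then releases the stage. The names `double_via_pipeline`, `run_double_via_pipeline`, and `kPipeBlock` are hypothetical, and a single stage is used only to keep the example short.

```cuda
#include <cuda/pipeline>
#include <cuda_runtime.h>

constexpr int kPipeBlock = 128;  // hypothetical block size for this sketch

// Each thread stages its own element through shared memory and doubles it.
__global__ void double_via_pipeline(const int* in, int* out) {
    __shared__ int tile[kPipeBlock];
    cuda::pipeline<cuda::thread_scope_thread> pipe = cuda::make_pipeline();

    pipe.producer_acquire();
    cuda::memcpy_async(&tile[threadIdx.x], &in[threadIdx.x], sizeof(int), pipe);
    pipe.producer_commit();

    // Wait for every committed stage (no stage is left in flight).
    cuda::pipeline_consumer_wait_prior<0>(pipe);
    out[threadIdx.x] = 2 * tile[threadIdx.x];  // reads only this thread's own copy
    pipe.consumer_release();
}

// Hypothetical host helper: returns true if every output equals twice its input.
bool run_double_via_pipeline() {
    int h_in[kPipeBlock], h_out[kPipeBlock];
    for (int i = 0; i < kPipeBlock; ++i) h_in[i] = i;
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    double_via_pipeline<<<1, kPipeBlock>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < kPipeBlock; ++i)
        if (h_out[i] != 2 * i) return false;
    return true;
}
```

Because each thread reads only the element it copied itself, no block-wide synchronization is needed here; reading another thread's element would additionally require one.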
Asynchronous Operations
cuda::memcpy_async | Asynchronously copies one range to another. (function template) 1.1.0 / CUDA 11.0; group and aligned overloads: 1.2.0 / CUDA 11.1 |
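
A minimal sketch of the group and aligned overloads of `cuda::memcpy_async`: the whole thread block cooperatively copies a tile into shared memory, passes the size as a `cuda::aligned_size_t<16>` to promise 16-byte alignment, and waits on a block-scope `cuda::barrier` for completion. The names `stage_and_increment`, `run_stage_and_increment`, and `kCopyBlock` are hypothetical; the sketch assumes both buffers are 16-byte aligned, which holds for `cudaMalloc` allocations and for the explicitly aligned shared array.

```cuda
#include <cooperative_groups.h>
#include <cuda/barrier>
#include <cuda_runtime.h>

// Silences the warning about dynamic initialization of a __shared__ barrier.
#pragma nv_diag_suppress static_var_with_dynamic_init

namespace cg = cooperative_groups;
constexpr int kCopyBlock = 128;  // hypothetical block size for this sketch

// The block copies a tile asynchronously, then each thread adds 1 to one element.
__global__ void stage_and_increment(const int* in, int* out) {
    __shared__ __align__(16) int tile[kCopyBlock];
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;
    auto block = cg::this_thread_block();
    if (block.thread_rank() == 0) init(&bar, block.size());
    block.sync();

    // Collective copy: every thread of the block must make this call.
    cuda::memcpy_async(block, tile, in,
                       cuda::aligned_size_t<16>(sizeof(int) * kCopyBlock), bar);
    bar.arrive_and_wait();  // copy is complete and visible to the whole block

    out[block.thread_rank()] = tile[block.thread_rank()] + 1;
}

// Hypothetical host helper: returns true if every output equals input + 1.
bool run_stage_and_increment() {
    int h_in[kCopyBlock], h_out[kCopyBlock];
    for (int i = 0; i < kCopyBlock; ++i) h_in[i] = i;
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    stage_and_increment<<<1, kCopyBlock>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < kCopyBlock; ++i)
        if (h_out[i] != i + 1) return false;
    return true;
}
```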
Memory access properties
cuda::annotated_ptr | Binds an access property to a pointer. (class template) 1.6.0 / CUDA 11.5 |
cuda::access_property | Represents a memory access property. (class) 1.6.0 / CUDA 11.5 |
cuda::apply_access_property | Applies access property to memory location. (function template) 1.6.0 / CUDA 11.5 |
cuda::associate_access_property | Associates access property with raw pointer. (function template) 1.6.0 / CUDA 11.5 |
cuda::discard_memory | Writes indeterminate values to memory. (function) 1.6.0 / CUDA 11.5 |
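
As a rough sketch of binding an access property to a pointer, the kernel below wraps a global-memory pointer in a `cuda::annotated_ptr` with the `cuda::access_property::persisting` property and accesses elements through it. The names `double_persisting` and `run_double_persisting` are hypothetical. The property is a performance hint only, so the results are the same as with a raw pointer; the sketch assumes the hint is simply ignored on hardware that does not support it.

```cuda
#include <cuda/annotated_ptr>
#include <cuda_runtime.h>

// Doubles each element, accessing memory through a pointer annotated as persisting.
__global__ void double_persisting(int* data, int n) {
    cuda::annotated_ptr<int, cuda::access_property::persisting> p(data);
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = 2 * p[i];
}

// Hypothetical host helper: returns true if every element was doubled.
bool run_double_persisting() {
    constexpr int n = 256;
    int h[n];
    for (int i = 0; i < n; ++i) h[i] = i;
    int* d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    double_persisting<<<2, 128>>>(d, n);
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i)
        if (h[i] != 2 * i) return false;
    return true;
}
```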
Functional
cuda::proclaim_return_type | Creates a forwarding call wrapper that proclaims its return type. (function template) 1.8.0 / CUDA 11.7 |