
Extended API

Fundamentals

| Name | Description | Availability |
|---|---|---|
| Thread Scopes | Defines the kind of threads that can synchronize using a primitive. (enum) | 1.0.0 / CUDA 10.2 |
| Thread Groups | Concepts for groups of cooperating threads. (concept) | 1.2.0 / CUDA 11.1 |

Shapes

| Name | Description | Availability |
|---|---|---|
| cuda::std::size_t | Defines an extent of bytes. (typedef) | 1.0.0 / CUDA 10.2 |
| cuda::aligned_size_t | Defines an extent of bytes with a statically defined alignment. (class template) | 1.2.0 / CUDA 11.1 |

Synchronization Primitives

Atomics

| Name | Description | Availability |
|---|---|---|
| cuda::atomic | System-wide cuda::std::atomic objects and operations. (class template) | 1.0.0 / CUDA 10.2 |
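
As a rough illustration of the entry above, the following host-only sketch increments a shared counter from several threads. It assumes that `cuda::atomic` takes a thread scope as its second template argument and that `<cuda/atomic>` can be included by the host compiler; when the header is not found, the sketch falls back to `std::atomic`, the standard type that `cuda::std::atomic` mirrors. The names `counter_t`, `SKETCH_HAS_LIBCUDACXX`, and `count_with_threads` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Assumption: use libcu++ when its headers are on the include path.
#if defined(__has_include)
#  if __has_include(<cuda/atomic>)
#    include <cuda/atomic>
#    define SKETCH_HAS_LIBCUDACXX 1
#  endif
#endif

#ifdef SKETCH_HAS_LIBCUDACXX
// System scope: the object may be shared by all host and device threads.
using counter_t = cuda::atomic<int, cuda::thread_scope_system>;
#else
// Fallback without libcu++: the standard atomic with the same member interface.
#  include <atomic>
using counter_t = std::atomic<int>;
#endif

// Hypothetical helper: `num_threads` host threads each add 1 to a shared
// counter `increments` times; returns the final counter value.
int count_with_threads(int num_threads, int increments) {
    assert(num_threads >= 0 && increments >= 0);

    counter_t counter(0);
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                // Default (sequentially consistent) ordering keeps the call
                // valid for both the libcu++ type and the std fallback.
                counter.fetch_add(1);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

Because every increment is an atomic read-modify-write, `count_with_threads(4, 1000)` returns 4000 regardless of how the threads interleave. In device code, a narrower scope such as `cuda::thread_scope_device` or `cuda::thread_scope_block` would usually be chosen when the object is not shared with the host.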

Latches

| Name | Description | Availability |
|---|---|---|
| cuda::latch | System-wide cuda::std::latch single-phase asynchronous thread coordination mechanism. (class template) | 1.1.0 / CUDA 11.0 |

Barriers

| Name | Description | Availability |
|---|---|---|
| cuda::barrier | System-wide cuda::std::barrier multi-phase asynchronous thread coordination mechanism. (class template) | 1.1.0 / CUDA 11.0 |

Semaphores

| Name | Description | Availability |
|---|---|---|
| cuda::counting_semaphore | System-wide cuda::std::counting_semaphore primitive for constraining concurrent access. (class template) | 1.1.0 / CUDA 11.0 |
| cuda::binary_semaphore | System-wide cuda::std::binary_semaphore primitive for mutual exclusion. (class template) | 1.1.0 / CUDA 11.0 |

Pipelines

The pipeline library is included in the CUDA Toolkit but is not part of the open-source libcu++ distribution.

| Name | Description | Availability |
|---|---|---|
| cuda::pipeline | Coordination mechanism for sequencing asynchronous operations. (class template) | CUDA 11.1 |
| cuda::pipeline_shared_state | cuda::pipeline shared state object. (class template) | CUDA 11.1 |
| cuda::pipeline_role | Defines producer/consumer role for a thread participating in a pipeline. (enum) | CUDA 11.1 |
| cuda::make_pipeline | Creates a cuda::pipeline. (function template) | CUDA 11.1 |
| cuda::pipeline_consumer_wait_prior | Blocks the current thread until all operations committed up to a prior pipeline stage complete. (function template) | CUDA 11.1 |
| cuda::pipeline_producer_commit | Binds operations previously issued by the current thread to a cuda::barrier. (function template) | CUDA 11.1 |

Asynchronous Operations

| Name | Description | Availability |
|---|---|---|
| cuda::memcpy_async | Asynchronously copies one range to another. (function template) | 1.1.0 / CUDA 11.0; 1.2.0 / CUDA 11.1 (group and aligned overloads) |