cuda::experimental::stf::async_resources_handle

class async_resources_handle

A handle that stores resources used for efficient asynchronous execution.

For example, it stores the pools of CUDA streams.

This class uses the PIMPL idiom and can therefore be passed by value; copies share the same underlying state. Creating a new object of this type does not initialize any resources: they are created lazily on first use.
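
The PIMPL and lazy-initialization behaviour can be illustrated with a small CPU-only model. Everything in it is hypothetical: `pimpl_handle`, `fake_stream`, and the pool of four streams are stand-ins chosen for illustration, not CUDASTF types. The model also assumes, without confirmation from this page, that the `nullptr_t` constructor yields an empty handle that `operator bool` reports as false.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for cudaStream_t (hypothetical).
struct fake_stream {
  int id;
};

// Hypothetical PIMPL handle; not the CUDASTF implementation.
class pimpl_handle {
  struct impl {
    std::vector<fake_stream> pool;  // stays empty until first use
    int created = 0;                // number of resources created so far
  };
  std::shared_ptr<impl> pimpl;

 public:
  // Allocates the shared state, but no stream-like resources yet.
  pimpl_handle() : pimpl(std::make_shared<impl>()) {}
  // Assumed behaviour: an empty handle with no shared state at all.
  explicit pimpl_handle(std::nullptr_t) {}
  explicit operator bool() const { return pimpl != nullptr; }

  // Lazily fills the pool on the first request, then reuses it.
  fake_stream get_stream() const {
    if (pimpl->pool.empty()) {
      for (int i = 0; i < 4; ++i) {
        pimpl->pool.push_back(fake_stream{pimpl->created++});
      }
    }
    return pimpl->pool.front();
  }

  int resources_created() const { return pimpl->created; }
};
```

Holding the state behind a `std::shared_ptr` is what keeps pass-by-value cheap and lets every copy observe the same lazily built caches.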

Thread safety: A single handle (and its copies, which share one PIMPL state) may be used concurrently by several host threads submitting work to the same context. The internal caches reached during task submission (the cross-stream synchronization cache and the per-place stream-pool registry) are mutex-guarded. This guard makes only the runtime’s own bookkeeping race-free; conflicting accesses to the same logical data must still be ordered through the task dependency graph.
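
The guarantee concerns internal bookkeeping only. The following sketch uses a hypothetical `guarded_cache` type rather than the CUDASTF internals: several threads share one cache through a `std::shared_ptr`, as handle copies share their PIMPL state, and a `std::mutex` serializes each lookup so that no update is lost. Nothing in this pattern orders the threads' own work relative to each other; in CUDASTF that ordering still comes from task dependencies.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical cache standing in for the runtime's internal bookkeeping.
class guarded_cache {
  std::mutex mtx;
  std::map<int, int> hits;  // key -> number of lookups

 public:
  void lookup(int key) {
    std::lock_guard<std::mutex> lock(mtx);  // one thread at a time touches the map
    ++hits[key];
  }
  int total() {
    std::lock_guard<std::mutex> lock(mtx);
    int sum = 0;
    for (const auto& kv : hits) sum += kv.second;
    return sum;
  }
};

// Each of `nthreads` threads performs `per_thread` lookups on one shared cache.
int run_submitters(int nthreads, int per_thread) {
  auto cache = std::make_shared<guarded_cache>();
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; ++t) {
    threads.emplace_back([cache, per_thread] {
      for (int i = 0; i < per_thread; ++i) cache->lookup(i % 8);
    });
  }
  for (auto& th : threads) th.join();
  return cache->total();  // equals nthreads * per_thread: no update was lost
}
```
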

Lifetime and CUDA teardown: The handle caches resources bound to a live CUDA context, namely pools of cudaStream_t, cached cudaGraphExec_t objects, and cross-stream synchronization state. Before destroying a handle that was shared across contexts, perform a blocking synchronization on all work that used it; otherwise, destroying it would release streams that still carry in-flight work. For the same reason, a handle must not outlive a CUDA context teardown such as cudaDeviceReset() (or destruction of the primary context by an external framework): its cached cudaStream_t and cudaGraphExec_t objects would become dangling. Synchronize, destroy the handle, and create a fresh one once the CUDA context has been re-created.
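
The hazard can be modelled on the CPU. In this hypothetical sketch, `model_context::reset()` plays the role of `cudaDeviceReset()`, each cached `model_stream` remembers the context generation it was created in, and a stream from an older generation counts as dangling. The recovery follows the order described above: drop the old resources and build fresh ones against the new context. In real code the first step would be a blocking synchronization such as `cudaDeviceSynchronize()`, which this model omits.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a CUDA context; reset() mimics cudaDeviceReset().
struct model_context {
  int generation = 0;
  void reset() { ++generation; }
};

// A stream is only meaningful in the context generation that created it.
struct model_stream {
  int generation;
};

// Hypothetical resource cache: creates a stream lazily, then keeps reusing it.
struct model_resources {
  std::vector<model_stream> pool;
  model_stream get(const model_context& ctx) {
    if (pool.empty()) pool.push_back(model_stream{ctx.generation});
    return pool.front();
  }
};

bool is_valid(const model_context& ctx, model_stream s) {
  return s.generation == ctx.generation;
}

// Walks through: valid use, reset (cached stream dangles), fresh resources.
bool demo_reset_recovery() {
  model_context ctx;
  auto res = std::make_shared<model_resources>();
  bool valid_before = is_valid(ctx, res->get(ctx));
  ctx.reset();                                       // like cudaDeviceReset()
  bool stale_after = !is_valid(ctx, res->get(ctx));  // cached stream now dangles
  res = std::make_shared<model_resources>();         // destroy old, create fresh
  bool valid_again = is_valid(ctx, res->get(ctx));
  return valid_before && stale_after && valid_again;
}
```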

Public Functions

inline async_resources_handle()

inline explicit async_resources_handle(::std::nullptr_t)

inline explicit operator bool() const

inline ::cuda::experimental::places::exec_place_resources &get_place_resources() const

Access the registry of per-place stream pools owned by this handle.

The returned reference remains valid as long as the handle (or any copy of it) is alive, because the registry lives in the shared PIMPL state. Distinct handles own independent registries, while copies of the same handle share one registry. The registry itself is internally mutex-guarded, so concurrent lookups are safe.
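
These sharing rules can be checked against a hypothetical model. `registry_handle`, `model_registry`, and `model_pool` are illustrative stand-ins, and `model_pool_size` is an arbitrary value unrelated to the real `pool_size`. Copies of a handle reach the same registry through a shared pointer, distinct handles each allocate their own, and `std::unordered_map` keeps references to existing elements valid across insertions and rehashes, which is one way to make a returned reference stable for as long as the registry lives.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

constexpr std::size_t model_pool_size = 4;  // arbitrary; not the real pool_size

struct model_pool {
  std::vector<int> streams;  // stand-in stream ids
};

// Hypothetical per-place registry; not the CUDASTF implementation.
class model_registry {
  std::mutex mtx;
  std::unordered_map<int, model_pool> pools;  // place id -> pool
  int next_stream = 0;

 public:
  model_pool& get(int place_id) {
    std::lock_guard<std::mutex> lock(mtx);  // guards concurrent lookups
    auto [it, inserted] = pools.try_emplace(place_id);
    if (inserted) {  // first lookup for this place: create its pool lazily
      for (std::size_t i = 0; i < model_pool_size; ++i) {
        it->second.streams.push_back(next_stream++);
      }
    }
    // Insertions and rehashes do not invalidate references to elements.
    return it->second;
  }
};

// Copies share the registry; default-constructed handles get a new one.
class registry_handle {
  std::shared_ptr<model_registry> registry = std::make_shared<model_registry>();

 public:
  model_registry& get_place_resources() const { return *registry; }
};
```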

inline bool validate_sync_and_update(unsigned long long dst, unsigned long long src, int event_id)
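
This page does not document `validate_sync_and_update`. Judging only from its name and parameters, it appears to consult and update the cross-stream synchronization cache mentioned above. The sketch below is a hypothetical model of such a cache, assuming that `dst` and `src` identify streams, that event ids from a given source increase monotonically, and that a `true` result means `dst` must still wait on that event. Treat these semantics as assumptions, not as the documented contract.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical model; the real semantics are not documented on this page.
class sync_cache {
  std::mutex mtx;
  // (dst stream, src stream) -> most recent src event that dst already waited on
  std::map<std::pair<unsigned long long, unsigned long long>, int> last_synced;

 public:
  // Assumed contract: true if dst must wait on `event_id` from src.
  bool validate_sync_and_update(unsigned long long dst, unsigned long long src, int event_id) {
    std::lock_guard<std::mutex> lock(mtx);
    const auto key = std::make_pair(dst, src);
    auto [it, inserted] = last_synced.try_emplace(key, event_id);
    if (inserted) return true;                 // first sync between these streams
    if (it->second >= event_id) return false;  // covered by an equal or later event
    it->second = event_id;                     // record the newer event
    return true;
  }
};
```

Under the monotonicity assumption, remembering only the latest event per stream pair is enough to skip redundant waits.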

inline ::cuda::std::pair<::std::shared_ptr<cudaGraphExec_t>, bool> cached_graphs_query(size_t nnodes, size_t nedges, cudaGraph_t g)

inline ::cuda::std::pair<::std::shared_ptr<cudaGraphExec_t>, bool> cached_graphs_query(cudaGraph_t g)
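
The two `cached_graphs_query` overloads are likewise undocumented here. A common CUDA pattern for executable-graph caches is to try `cudaGraphExecUpdate` on an already instantiated `cudaGraphExec_t` with a matching topology before paying for `cudaGraphInstantiate`. The CPU model below sketches that pattern under two assumptions: cached entries are bucketed by node and edge count (suggested by the `nnodes`/`nedges` parameters), and the `bool` in the returned pair reports a cache hit. `model_graph`, `model_exec`, and `try_update` are hypothetical stand-ins; unlike the real update call, `try_update` always succeeds when the counts match.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct model_graph { std::size_t nnodes, nedges; int params; };  // stand-in for cudaGraph_t
struct model_exec { std::size_t nnodes, nedges; int params; };   // stand-in for cudaGraphExec_t

// Stand-in for cudaGraphExecUpdate: succeeds only when the topology matches.
bool try_update(model_exec& e, const model_graph& g) {
  if (e.nnodes != g.nnodes || e.nedges != g.nedges) return false;
  e.params = g.params;  // refresh node parameters in place
  return true;
}

// Hypothetical cache; the bool in the result is assumed to mean "cache hit".
class model_graph_cache {
  std::map<std::pair<std::size_t, std::size_t>, std::vector<std::shared_ptr<model_exec>>> buckets;

 public:
  std::pair<std::shared_ptr<model_exec>, bool> query(std::size_t nnodes, std::size_t nedges,
                                                     const model_graph& g) {
    auto& bucket = buckets[std::make_pair(nnodes, nedges)];
    for (auto& e : bucket) {
      if (try_update(*e, g)) return {e, true};  // hit: reuse an instantiated graph
    }
    // Miss: "instantiate" a new executable graph and cache it.
    auto e = std::make_shared<model_exec>(model_exec{g.nnodes, g.nedges, g.params});
    bucket.push_back(e);
    return {e, false};
  }

  // Convenience overload: derive the bucket key from the graph itself.
  std::pair<std::shared_ptr<model_exec>, bool> query(const model_graph& g) {
    return query(g.nnodes, g.nedges, g);
  }
};
```

Bucketing by size keeps the search cheap, since graphs with different node or edge counts could never be updated in place anyway.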

inline auto &gc_helper(int dev_id)

inline ::std::shared_ptr<green_context_helper> get_gc_helper(int dev_id, int sm_count)

inline void register_gc_helper(int dev_id, ::std::shared_ptr<green_context_helper> helper)
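
The green-context helper functions also lack descriptions on this page. One plausible organization, sketched here purely for illustration, is a get-or-create cache keyed by device and SM count, where registration lets callers install a helper built elsewhere. `model_gc_helper` and `model_gc_registry` are hypothetical; whether the real `get_gc_helper` creates missing helpers, and what `gc_helper(dev_id)` returns, is not stated in the source.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for green_context_helper.
struct model_gc_helper {
  int dev_id;
  int sm_count;  // SMs per green-context partition
};

// Hypothetical cache keyed by (device, SM count); not the CUDASTF implementation.
class model_gc_registry {
  std::map<std::pair<int, int>, std::shared_ptr<model_gc_helper>> helpers;

 public:
  // Get-or-create: any expensive partitioning would happen on the first call only.
  std::shared_ptr<model_gc_helper> get(int dev_id, int sm_count) {
    auto& slot = helpers[std::make_pair(dev_id, sm_count)];
    if (!slot) slot = std::make_shared<model_gc_helper>(model_gc_helper{dev_id, sm_count});
    return slot;
  }

  // Install a helper built elsewhere so that later lookups reuse it.
  void register_helper(int dev_id, std::shared_ptr<model_gc_helper> h) {
    const int sm = h->sm_count;  // read the key before moving the pointer
    helpers[std::make_pair(dev_id, sm)] = std::move(h);
  }
};
```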

inline exec_affinity &get_affinity()

inline const exec_affinity &get_affinity() const

inline bool has_affinity() const

inline void push_affinity(::std::vector<::std::shared_ptr<exec_place>> p) const

inline void push_affinity(::std::shared_ptr<exec_place> p) const

inline void pop_affinity() const

inline const ::std::vector<::std::shared_ptr<exec_place>> &current_affinity() const
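
The affinity functions read like a stack of execution-place sets: push a set (or a single place), query the top, and pop when done. The model below assumes exactly that; `affinity_handle` and `model_place` are hypothetical stand-ins for the handle and `exec_place`. Because the real `push_affinity` and `pop_affinity` are `const`, the model also keeps its stack in shared state behind a `std::shared_ptr`, so a `const` handle can still modify it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for exec_place.
struct model_place {
  std::string name;
};

// Hypothetical affinity stack; not the CUDASTF implementation.
class affinity_handle {
  using places = std::vector<std::shared_ptr<model_place>>;
  std::shared_ptr<std::vector<places>> stack = std::make_shared<std::vector<places>>();

 public:
  void push_affinity(places p) const { stack->push_back(std::move(p)); }
  // Single-place overload: wrap the place in a one-element set.
  void push_affinity(std::shared_ptr<model_place> p) const { push_affinity(places{std::move(p)}); }
  void pop_affinity() const {
    assert(has_affinity());  // popping an empty stack is a caller error
    stack->pop_back();
  }
  bool has_affinity() const { return !stack->empty(); }
  // Top of the stack; only meaningful when has_affinity() is true.
  const places& current_affinity() const { return stack->back(); }
};
```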

Public Static Attributes

static constexpr size_t pool_size = ::cuda::experimental::places::exec_place::impl::pool_size

Default size of stream pools created for places looked up through this handle’s registry.

Re-exported here to support call sites that want to size buffers without including places.cuh directly.
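
As an example of the buffer-sizing use case, a `static constexpr` member can size a `std::array` at compile time. In real code this might read `std::array<cudaStream_t, async_resources_handle::pool_size>`. The self-contained sketch below substitutes a hypothetical `model_handle_type` with an arbitrary `pool_size` of 4, since the actual value is not given on this page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for async_resources_handle; the real pool_size may differ.
struct model_handle_type {
  static constexpr std::size_t pool_size = 4;
};
using model_stream = int;  // stand-in for cudaStream_t

// The array length is fixed at compile time from the re-exported constant.
using stream_buffer = std::array<model_stream, model_handle_type::pool_size>;
static_assert(stream_buffer{}.size() == model_handle_type::pool_size, "buffer matches pool size");

// Fill one slot per pooled stream with consecutive ids.
stream_buffer make_buffer() {
  stream_buffer buf{};
  for (std::size_t i = 0; i < buf.size(); ++i) buf[i] = static_cast<model_stream>(i);
  return buf;
}
```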