cuda::experimental::stf::async_resources_handle
-
class async_resources_handle
A handle that stores resources needed for efficient asynchronous execution.
For example, it stores the pools of CUDA streams.
This class relies on the PIMPL idiom and can be passed by value. Creating a new object of this type does not initialize any resource; resources are initialized lazily.
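The PIMPL and lazy-initialization behaviour described above can be modelled in plain C++. The sketch below is not the library's implementation: `lazy_handle`, `impl`, `stream_pool`, `pool()`, `initialized()` and `touch()` are hypothetical names, and a vector of integers stands in for a pool of CUDA streams. It only illustrates why copies are cheap and why constructing a handle allocates nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-in for a pool of CUDA streams.
struct stream_pool {
  std::vector<int> streams;
  explicit stream_pool(std::size_t n) : streams(n, 0) {}
};

// Hypothetical model of a PIMPL handle: the object holds only a shared
// pointer, so passing it by value copies a pointer, not the resources.
class lazy_handle {
  struct impl {
    std::optional<stream_pool> pool; // stays empty until first use
  };
  std::shared_ptr<impl> pimpl_;

public:
  // Construction allocates the (empty) shared state but no pool.
  lazy_handle() : pimpl_(std::make_shared<impl>()) {}

  // Creates the pool on first access; later calls reuse it.
  // (Single-threaded model: no locking is shown here.)
  stream_pool& pool() {
    if (!pimpl_->pool) {
      pimpl_->pool.emplace(4);
    }
    return *pimpl_->pool;
  }

  // True once a resource has actually been created.
  bool initialized() const { return pimpl_->pool.has_value(); }
};

// Takes the handle by value: the copy shares state with the caller's handle.
inline stream_pool* touch(lazy_handle h) { return &h.pool(); }
```

In this model, calling `touch(b)` on a copy `b` of handle `a` initializes the pool that `a` also sees, while a separately constructed handle stays uninitialized.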
- Thread safety
A single handle (and its copies, which share one PIMPL state) may be used concurrently by several host threads submitting work to the same context. The internal caches reached during task submission — the cross-stream synchronization cache and the per-place stream-pool registry — are mutex-guarded. This only makes the runtime’s own bookkeeping race-free; conflicting accesses to the same logical data must still be ordered through the task dependency graph.
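The usage pattern described above can be sketched with the C++ standard library alone. This is a model rather than CUDASTF code: `place_registry`, `shared_handle`, `lookup()` and `concurrent_lookups()` are hypothetical, and a string key stands in for an execution place. Several host threads hold copies of one handle and query a mutex-guarded registry; as the text notes, the mutex protects only the registry's own bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical mutex-guarded registry: one pool id per place name.
class place_registry {
  std::mutex mtx_;
  std::map<std::string, int> pools_;

public:
  // Returns the id for `place`, creating an entry on first lookup.
  int lookup(const std::string& place) {
    std::lock_guard<std::mutex> lock(mtx_);
    auto it = pools_.find(place);
    if (it == pools_.end()) {
      const int next_id = static_cast<int>(pools_.size());
      it = pools_.emplace(place, next_id).first;
    }
    return it->second;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mtx_);
    return pools_.size();
  }
};

// Hypothetical handle: copies share one registry through a shared pointer.
struct shared_handle {
  std::shared_ptr<place_registry> reg = std::make_shared<place_registry>();
};

// Starts `nthreads` host threads, each holding its own copy of `h` and
// looking up the same two places; returns the number of registry entries.
inline std::size_t concurrent_lookups(shared_handle h, int nthreads) {
  std::vector<std::thread> workers;
  for (int t = 0; t < nthreads; ++t) {
    workers.emplace_back([h]() {
      for (int i = 0; i < 1000; ++i) {
        h.reg->lookup("device0");
        h.reg->lookup("device1");
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return h.reg->size();
}
```

However many threads race on the lookups, the registry ends up with exactly one entry per place. Nothing in this sketch orders accesses to user data; that remains the job of the task dependency graph.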
- Lifetime and CUDA teardown
The handle caches resources bound to a live CUDA context: pools of cudaStream_t, cached cudaGraphExec_t, and cross-stream synchronization state. Before destroying a handle that was shared across contexts, perform a blocking synchronization on all work that used it; otherwise it would release streams still carrying in-flight work. For the same reason, a handle must not survive a CUDA context teardown such as cudaDeviceReset() (or primary-context destruction by an external framework): its cached cudaStream_t/cudaGraphExec_t would become dangling. Synchronize, destroy the handle, and create a fresh one once CUDA has been relaunched.
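The teardown order in the last sentence can be illustrated with a small CUDA-free model. Everything here is hypothetical: `fake_runtime` stands in for the CUDA runtime state (its `reset()` models a call such as cudaDeviceReset()), `cached_handle` stands in for the handle, and a generation counter models whether cached resources still belong to a live context. As a simplification, the model binds the handle to the runtime at construction, whereas the real handle creates its resources lazily.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the CUDA runtime state of one process.
struct fake_runtime {
  int generation = 0; // bumped by every reset
  int in_flight  = 0; // submitted work not yet waited on

  void submit() { ++in_flight; }
  void synchronize() { in_flight = 0; } // models a blocking synchronization
  void reset() { ++generation; }        // models a context teardown
};

// Hypothetical handle caching resources created under one runtime generation.
struct cached_handle {
  int created_in;
  explicit cached_handle(const fake_runtime& rt) : created_in(rt.generation) {}

  // Cached resources are usable only while their runtime generation is alive.
  bool usable(const fake_runtime& rt) const {
    return created_in == rt.generation;
  }
};

// Teardown in the order the text prescribes:
// synchronize, destroy the handle, reset, then create a fresh handle.
inline void relaunch(fake_runtime& rt, std::optional<cached_handle>& h) {
  rt.synchronize(); // 1. no work is left on the cached resources
  h.reset();        // 2. release the cache while the runtime is still valid
  rt.reset();       // 3. tear the runtime down
  h.emplace(rt);    // 4. fresh handle bound to the new runtime state
}
```

A copy of the handle kept across `relaunch` would report `usable(rt) == false` afterwards, which is the model's analogue of dangling cached streams and graph executables.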
Public Functions
-
inline async_resources_handle()
-
inline explicit async_resources_handle(::std::nullptr_t)
-
inline explicit operator bool() const
- inline ::cuda::experimental::places::exec_place_resources &get_place_resources(
Access the registry of per-place stream pools owned by this handle.
The returned reference is valid for the lifetime of the handle (PIMPL shared state). Multiple handles produce independent registries; copies of the same handle share one registry. The registry itself is internally mutex-guarded for concurrent lookups.
- inline bool validate_sync_and_update(
- unsigned long long dst,
- unsigned long long src,
- int event_id
- inline ::cuda::std::pair<::std::shared_ptr<cudaGraphExec_t>, bool> cached_graphs_query(
- size_t nnodes,
- size_t nedges,
- cudaGraph_t g
- inline ::cuda::std::pair<::std::shared_ptr<cudaGraphExec_t>, bool> cached_graphs_query(
- cudaGraph_t g
-
inline auto &gc_helper(int dev_id)
- inline ::std::shared_ptr<green_context_helper> get_gc_helper(
- int dev_id,
- int sm_count
- int dev_id,
- ::std::shared_ptr<green_context_helper> helper
-
inline exec_affinity &get_affinity()
-
inline const exec_affinity &get_affinity() const
-
inline bool has_affinity() const
- ::std::vector<::std::shared_ptr<exec_place>> p
-
inline void pop_affinity() const
- inline const ::std::vector<::std::shared_ptr<exec_place>> &current_affinity(
Public Static Attributes
-
static constexpr size_t pool_size = ::cuda::experimental::places::exec_place::impl::pool_size
Default size of stream pools created for places looked up through this handle’s registry.
Re-exported here to support call sites that want to size buffers without including places.cuh directly.