cuda::experimental::stf::stream_task<>

Defined in include/cuda/experimental/__stf/stream/stream_task.cuh

template<>
class stream_task<> : public cuda::experimental::stf::task

Task with dynamic dependencies that uses CUDA streams (and events) to synchronize between the different tasks.

stream_task<> automatically selects a stream from an internal pool if needed, or takes a user-provided stream (set by calling set_stream). All operations in a task are expected to be executed asynchronously with respect to that task’s stream.

This task type accepts dynamic dependencies, i.e. dependencies can be added at runtime by calling add_dep() or add_deps() prior to starting the task with start(). In turn, the added dependencies have dynamic types. It is the caller’s responsibility to request the correct type for each dependency when calling get<T>(index).

Public Types

enum class phase

current task status

We keep track of the status of the task so that we do not make API calls at an inappropriate time, such as setting the symbol once the task has already started, or releasing a task that was not started yet.

Values:

enumerator setup
enumerator running
enumerator finished
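
The phase tracking described above can be pictured as a small state machine. The following is a minimal sketch only, not the CUDASTF implementation: `phased_task` and its `require` helper are hypothetical names, and throwing `std::logic_error` on misuse is an assumption made for illustration (the library may report misuse differently).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a task type; this is NOT the CUDASTF class.
class phased_task {
public:
    enum class phase { setup, running, finished };

    // Only allowed before the task has started.
    void set_symbol(std::string s) {
        require(phase::setup, "set_symbol");
        symbol_ = std::move(s);
    }

    // setup -> running
    phased_task& start() {
        require(phase::setup, "start");
        phase_ = phase::running;
        return *this;
    }

    // running -> finished
    phased_task& end() {
        require(phase::running, "end");
        phase_ = phase::finished;
        return *this;
    }

    phase get_task_phase() const { return phase_; }
    const std::string& get_symbol() const { return symbol_; }

private:
    // Reject a call made in the wrong phase (assumed error policy).
    void require(phase expected, const char* what) const {
        if (phase_ != expected) {
            throw std::logic_error(std::string(what) + " called in the wrong phase");
        }
    }

    phase phase_ = phase::setup;
    std::string symbol_;
};

// Walks one task through its whole lifecycle.
inline void lifecycle_example() {
    phased_task t;
    t.set_symbol("axpy");
    t.start();
    assert(t.get_task_phase() == phased_task::phase::running);
    t.end();
    assert(t.get_task_phase() == phased_task::phase::finished);
}
```

In this model, calling `set_symbol` after `start()`, or `end()` before `start()`, is rejected, which mirrors the two misuse cases named in the description.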

Public Functions

inline stream_task(backend_ctx_untyped ctx_, exec_place e_place = exec_place::current_device())
stream_task(const stream_task<>&) = default
stream_task &operator=(const stream_task<>&) = default
~stream_task() = default
inline cudaStream_t get_stream()
inline cudaStream_t get_stream(size_t pos)
inline stream_task &set_stream(cudaStream_t s)
inline stream_task &start()
inline void set_current_place(pos4 p)
inline void unset_current_place()
inline const exec_place &get_current_place()
inline stream_task &end_uncleared()
inline stream_task &end()
template<typename Fun>
inline void operator->*(Fun &&fun)

Run lambda function on the specified device.

The lambda must accept exactly one argument. If the type of the lambda’s argument is one of stream_task<>, stream_task<>&, auto, auto&, or auto&&, then *this is passed to the lambda. Otherwise, this->get_stream() is passed to the lambda. The lambda does not receive the dependencies as arguments; they must be accessed separately.

Template Parameters

Fun – Type of lambda

Parameters

fun – Lambda function taking either a stream_task<> or a cudaStream_t as the only argument
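
One way to picture the argument selection described above is a compile-time check on what the callable accepts. This is a hedged sketch rather than the library's code: `toy_stream` and `dispatch_task` are hypothetical stand-ins for cudaStream_t and stream_task<>, and the use of `std::is_invocable_v` is an assumption about how such a dispatch could be written in C++17.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for cudaStream_t.
struct toy_stream {
    int id;
};

// Hypothetical stand-in for stream_task<>; NOT the CUDASTF class.
class dispatch_task {
public:
    explicit dispatch_task(toy_stream s) : stream_(s) {}

    toy_stream get_stream() const { return stream_; }

    template <typename Fun>
    void operator->*(Fun&& fun) {
        if constexpr (std::is_invocable_v<Fun, dispatch_task&>) {
            // The callable accepts a task (or a generic parameter): pass *this.
            std::forward<Fun>(fun)(*this);
        } else {
            // Otherwise it must accept a stream: pass get_stream().
            static_assert(std::is_invocable_v<Fun, toy_stream>,
                          "callable must take a task or a stream");
            std::forward<Fun>(fun)(get_stream());
        }
    }

private:
    toy_stream stream_;
};

// Shows both call forms on the same task.
inline void dispatch_example() {
    dispatch_task t{toy_stream{7}};
    int seen_by_task = 0;
    int seen_by_stream = 0;
    t->*[&](dispatch_task& self) { seen_by_task = self.get_stream().id; };
    t->*[&](toy_stream s) { seen_by_stream = s.id; };
    assert(seen_by_task == 7);
    assert(seen_by_stream == 7);
}
```

A generic lambda (`auto&`) takes the first branch in this sketch, which matches the rule that *this is passed for auto-typed parameters.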

inline void populate_deps_scheduling_info() const
inline bool schedule_task()

Use the scheduler to assign a device to this task.

Returns

true if the task’s time needs to be recorded

inline explicit operator bool() const
inline bool operator==(const task &rhs) const
inline const ::std::string &get_symbol() const

Get the string attached to the task for debugging purposes.

inline void set_symbol(::std::string new_symbol)

Attach a string to this task, which can be useful for debugging purposes, or in tracing tools.

inline void add_dep(task_dep_untyped d)

Add one dependency.

inline void add_deps(task_dep_vector_untyped input_deps)

Add a set of dependencies.

template<typename ...Pack>
inline void add_deps(task_dep_untyped first, Pack&&... pack)

Add a set of dependencies.

template<typename ...Args>
inline void add_deps(::std::tuple<Args...> &deps_tuple)

Add a tuple of dependencies.

inline const task_dep_vector_untyped &get_task_deps() const

Get the dependencies of the task.

inline task &on(exec_place p)

Specify where the task should run.

inline const exec_place &get_exec_place() const

Get and set the execution place of the task.

inline exec_place &get_exec_place()
inline void set_exec_place(const exec_place &place)
inline const data_place &get_affine_data_place() const

Get and set the affine data place of the task.

inline void set_affine_data_place(data_place affine_data_place)
inline dim4 grid_dims() const
inline const event_list &get_done_prereqs() const

Get the list of events which mean that the task was executed.

template<typename T>
inline void merge_event_list(T &&tail)

Add an event list to the list of events which mean that the task was executed.

inline instance_id_t find_data_instance_id(const logical_data_untyped &d) const

Get the identifier of a data instance used by a task.

This finds the instance id used by a given piece of data in a task. Note that this incurs a certain overhead because it searches through the list of logical data in the task.

template<typename T, typename logical_data_untyped = logical_data_untyped>
decltype(auto) get(size_t submitted_index) const

Generic method to retrieve the data instance associated with an index in a task.

If T is the exact type stored, this returns a reference to a valid data instance in the task. If T is constify<U>, where U is the type stored, this returns an rvalue of type T.

Calling this outside the start()/end() section will result in undefined behaviour.

Remark

One should not forget the “template” keyword when using this API with a task t:

T &res = t.template get<T>(index);
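
The remark about the “template” keyword applies when the task's type depends on a template parameter, as is the case inside a generic lambda or a function template. The sketch below illustrates this with a hypothetical `any_task` type that stores its instances in `std::any`; it is a simplified model for illustration and does not reflect how CUDASTF stores data instances.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task-like type with a member template get<T>(index).
struct any_task {
    std::vector<std::any> instances;

    template <typename T>
    T& get(std::size_t index) {
        // Throws std::bad_any_cast if T is not the stored type (toy behaviour).
        return std::any_cast<T&>(instances.at(index));
    }
};

// Task is a template parameter, so `t.get` is a dependent name:
// without `template`, the `<` would be parsed as a less-than operator.
template <typename Task>
double first_as_double(Task& t) {
    double& res = t.template get<double>(0);
    return res;
}

inline void get_example() {
    any_task t;
    t.instances.emplace_back(3.5);
    assert(first_as_double(t) == 3.5);
}
```

When the task has a concrete, non-dependent type, plain `t.get<T>(index)` compiles as well; the disambiguator is only required in dependent contexts.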

inline void set_input_events(event_list _input_events)
inline const event_list &get_input_events() const
inline int get_unique_id() const
inline int get_mapping_id() const
inline size_t hash() const
inline void add_post_submission_hook(::std::vector<::std::function<void()>> &hooks)
inline event_list acquire(backend_ctx_untyped &ctx)

Start a task.

Acquires necessary resources and dependencies for a task to run.

SUBMIT = acquire + release at the same time …

This function prepares a task for execution by setting up its execution context, sorting its dependencies to avoid deadlocks, and ensuring all necessary data dependencies are fulfilled. It handles both small and large tasks by checking the task size and adjusting its behavior accordingly. Dependencies are processed to mark data usage, allocate necessary resources, and update data instances for task execution. This function also handles the task’s transition from the setup phase to the running phase.

Note

The function EXPECTs the task to be in the setup phase and the execution place not to be exec_place::device_auto.

Note

Dependencies are sorted by logical data addresses to prevent deadlocks.

Note

For tasks with multiple dependencies on the same logical data, only one instance of the data is used, and its access mode is determined by combining the access modes of all dependencies on that data.

Parameters
  • ctx – The backend context in which the task is executed. This context contains the execution stack and other execution-related information.

  • tsk – The task to be prepared for execution, which for this member function is the task itself (*this). The task must be in the setup phase before calling this function.

Returns

An event_list containing all the input events and any additional events generated during the acquisition of dependencies. This list represents the prerequisites for the task to start execution.
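
The two notes above (sorting dependencies by logical data address, and merging several dependencies on the same logical data) can be sketched as follows. This is an illustrative model only: `toy_dep`, the `mode_*` constants, and `normalize_deps` are hypothetical names, and combining access modes with a bitwise OR is an assumption, not a statement about the library's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Assumed encoding: access modes as bit flags.
constexpr unsigned mode_read  = 1u;
constexpr unsigned mode_write = 2u;
constexpr unsigned mode_rw    = mode_read | mode_write;

// Hypothetical dependency record: which data, and how it is accessed.
struct toy_dep {
    const void* data;  // address identifying the logical data
    unsigned mode;     // combination of mode_* flags
};

inline std::vector<toy_dep> normalize_deps(std::vector<toy_dep> deps) {
    // Sort by address. If every task acquires its data in the same global
    // order, two tasks cannot each hold what the other is waiting for.
    std::sort(deps.begin(), deps.end(), [](const toy_dep& a, const toy_dep& b) {
        return std::less<const void*>{}(a.data, b.data);
    });

    // After sorting, duplicates are adjacent: keep one entry per data
    // and combine the access modes of all its dependencies.
    std::vector<toy_dep> merged;
    for (const toy_dep& d : deps) {
        if (!merged.empty() && merged.back().data == d.data) {
            merged.back().mode |= d.mode;
        } else {
            merged.push_back(d);
        }
    }
    return merged;
}

inline void normalize_example() {
    int x = 0;
    int y = 0;
    std::vector<toy_dep> merged =
        normalize_deps({{&x, mode_read}, {&y, mode_read}, {&x, mode_write}});
    assert(merged.size() == 2);  // the two entries for x were merged
}
```

In this model, a task that lists the same data once as read and once as write ends up with a single read-write entry for it.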

inline void release(backend_ctx_untyped &ctx, event_list &done_prereqs)

Releases resources associated with a task and transitions it to the finished phase.

This function releases a task after it has completed its execution. It merges the list of prerequisites (events) that are marked as done, updates the dependencies for the task’s logical data, resets the execution context to its original configuration, and marks the task as finished.

After calling this function, the task is considered “over” and is transitioned from the running phase to the finished phase. All associated resources are unlocked and post-submission hooks (if any) are executed.

The function performs the following actions:

  • Merges the provided list of done_prereqs into the task’s list of prerequisites.

  • Updates logical data dependencies based on the access mode (read or write).

  • Ensures proper synchronization by setting reader/writer prerequisites on the logical data.

  • Updates internal structures to reflect that the task has become a new “leaf task.”

  • Resets the execution context (device, SM affinity, etc.) to its previous state.

  • Unlocks mutexes for the logical data that were locked during task execution.

  • Releases any references to logical data, preventing potential memory leaks.

  • Executes any post-submission hooks attached to the task.

The function also interacts with tracing and debugging tools, marking the task’s completion and declaring the task as a prerequisite for future tasks in the trace.

Note

After calling this function, the task is no longer in the running phase and cannot be modified.

Warning

The task must have completed all its work before calling this function. Failure to follow the task’s lifecycle correctly may lead to undefined behavior.

Parameters
  • ctx – The context of the backend, which manages the execution environment.

  • done_prereqs – A list of events that must be marked as complete before the task can be released.

Pre

The task must be in the running phase.

Pre

The task’s list of prerequisites (dependencies) must be empty at the time of calling.

inline phase get_task_phase() const
inline void clear()

Protected Attributes

backend_ctx_untyped ctx
::std::shared_ptr<impl> pimpl