cuda.core.Stream#

class cuda.core.Stream(*args, **kwargs)#

Represent a queue of GPU operations that are executed in a specific order.

Applications use streams to control the order of execution of GPU work. Work within a single stream is executed sequentially, while work across multiple streams can be further controlled using stream priorities and event management.
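The in-stream ordering guarantee can be pictured with a small host-side analogue. The `ToyStream` class below is purely illustrative (it is not a cuda.core type): it models a stream as a FIFO queue of operations that are drained strictly in submission order.

```python
from collections import deque

class ToyStream:
    # Hypothetical FIFO model of a stream; real streams queue GPU work,
    # not Python callables.
    def __init__(self):
        self._ops = deque()

    def launch(self, fn):
        # Submitting work enqueues it; nothing runs yet.
        self._ops.append(fn)

    def drain(self):
        # Work in a single stream executes sequentially, in launch order.
        while self._ops:
            self._ops.popleft()()

order = []
s = ToyStream()
for i in range(3):
    s.launch(lambda i=i: order.append(i))
s.drain()
print(order)  # -> [0, 1, 2]
```

The model captures only the ordering contract: operations launched on the same stream never reorder relative to one another.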

Advanced users can utilize default streams to enforce complex implicit synchronization behaviors.

Directly creating a Stream is not supported due to ambiguity. New streams should instead be created through a Device object, or from an existing foreign stream handle using Stream.from_handle().

Methods

__init__(*args, **kwargs)#
close(self)#

Destroy the stream.

Releases the stream handle. For owned streams, this destroys the underlying CUDA stream. For borrowed streams, this releases the reference and allows the Python owner to be GC’d.

create_graph_builder(self) GraphBuilder#

Create a new GraphBuilder object.

The new graph builder will be associated with this stream.

Returns:

Newly created graph builder object.

Return type:

GraphBuilder

static from_handle(handle: int) Stream#

Create a new Stream object from a foreign stream handle.

Uses a cudaStream_t pointer address represented as a Python int to create a new Stream object.

Note

Stream lifetime is not managed; the foreign object must remain alive while this stream is active.

Parameters:

handle (int) – Stream handle representing the address of a foreign stream object.

Returns:

Newly created stream object.

Return type:

Stream

classmethod legacy_default(cls)#

Return the legacy default stream.

The legacy default stream is an implicit stream which synchronizes with all other streams in the same CUDA context except for non-blocking streams. When any operation is launched on the legacy default stream, it waits for all previously launched operations in blocking streams to complete, and all subsequent operations in blocking streams wait for the legacy default stream operation to complete.

This stream is useful for ensuring strict ordering of operations but may limit concurrency. For better performance in concurrent scenarios, consider using per_thread_default() or creating explicit streams.

This method returns the same singleton instance on every call for the base Stream class. Subclasses will receive new instances of the subclass type that wrap the same underlying CUDA stream.

Returns:

The legacy default stream singleton instance for the current context.

Return type:

Stream

See also

per_thread_default

Per-thread default stream alternative.

from_handle

Create stream from existing handle.

Examples

>>> from cuda.core import Stream
>>> stream1 = Stream.legacy_default()
>>> stream2 = Stream.legacy_default()
>>> stream1 is stream2  # True - returns same singleton
True

classmethod per_thread_default(cls)#

Return the per-thread default stream.

The per-thread default stream is local to both the calling thread and the CUDA context. Unlike the legacy default stream, it does not synchronize with other streams and behaves like an explicitly created non-blocking stream. This allows for better concurrency in multi-threaded applications.

Each thread has its own per-thread default stream, enabling true concurrent execution without implicit synchronization barriers.

This method returns the same singleton instance on every call for the base Stream class. Subclasses will receive new instances of the subclass type that wrap the same underlying CUDA stream.

Returns:

The per-thread default stream singleton instance for the current thread and context.

Return type:

Stream

See also

legacy_default

Legacy default stream alternative.

from_handle

Create stream from existing handle.

Examples

>>> from cuda.core import Stream
>>> stream1 = Stream.per_thread_default()
>>> stream2 = Stream.per_thread_default()
>>> stream1 is stream2  # True - returns same singleton
True
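
The "one default stream per thread" behavior can be sketched with `threading.local`. The `per_thread_default` helper below is a hypothetical host-side illustration, not the actual cuda.core implementation: repeated calls from one thread return the same cached instance, while each new thread gets its own.

```python
import threading

_tls = threading.local()

def per_thread_default():
    # Hypothetical helper: lazily create and cache one object per
    # calling thread, mimicking per-thread default stream semantics.
    if not hasattr(_tls, "stream"):
        _tls.stream = object()  # stand-in for a per-thread stream
    return _tls.stream

main_stream = per_thread_default()
assert per_thread_default() is main_stream   # same thread, same instance

seen = {}
def worker():
    seen["stream"] = per_thread_default()

t = threading.Thread(target=worker)
t.start()
t.join()
assert seen["stream"] is not main_stream     # new thread, new instance
```

This mirrors the documented behavior: the per-thread default stream is a per-thread singleton, so no implicit synchronization barrier is shared across threads.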
record(
self,
event: Event = None,
options: EventOptions = None,
) Event#

Record an event onto the stream.

Creates an Event object (or reuses the given one) by recording on the stream.

Parameters:
  • event (Event, optional) – Optional event object to be reused for recording.

  • options (EventOptions, optional) – Customizable dataclass for event creation options.

Returns:

The recorded event object (a newly created one, or event if it was provided).

Return type:

Event

sync(self)#

Synchronize the stream.

Block the calling host thread until all work queued on this stream has completed.

wait(self, event_or_stream: Event | Stream)#

Wait for a CUDA event or a CUDA stream.

Waiting for an event or a stream establishes a stream order.

If a Stream is provided, then wait until the stream’s work is completed. This is done by recording a new Event on the stream and then waiting on it.
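
The record-then-wait mechanism described above can be mimicked with a host-side sketch built on threading primitives. `ToyEvent` and the two functions below are illustrative stand-ins, not cuda.core types: one thread plays the stream being waited on, which records an event after its work; the other plays the waiting stream, which blocks on that event before proceeding.

```python
import threading

class ToyEvent:
    # Hypothetical host-side stand-in for an Event.
    def __init__(self):
        self._done = threading.Event()

    def record(self):
        # "Recording" here marks all previously issued work as complete.
        self._done.set()

    def wait(self):
        self._done.wait()

log = []
ev = ToyEvent()

def producing_stream():
    # Plays the role of the stream being waited on.
    log.append("producer work")
    ev.record()              # record an event after the queued work

def waiting_stream():
    # Plays the role of the stream that called wait().
    ev.wait()                # block until the recorded event completes
    log.append("consumer work")

t1 = threading.Thread(target=waiting_stream)
t2 = threading.Thread(target=producing_stream)
t1.start(); t2.start()
t1.join(); t2.join()
print(log)  # -> ['producer work', 'consumer work']
```

Regardless of which thread is scheduled first, the waiting side cannot run its work until the event is recorded, which is the stream order that wait() establishes.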

Attributes

context#

Context

Return the Context associated with this stream.

Type:

Stream.context

device#

Device

Return the Device singleton associated with this stream.

Note

The current context on the device may differ from this stream’s context. This case occurs when a different CUDA context is set current after a stream is created.

Type:

Stream.device

handle#

cuda.bindings.driver.CUstream

Return the underlying CUstream object.

Caution

This handle is a Python object. To get the memory address of the underlying C handle, call int(Stream.handle).

Type:

Stream.handle

is_nonblocking#

bool

Return True if this is a nonblocking stream, otherwise False.

Type:

Stream.is_nonblocking

priority#

int

Return the stream priority.

Type:

Stream.priority