Stream

class nvtripy.Stream(priority: int = 0)[source]

Bases: object

Represents a CUDA stream that can be used to manage concurrent operations.

Note

Streams can only be used with compiled functions.

This class is a thin wrapper around the underlying CUDA stream object that allows streams to be created and managed.

Parameters:

priority (int) – Priority to assign to the new stream. Lower numbers signify higher priority.
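
For instance, a stream with lower-than-default priority could be requested by passing a larger value (a minimal sketch; whether non-default priority values are honored depends on the underlying CUDA driver):

stream_default = tp.Stream()  # priority=0 (the default)
stream_low = tp.Stream(priority=1)  # lower priority, since lower numbers mean higher priority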

Example: Creating New Streams
stream_a = tp.Stream()
stream_b = tp.Stream()
Local Variables
>>> stream_a
<Stream(id=128823735525280)>

>>> stream_b
<Stream(id=128823735521776)>
Example: Using Streams With Compiled Functions
linear = tp.Linear(2, 3)

compiled_linear = tp.compile(
    linear, args=[tp.InputInfo((2, 2), dtype=tp.float32)]
)

# Run the compiled linear function on a custom stream:
stream = tp.Stream()
compiled_linear.stream = stream

input = tp.ones((2, 2), dtype=tp.float32)
output = compiled_linear(input)
Local Variables
>>> linear
Linear(
    weight: Parameter = (shape=[3, 2], dtype=float32),
    bias: Parameter = (shape=[3], dtype=float32),
)
>>> linear.state_dict()
{
    weight: tensor(
        [[0.0000, 1.0000],
         [2.0000, 3.0000],
         [4.0000, 5.0000]], 
        dtype=float32, loc=gpu:0, shape=(3, 2)),
    bias: tensor([0.0000, 1.0000, 2.0000], dtype=float32, loc=gpu:0, shape=(3,)),
}

>>> stream
<Stream(id=128823682610368)>

>>> input
tensor(
    [[1.0000, 1.0000],
     [1.0000, 1.0000]], 
    dtype=float32, loc=gpu:0, shape=(2, 2))

>>> output
tensor(
    [[1.0000, 6.0000, 11.0000],
     [1.0000, 6.0000, 11.0000]], 
    dtype=float32, loc=gpu:0, shape=(2, 3))
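
The stream attribute can also be pointed back at the default stream for the device (a short sketch building on the example above; see default_stream() below):

compiled_linear.stream = tp.default_stream()
output = compiled_linear(input)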
synchronize() → None[source]

Synchronize the stream, blocking until all operations in this stream are complete.

Example: Using Synchronize For Benchmarking
import time

linear = tp.Linear(2, 3)
compiled_linear = tp.compile(
    linear, args=[tp.InputInfo((2, 2), dtype=tp.float32)]
)

input = tp.ones((2, 2), dtype=tp.float32)

compiled_linear.stream = tp.Stream()

num_iters = 10
start_time = time.perf_counter()
for _ in range(num_iters):
    _ = compiled_linear(input)
compiled_linear.stream.synchronize()
end_time = time.perf_counter()

# Average time per iteration, in seconds:
avg_time = (end_time - start_time) / num_iters
print(f"Execution took {avg_time * 1000} ms")
Output
Execution took 0.29250449733808637 ms
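
Beyond benchmarking, synchronize() can be used to ensure that work launched on one stream has finished before launching follow-up work on another stream (a minimal sketch reusing compiled_linear and input from the example above):

stream_a = tp.Stream()
stream_b = tp.Stream()

compiled_linear.stream = stream_a
out_a = compiled_linear(input)

# Block until all work enqueued on stream_a has completed:
stream_a.synchronize()

compiled_linear.stream = stream_b
out_b = compiled_linear(input)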
Return type:

None

nvtripy.default_stream(device: device = device(kind='gpu', index=0)) → Stream[source]

Provides access to the default CUDA stream for a given device. There is only one default stream instance per device.

Parameters:

device (device) – The device for which to get the default stream.

Returns:

The default stream for the specified device.

Raises:

TripyException – If the device is not of type ‘gpu’ or if the device index is not 0.

Return type:

Stream

Note

Calling default_stream() with the same device always returns the same Stream instance for that device.

Example: Retrieving The Default Stream
# Get the default stream for the current device.
default = tp.default_stream()
Local Variables
>>> default
<Stream(id=128823733053664)>
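
Since there is only one default stream instance per device, repeated calls return the same object:

assert tp.default_stream() is tp.default_stream()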