Stream

class nvtripy.Stream(priority: int = 0)[source]

Bases: object

Represents a CUDA stream that can be used to manage concurrent operations.

Note

Streams can only be used with compiled functions.

This class is a thin wrapper around the underlying CUDA stream object that allows streams to be created and managed.

Parameters:

priority (int) – Priority to assign to the new stream. Lower numbers signify higher priority.
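
For instance, a stream with lower-than-default priority could be requested by passing a larger value (a minimal sketch; whether non-default priority values are honored depends on the underlying CUDA driver):

stream_default = tp.Stream()  # priority=0 (the default)
stream_low = tp.Stream(priority=1)  # lower priority, since lower numbers mean higher priority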

Example: Creating New Streams
stream_a = tp.Stream()
stream_b = tp.Stream()
Local Variables
>>> stream_a
<Stream(id=128823735525280)>

>>> stream_b
<Stream(id=128823735521776)>
Example: Using Streams With Compiled Functions
linear = tp.Linear(2, 3)

compiled_linear = tp.compile(
    linear, args=[tp.InputInfo((2, 2), dtype=tp.float32)]
)

# Run the compiled linear function on a custom stream:
stream = tp.Stream()
compiled_linear.stream = stream

input = tp.ones((2, 2), dtype=tp.float32)
output = compiled_linear(input)
Local Variables
>>> linear
Linear(
    weight: Parameter = (shape=[3, 2], dtype=float32),
    bias: Parameter = (shape=[3], dtype=float32),
)
>>> linear.state_dict()
{
    weight: tensor(
        [[0.0000, 1.0000],
         [2.0000, 3.0000],
         [4.0000, 5.0000]], 
        dtype=float32, loc=gpu:0, shape=(3, 2)),
    bias: tensor([0.0000, 1.0000, 2.0000], dtype=float32, loc=gpu:0, shape=(3,)),
}

>>> stream
<Stream(id=128823682610368)>

>>> input
tensor(
    [[1.0000, 1.0000],
     [1.0000, 1.0000]], 
    dtype=float32, loc=gpu:0, shape=(2, 2))

>>> output
tensor(
    [[1.0000, 6.0000, 11.0000],
     [1.0000, 6.0000, 11.0000]], 
    dtype=float32, loc=gpu:0, shape=(2, 3))
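
The stream attribute can also be pointed back at the default stream for the device (a short sketch building on the example above; see default_stream() below):

compiled_linear.stream = tp.default_stream()
output = compiled_linear(input)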
synchronize() → None[source]

Synchronize the stream, blocking until all operations in this stream are complete.

Example: Using Synchronize For Benchmarking
import time

linear = tp.Linear(2, 3)
compiled_linear = tp.compile(
    linear, args=[tp.InputInfo((2, 2), dtype=tp.float32)]
)

input = tp.ones((2, 2), dtype=tp.float32)

compiled_linear.stream = tp.Stream()

num_iters = 10
start_time = time.perf_counter()
for _ in range(num_iters):
    _ = compiled_linear(input)
compiled_linear.stream.synchronize()
end_time = time.perf_counter()

# Average time per iteration, in seconds:
avg_time = (end_time - start_time) / num_iters
print(f"Execution took {avg_time * 1000} ms")
Output
Execution took 0.29250449733808637 ms
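
Beyond benchmarking, synchronize() can be used to ensure that work launched on one stream has finished before launching follow-up work on another stream (a minimal sketch reusing compiled_linear and input from the example above):

stream_a = tp.Stream()
stream_b = tp.Stream()

compiled_linear.stream = stream_a
out_a = compiled_linear(input)

# Block until all work enqueued on stream_a has completed:
stream_a.synchronize()

compiled_linear.stream = stream_b
out_b = compiled_linear(input)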
Return type:

None

nvtripy.default_stream(device: device = device(kind='gpu', index=0)) → Stream[source]

Provides access to the default CUDA stream for a given device. There is only one default stream instance per device.

Parameters:

device (device) – The device for which to get the default stream.

Returns:

The default stream for the specified device.

Raises:

TripyException – If the device is not of type ‘gpu’ or if the device index is not 0.

Return type:

Stream

Note

Calling default_stream() with the same device always returns the same Stream instance for that device.

Example: Retrieving The Default Stream
# Get the default stream for the current device.
default = tp.default_stream()
Local Variables
>>> default
<Stream(id=128823733053664)>
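
Since there is only one default stream instance per device, repeated calls return the same object:

assert tp.default_stream() is tp.default_stream()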