Stream
- class nvtripy.Stream(priority: int = 0)
Bases: object
Represents a CUDA stream that can be used to manage concurrent operations.
Note
Streams can only be used with compiled functions.
This class is a wrapper around the underlying stream object, allowing management of CUDA streams.
- Parameters:
priority (int) – Priority of the new stream; a lower number signifies a higher priority.
Example: Creating New Streams
stream_a = tp.Stream()
stream_b = tp.Stream()
>>> stream_a
<Stream(id=128823735525280)>
>>> stream_b
<Stream(id=128823735521776)>
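The priority argument follows the usual CUDA convention: a numerically lower value means a higher scheduling priority. As a plain-Python sketch of that ordering semantics only (not of actual CUDA scheduling), a min-heap keyed on the priority number pops the highest-priority item first; the names and numbers below are purely illustrative:

```python
import heapq

# (priority, name) pairs: lower number = higher priority,
# mirroring the convention used by Stream(priority=...).
pending = [(0, "default-priority work"), (-1, "high-priority work")]
heapq.heapify(pending)

# The numerically lowest priority value is scheduled first.
first = heapq.heappop(pending)
print(first)  # (-1, 'high-priority work')
```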
Example: Using Streams With Compiled Functions
linear = tp.Linear(2, 3)

compiled_linear = tp.compile(
    linear, args=[tp.InputInfo((2, 2), dtype=tp.float32)]
)

# Run the compiled linear function on a custom stream:
stream = tp.Stream()
compiled_linear.stream = stream

input = tp.ones((2, 2), dtype=tp.float32)
output = compiled_linear(input)
>>> linear
Linear(
    weight: Parameter = (shape=[3, 2], dtype=float32),
    bias: Parameter = (shape=[3], dtype=float32),
)
>>> linear.state_dict()
{
    weight: tensor(
        [[0.0000, 1.0000],
         [2.0000, 3.0000],
         [4.0000, 5.0000]],
        dtype=float32, loc=gpu:0, shape=(3, 2)),
    bias: tensor([0.0000, 1.0000, 2.0000], dtype=float32, loc=gpu:0, shape=(3,)),
}
>>> stream
<Stream(id=128823682610368)>
>>> input
tensor(
    [[1.0000, 1.0000],
     [1.0000, 1.0000]],
    dtype=float32, loc=gpu:0, shape=(2, 2))
>>> output
tensor(
    [[1.0000, 6.0000, 11.0000],
     [1.0000, 6.0000, 11.0000]],
    dtype=float32, loc=gpu:0, shape=(2, 3))
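The key property of a stream is that work launched on it returns immediately to the caller while executing asynchronously, in submission order, on the device. As an illustration of that execution model only (a toy stand-in, not nvtripy's implementation), a stream can be modeled as a FIFO queue drained by a single worker thread:

```python
import queue
import threading

class ToyStream:
    """Toy model of a stream: a FIFO work queue drained by one worker."""

    def __init__(self):
        self._tasks = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            fn = self._tasks.get()
            fn()
            self._tasks.task_done()

    def launch(self, fn):
        # Returns immediately; fn runs asynchronously, in submission order.
        self._tasks.put(fn)

    def synchronize(self):
        # Block until all launched work has completed.
        self._tasks.join()

results = []
stream = ToyStream()
for i in range(3):
    stream.launch(lambda i=i: results.append(i))
stream.synchronize()
print(results)  # [0, 1, 2]
```

Because each stream preserves submission order, two independent workloads placed on two different streams may overlap, while operations within one stream never reorder.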
- synchronize() → None
Synchronize the stream, blocking until all operations in this stream are complete.
Example: Using Synchronize For Benchmarking
import time

linear = tp.Linear(2, 3)
compiled_linear = tp.compile(
    linear, args=[tp.InputInfo((2, 2), dtype=tp.float32)]
)

input = tp.ones((2, 2), dtype=tp.float32)

compiled_linear.stream = tp.Stream()

num_iters = 10
start_time = time.perf_counter()
for _ in range(num_iters):
    _ = compiled_linear(input)
compiled_linear.stream.synchronize()
end_time = time.perf_counter()

# Note: use a name other than `time` so the module is not shadowed.
avg_time = (end_time - start_time) / num_iters
print(f"Execution took {avg_time * 1000} ms")
Execution took 0.29250449733808637 ms
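The single synchronize() after the loop is the essential part of this pattern: launches are asynchronous, so timing without a final synchronization would measure only how quickly work was enqueued, not how long it took to run. The same structure can be sketched with plain Python, using a one-worker thread pool as a hypothetical stand-in for an in-order stream and future.result() in place of stream.synchronize():

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_kernel():
    time.sleep(0.001)  # stand-in for asynchronous device work

executor = ThreadPoolExecutor(max_workers=1)  # one in-order "stream"

num_iters = 10
start = time.perf_counter()
futures = [executor.submit(fake_kernel) for _ in range(num_iters)]
# Without this wait, we would time only the (cheap) submissions.
for f in futures:
    f.result()  # plays the role of stream.synchronize()
elapsed = time.perf_counter() - start

avg_ms = elapsed / num_iters * 1000
print(f"Average: {avg_ms:.3f} ms")
```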
- Return type:
None
- nvtripy.default_stream(device: device = device(kind='gpu', index=0)) → Stream
Provides access to the default CUDA stream for a given device. There is only one default stream instance per device.
- Parameters:
device (device) – The device for which to get the default stream.
- Returns:
The default stream for the specified device.
- Raises:
TripyException – If the device is not of type ‘gpu’ or if the device index is not 0.
- Return type:
Stream
Note
Calling default_stream() with the same device always returns the same Stream instance for that device.
Example: Retrieving The Default Stream
# Get the default stream for the current device.
default = tp.default_stream()
>>> default
<Stream(id=128823733053664)>
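The same-instance guarantee described in the note above can be modeled with a simple per-device cache. This is an illustrative sketch with made-up names (string device keys and a FakeStream class), not nvtripy's actual implementation:

```python
# Hypothetical per-device cache; real device objects are richer than strings.
_default_streams = {}

class FakeStream:
    """Stand-in for a stream bound to one device."""

    def __init__(self, device):
        self.device = device

def default_stream(device="gpu:0"):
    # Create the stream lazily on first request, then always reuse it.
    if device not in _default_streams:
        _default_streams[device] = FakeStream(device)
    return _default_streams[device]

a = default_stream()
b = default_stream()
print(a is b)  # True: one default stream instance per device
```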