compile
- nvtripy.compile(func: Callable, optimization_level: int = 3, *, args: Sequence[Any] = [], kwargs: Dict[str, Any] = {}) → Executable
Compiles a function into an executable that runs efficiently on the GPU.
This works by first calling the function with the provided arguments to trace its execution, and then compiling the resulting traced graph.
Parameters that should be runtime inputs in the compiled function should be provided as InputInfo arguments to this function instead of as Tensors. Arguments of any other type will be treated as compile-time constants.
- Parameters:
func (Callable) – The function or Module to optimize. The function must satisfy the following requirements:
  - Must be a pure function with no side effects. This means, for example, that you cannot use print or assert.
  - Must not accept variadic positional or keyword arguments.
  - Must return one or more Tensors and no other types.
The compiled function will have the following constraints:
optimization_level (int) – The optimization level to use when compiling. Higher optimization levels can lead to better runtime performance at the cost of longer compile times.
args (Sequence[Any]) – Positional arguments to forward to the target function while tracing.
kwargs (Dict[str, Any]) – Keyword arguments to forward to the target function while tracing.
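The requirement that func must not accept variadic positional or keyword arguments can be checked before calling compile. Below is a minimal sketch that uses only the Python standard library. The helper `has_variadic_parameters` is hypothetical, written for illustration, and is not part of nvtripy.

```python
import inspect
from typing import Callable


def has_variadic_parameters(func: Callable) -> bool:
    """Return True if `func` declares *args or **kwargs.

    Hypothetical helper for illustration; not part of nvtripy.
    """
    variadic_kinds = (
        inspect.Parameter.VAR_POSITIONAL,
        inspect.Parameter.VAR_KEYWORD,
    )
    return any(
        param.kind in variadic_kinds
        for param in inspect.signature(func).parameters.values()
    )


def add(a, b):
    return a + b


def add_all(*values):
    return sum(values)


print(has_variadic_parameters(add))      # False
print(has_variadic_parameters(add_all))  # True
```

A function such as `add_all` would need to be rewritten with explicitly named parameters before it satisfies the requirement above.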
- Returns:
The compiled executable. This executable’s parameters will be the subset of the original function’s parameters for which InputInfos were provided to compile() and will only accept Tensor arguments.
- Return type:
Executable
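To illustrate which parameters remain on the compiled executable, here is a rough pure-Python model of the rule described above. It does not use nvtripy: the `InputInfo` class below is a stand-in dataclass and `runtime_parameter_names` is a hypothetical helper. Treat it as a sketch of the documented behavior, not as the library's implementation.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence


@dataclass
class InputInfo:
    """Stand-in for nvtripy's InputInfo (assumption; illustration only)."""

    shape: tuple
    dtype: str


def runtime_parameter_names(
    func: Callable,
    args: Sequence[Any] = (),
    kwargs: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """List the parameters of `func` that were given an InputInfo.

    Every other argument is treated as a compile-time constant.
    """
    bound = inspect.signature(func).bind(*args, **(kwargs or {}))
    return [
        name
        for name, value in bound.arguments.items()
        if isinstance(value, InputInfo)
    ]


def add(a, b):
    return a + b


# Both parameters are runtime inputs.
print(runtime_parameter_names(
    add, args=[InputInfo((1,), "float32"), InputInfo((1,), "float32")]
))  # ['a', 'b']

# `b` is a plain value, so it would be baked in as a constant.
print(runtime_parameter_names(
    add, args=[InputInfo((1,), "float32")], kwargs={"b": 1.0}
))  # ['a']
```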
Example: Dynamic Shapes
import nvtripy as tp


def add(a, b):
    return a + b


# Support shapes in the range of (1, 2) to (3, 2), optimizing for a
# shape of (2, 2)
compiled_add = tp.compile(
    add,
    args=[
        tp.InputInfo(shape=((1, 2, 3), 2), dtype=tp.float32),
        tp.InputInfo(shape=((1, 2, 3), 2), dtype=tp.float32),
    ],
)

small_a = tp.ones((1, 2), dtype=tp.float32)
small_b = tp.ones((1, 2), dtype=tp.float32)

small_out = compiled_add(small_a, small_b)

# Now we can reuse the compiled function for any shapes within the
# range:
big_a = tp.ones((3, 2), dtype=tp.float32)
big_b = tp.ones((3, 2), dtype=tp.float32)

big_out = compiled_add(big_a, big_b)
>>> small_a
tensor(
    [[1.0000, 1.0000]],
    dtype=float32, loc=gpu:0, shape=(1, 2))
>>> small_b
tensor(
    [[1.0000, 1.0000]],
    dtype=float32, loc=gpu:0, shape=(1, 2))
>>> small_out
tensor(
    [[2.0000, 2.0000]],
    dtype=float32, loc=gpu:0, shape=(1, 2))
>>> big_a
tensor(
    [[1.0000, 1.0000],
     [1.0000, 1.0000],
     [1.0000, 1.0000]],
    dtype=float32, loc=gpu:0, shape=(3, 2))
>>> big_b
tensor(
    [[1.0000, 1.0000],
     [1.0000, 1.0000],
     [1.0000, 1.0000]],
    dtype=float32, loc=gpu:0, shape=(3, 2))
>>> big_out
tensor(
    [[2.0000, 2.0000],
     [2.0000, 2.0000],
     [2.0000, 2.0000]],
    dtype=float32, loc=gpu:0, shape=(3, 2))
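In the example above, the first dimension is given as the triple (1, 2, 3), which the code comment describes as a range from 1 to 3 that is optimized for 2. The plain-Python sketch below models that reading. `shape_in_range` is a hypothetical helper, and the (min, opt, max) interpretation is taken from the example's comment, not from nvtripy's source.

```python
from typing import Sequence, Tuple, Union

# A dimension is either a fixed size or a (min, opt, max) triple.
Dim = Union[int, Tuple[int, int, int]]


def shape_in_range(spec: Sequence[Dim], shape: Sequence[int]) -> bool:
    """Check whether a concrete shape fits a shape specification.

    Hypothetical helper for illustration; not part of nvtripy.
    """
    if len(spec) != len(shape):
        return False
    for dim_spec, size in zip(spec, shape):
        if isinstance(dim_spec, int):
            # Fixed dimension: the size must match exactly.
            if size != dim_spec:
                return False
        else:
            # Dynamic dimension: the size must lie within [min, max].
            low, _opt, high = dim_spec
            if not low <= size <= high:
                return False
    return True


spec = ((1, 2, 3), 2)
print(shape_in_range(spec, (1, 2)))  # True
print(shape_in_range(spec, (3, 2)))  # True
print(shape_in_range(spec, (4, 2)))  # False
print(shape_in_range(spec, (2, 3)))  # False
```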
Example: Baking Constants
import nvtripy as tp


def add(a, b):
    return a + b


# By using a non-InputInfo type (in this case, a Tensor) for the `b`
# argument to `compile`, we are indicating that it is a compile-time
# constant. Consequently, the compiled function will not accept `b`
# as an input.
b = tp.ones((1,), dtype=tp.float32)
compiled_add = tp.compile(add, args=[tp.InputInfo((1,), dtype=tp.float32), b])

a = tp.ones((1,), dtype=tp.float32)

# Note that we cannot provide `b` as an argument to the compiled function.
out = compiled_add(a)
>>> b
tensor([1.0000], dtype=float32, loc=gpu:0, shape=(1,))
>>> a
tensor([1.0000], dtype=float32, loc=gpu:0, shape=(1,))
>>> out
tensor([2.0000], dtype=float32, loc=gpu:0, shape=(1,))