compile

nvtripy.compile(func: Callable, optimization_level: int = 3, *, args: Sequence[Any] = [], kwargs: Dict[str, Any] = {}) → Executable

Compiles a function into an executable that runs efficiently on the GPU.

This works by first calling the function with the provided arguments to trace its execution, and then compiling the resulting traced graph.

Parameters that should be runtime inputs to the compiled function must be provided as InputInfo arguments to this function instead of as Tensors. Arguments of any other type are treated as compile-time constants.
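The split between runtime inputs and compile-time constants can be modeled in plain Python. The sketch below is not how nvtripy works internally: it performs no tracing and no GPU compilation. `RuntimeInput`, `split_arguments`, and `bake` are hypothetical names introduced for illustration, with `RuntimeInput` standing in for the role InputInfo plays. It assumes the target function's parameters can all be passed by keyword.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


class RuntimeInput:
    """Hypothetical marker standing in for the role of InputInfo."""


def split_arguments(
    func: Callable, args: Sequence[Any] = (), kwargs: Optional[Dict[str, Any]] = None
) -> Tuple[List[str], Dict[str, Any]]:
    """Bind the arguments to func's parameters, then separate the marked
    runtime inputs from everything else (the constants)."""
    bound = inspect.signature(func).bind(*args, **(kwargs or {}))
    runtime_names = [n for n, v in bound.arguments.items() if isinstance(v, RuntimeInput)]
    constants = {n: v for n, v in bound.arguments.items() if not isinstance(v, RuntimeInput)}
    return runtime_names, constants


def bake(func: Callable, args: Sequence[Any] = (), kwargs: Optional[Dict[str, Any]] = None) -> Callable:
    """Return a callable that accepts only the runtime inputs; the
    constants are captured once and reused on every call."""
    runtime_names, constants = split_arguments(func, args, kwargs)

    def executable(*runtime_values: Any) -> Any:
        if len(runtime_values) != len(runtime_names):
            raise TypeError(f"expected {len(runtime_names)} runtime input(s), got {len(runtime_values)}")
        call_kwargs = dict(constants)
        call_kwargs.update(zip(runtime_names, runtime_values))
        return func(**call_kwargs)

    # Expose which of the original parameters remain as inputs.
    executable.parameter_names = tuple(runtime_names)
    return executable


def add(a, b):
    return a + b


# `a` is marked as a runtime input; `b` is baked in as the constant 10.
baked_add = bake(add, args=[RuntimeInput(), 10])
result = baked_add(5)
print(baked_add.parameter_names, result)  # ('a',) 15
```

The returned callable exposes only `a`, mirroring how the compiled executable's parameters are limited to those for which InputInfo arguments were supplied.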

Parameters:
  • func (Callable) –

    The function or Module to optimize. The function must satisfy the following requirements:

    • Must be a pure function with no side effects.

      This means, for example, that you cannot use print or assert.

    • Must not accept variadic positional or keyword arguments.

    • Must return one or more Tensors and no other types.

    The compiled function will have the following constraints:

    • Only Tensor parameters to the function can become runtime inputs.

      All other types of parameters, even collections of Tensors (e.g. List[Tensor] or Dict[str, Tensor]), will be baked into the compiled function as constants.

  • optimization_level (int) – The optimization level to use when compiling. Higher optimization levels can lead to better runtime performance at the cost of longer compile times.

  • args (Sequence[Any]) – Positional arguments to forward to the target function while tracing.

  • kwargs (Dict[str, Any]) – Keyword arguments to forward to the target function while tracing.
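The requirement that func must not accept variadic positional or keyword arguments can be checked before calling compile(). The helper below is a minimal sketch using the standard library's `inspect` module; `has_variadic_parameters` is a hypothetical name and not part of nvtripy, and it makes no claim about how compile() itself validates its input.

```python
import inspect
from typing import Callable


def has_variadic_parameters(func: Callable) -> bool:
    """Return True if func declares *args or **kwargs style parameters."""
    variadic_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return any(p.kind in variadic_kinds for p in inspect.signature(func).parameters.values())


def fixed(a, b):
    return a + b


def variadic(*inputs, **options):
    return inputs


print(has_variadic_parameters(fixed))     # False
print(has_variadic_parameters(variadic))  # True
```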

Returns:

The compiled executable. Its parameters are the subset of the original function's parameters for which InputInfo arguments were provided to compile(), and it accepts only Tensor arguments.

Return type:

Executable

Example: Dynamic Shapes
import nvtripy as tp


def add(a, b):
    return a + b


# Support shapes in the range of (1, 2) to (3, 2), optimizing for a
# shape of (2, 2)
compiled_add = tp.compile(
    add,
    args=[
        tp.InputInfo(shape=((1, 2, 3), 2), dtype=tp.float32),
        tp.InputInfo(shape=((1, 2, 3), 2), dtype=tp.float32),
    ],
)

small_a = tp.ones((1, 2), dtype=tp.float32)
small_b = tp.ones((1, 2), dtype=tp.float32)

small_out = compiled_add(small_a, small_b)

# Now we can reuse the compiled function for any shapes within the
# range:
big_a = tp.ones((3, 2), dtype=tp.float32)
big_b = tp.ones((3, 2), dtype=tp.float32)

big_out = compiled_add(big_a, big_b)
Local Variables
>>> small_a
tensor(
    [[1.0000, 1.0000]], 
    dtype=float32, loc=gpu:0, shape=(1, 2))

>>> small_b
tensor(
    [[1.0000, 1.0000]], 
    dtype=float32, loc=gpu:0, shape=(1, 2))

>>> small_out
tensor(
    [[2.0000, 2.0000]], 
    dtype=float32, loc=gpu:0, shape=(1, 2))

>>> big_a
tensor(
    [[1.0000, 1.0000],
     [1.0000, 1.0000],
     [1.0000, 1.0000]], 
    dtype=float32, loc=gpu:0, shape=(3, 2))

>>> big_b
tensor(
    [[1.0000, 1.0000],
     [1.0000, 1.0000],
     [1.0000, 1.0000]], 
    dtype=float32, loc=gpu:0, shape=(3, 2))

>>> big_out
tensor(
    [[2.0000, 2.0000],
     [2.0000, 2.0000],
     [2.0000, 2.0000]], 
    dtype=float32, loc=gpu:0, shape=(3, 2))
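The shape specification `((1, 2, 3), 2)` used above can be read as follows: a 3-tuple gives the minimum, optimized-for, and maximum size of a dimension, while a bare integer fixes that dimension. The plain-Python sketch below expands such a specification and checks whether a concrete shape falls inside the range. `expand_shape_spec` and `shape_in_range` are hypothetical helpers written for illustration, based only on the interpretation given in the example's comment; they are not nvtripy functions.

```python
from typing import Sequence, Tuple, Union

# A dimension is either a fixed size or a (min, opt, max) triple.
DimSpec = Union[int, Tuple[int, int, int]]
Shape = Tuple[int, ...]


def expand_shape_spec(shape_spec: Sequence[DimSpec]) -> Tuple[Shape, Shape, Shape]:
    """Return (min_shape, opt_shape, max_shape) for a per-dimension spec."""
    mins, opts, maxs = [], [], []
    for dim in shape_spec:
        lo, opt, hi = (dim, dim, dim) if isinstance(dim, int) else dim
        if not lo <= opt <= hi:
            raise ValueError(f"expected min <= opt <= max, got {dim}")
        mins.append(lo)
        opts.append(opt)
        maxs.append(hi)
    return tuple(mins), tuple(opts), tuple(maxs)


def shape_in_range(shape: Sequence[int], shape_spec: Sequence[DimSpec]) -> bool:
    """Check that shape has the right rank and every dimension is in bounds."""
    min_shape, _, max_shape = expand_shape_spec(shape_spec)
    return len(shape) == len(min_shape) and all(
        lo <= d <= hi for lo, d, hi in zip(min_shape, shape, max_shape)
    )


spec = ((1, 2, 3), 2)
print(expand_shape_spec(spec))        # ((1, 2), (2, 2), (3, 2))
print(shape_in_range((1, 2), spec))   # True
print(shape_in_range((3, 2), spec))   # True
print(shape_in_range((4, 2), spec))   # False
```

Under this reading, both `(1, 2)` and `(3, 2)` from the example are within range, while a shape such as `(4, 2)` is not.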
Example: Baking Constants
import nvtripy as tp


def add(a, b):
    return a + b


# By using a non-InputInfo type (in this case, a Tensor) for the `b`
# argument to `compile`, we are indicating that it is a compile-time
# constant. Consequently, the compiled function will not accept `b`
# as an input.
b = tp.ones((1,), dtype=tp.float32)
compiled_add = tp.compile(add, args=[tp.InputInfo((1,), dtype=tp.float32), b])

a = tp.ones((1,), dtype=tp.float32)

# Note that we cannot provide `b` as an argument to the compiled function.
out = compiled_add(a)
Local Variables
>>> b
tensor([1.0000], dtype=float32, loc=gpu:0, shape=(1,))

>>> a
tensor([1.0000], dtype=float32, loc=gpu:0, shape=(1,))

>>> out
tensor([2.0000], dtype=float32, loc=gpu:0, shape=(1,))