Using the Compiler

Modules and functions can be compiled for better performance.

Important

There are restrictions on what can be compiled. See nvtripy.compile() for details.

We’ll demonstrate using a GEGLU module:

import nvtripy as tp


class GEGLU(tp.Module):
    def __init__(self, in_dim, out_dim):
        # Project to twice the output width: one half is the value, the other the gate.
        self.proj = tp.Linear(in_dim, out_dim * 2)
        self.out_dim = out_dim

    def __call__(self, x):
        proj = self.proj(x)
        # Split the projection into 2 equal chunks along the last dimension.
        x, gate = tp.split(proj, 2, proj.rank - 1)
        return x * tp.gelu(gate)


layer = GEGLU(in_dim=2, out_dim=1)
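For reference, the GEGLU forward pass can be written out in plain Python. This is a minimal sketch, not Tripy code: it mirrors tp.Linear as `x @ W^T + b`, splits the projection in half along the last dimension, and uses the exact erf-based GELU (whether tp.gelu uses this form or a tanh approximation is not stated here; the two agree to four decimals at the inputs below). The weight `W = [[0, 1], [2, 3]]` and bias `b = [0, 1]` are hypothetical values: this guide does not show how `layer` is initialized, but these values are consistent with the outputs printed later.

```python
import math


def gelu(v):
    # Exact GELU: v * Phi(v), where Phi is the standard normal CDF.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear(x, weight, bias):
    # Row-wise x @ weight^T + bias, with weight laid out as (out_dim, in_dim).
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
        for row in x
    ]


def geglu(x, weight, bias):
    out = []
    for proj in linear(x, weight, bias):
        half = len(proj) // 2
        value, gate = proj[:half], proj[half:]  # 2 chunks on the last dimension
        out.append([v * gelu(g) for v, g in zip(value, gate)])
    return out


# Hypothetical parameters for GEGLU(in_dim=2, out_dim=1): the projection maps 2 -> 2.
W = [[0.0, 1.0], [2.0, 3.0]]
b = [0.0, 1.0]

print(geglu([[1.0, 1.0]], W, b))  # approximately [[6.0]]
```

With an input of ones, the projection is `[1, 6]`, so the result is `1 * gelu(6)`, which is about 6.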

Compiling

We must tell the compiler which parameters are runtime inputs and provide their shapes and data types using nvtripy.InputInfo:

# GEGLU has one parameter, which needs to be a runtime input:
inp_info = tp.InputInfo(shape=(1, 2), dtype=tp.float32)
fast_geglu = tp.compile(layer, args=[inp_info])

Note

Other parameters become compile-time constants and will be folded away.
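To make "folded away" concrete, here is a toy constant-folding pass over a tiny expression tree. It is an illustration only, not how Tripy's compiler represents programs: the tuple format is hypothetical, `("input", name)` stands for a runtime input declared with InputInfo, `("const", value)` stands for a value known at compile time (such as a layer weight), and any operation whose operands are all constants is evaluated ahead of time.

```python
# Toy IR (hypothetical): ("input", name) | ("const", value) | (op, lhs, rhs)
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def fold(node):
    """Return an equivalent tree with all-constant subexpressions precomputed."""
    kind = node[0]
    if kind in ("input", "const"):
        return node
    lhs, rhs = fold(node[1]), fold(node[2])
    if lhs[0] == "const" and rhs[0] == "const":
        # Both operands are known at compile time: evaluate now.
        return ("const", OPS[kind](lhs[1], rhs[1]))
    # At least one operand depends on a runtime input: keep the operation.
    return (kind, lhs, rhs)


# x * (w + b), where w and b are fixed at compile time and x is a runtime input.
expr = ("mul", ("input", "x"), ("add", ("const", 2.0), ("const", 1.0)))
print(fold(expr))  # ('mul', ('input', 'x'), ('const', 3.0))
```

Only the work that depends on the runtime input survives; everything computable from constants is done once, at compile time.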

The compiler returns an nvtripy.Executable, which behaves like a callable:

inp = tp.ones((1, 2))
out = fast_geglu(inp)
Local Variables
>>> inp
tensor(
    [[1.0000, 1.0000]],
    dtype=float32, loc=gpu:0, shape=(1, 2))

>>> out
tensor([[6.0000]], dtype=float32, loc=gpu:0, shape=(1, 1))

Dynamic Shapes

To enable dynamic shapes, we can specify a range for any given dimension:

inp_info = tp.InputInfo(shape=((1, 2, 4), 2), dtype=tp.float32)
Local Variables
>>> inp_info
InputInfo(min=[1, 2], opt=[2, 2], max=[4, 2], dtype=float32)

((1, 2, 4), 2) means:

  • The 0th dimension should support sizes from 1 to 4, optimizing for 2.

  • The 1st dimension should support a fixed size of 2.
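The way a shape specification expands into the min/opt/max bounds printed above can be sketched as follows. `expand_shape` is a hypothetical helper written to reproduce the printed InputInfo, not part of the nvtripy API: a plain integer becomes a fixed dimension, and a `(min, opt, max)` tuple is unpacked as-is. The ordering check is an addition of this sketch.

```python
def expand_shape(shape):
    """Split a shape spec like ((1, 2, 4), 2) into (min, opt, max) lists."""
    mins, opts, maxs = [], [], []
    for dim in shape:
        if isinstance(dim, int):
            # A fixed dimension: all three bounds are equal.
            lo = opt = hi = dim
        else:
            lo, opt, hi = dim
            # Sketch-only sanity check on the range.
            if not lo <= opt <= hi:
                raise ValueError(f"expected min <= opt <= max, got {dim}")
        mins.append(lo)
        opts.append(opt)
        maxs.append(hi)
    return mins, opts, maxs


print(expand_shape(((1, 2, 4), 2)))  # ([1, 2], [2, 2], [4, 2])
print(expand_shape((1, 2)))          # ([1, 2], [1, 2], [1, 2])
```

The second call shows why the earlier `shape=(1, 2)` produces a static executable: every bound collapses to a single size.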

The executable will support inputs within this range of shapes:

fast_geglu = tp.compile(layer, args=[inp_info])

# Use the input created previously, of shape (1, 2):
out0 = fast_geglu(inp)

# Now use an input with a different shape, (2, 2):
inp1 = tp.Tensor([[1.0, 2.0], [2.0, 3.0]], dtype=tp.float32)
out1 = fast_geglu(inp1)
Local Variables
>>> out0
tensor([[6.0000]], dtype=float32, loc=gpu:0, shape=(1, 1))

>>> inp1
tensor(
    [[1.0000, 2.0000],
     [2.0000, 3.0000]],
    dtype=float32, loc=gpu:0, shape=(2, 2))

>>> out1
tensor(
    [[18.0000],
     [42.0000]],
    dtype=float32, loc=gpu:0, shape=(2, 1))
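Conceptually, an input is accepted when it has the expected rank and every dimension lies within its [min, max] bounds. Here is a minimal sketch of that check using the bounds printed for `inp_info` above. `fits` is a hypothetical helper, and this guide does not say how the real executable reports an out-of-range input.

```python
def fits(shape, mins, maxs):
    """True if shape has the right rank and each dim is within [min, max]."""
    return len(shape) == len(mins) and all(
        lo <= d <= hi for d, lo, hi in zip(shape, mins, maxs)
    )


mins, maxs = [1, 2], [4, 2]
print(fits((1, 2), mins, maxs))  # True: shape of inp
print(fits((2, 2), mins, maxs))  # True: shape of inp1
print(fits((5, 2), mins, maxs))  # False: dimension 0 exceeds its max of 4
print(fits((2, 3), mins, maxs))  # False: dimension 1 is fixed at 2
```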

Saving and Loading Executables

  • Serialize and save:

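    import os
    import tempfile

    # Any writable directory works; a temporary one is used here.
    out_dir = tempfile.mkdtemp()
    executable_file_path = os.path.join(out_dir, "executable.json")
    fast_geglu.save(executable_file_path)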
    
  • Load and run:

    loaded_fast_geglu = tp.Executable.load(executable_file_path)

    out = loaded_fast_geglu(inp)
    
    Local Variables
    >>> out
    tensor([[6.0000]], dtype=float32, loc=gpu:0, shape=(1, 1))