Using the Compiler

Modules and functions can be compiled for better performance.

Important

There are restrictions on what can be compiled; see nvtripy.compile() for details.

We’ll demonstrate using a GEGLU module:

import nvtripy as tp


class GEGLU(tp.Module):
    def __init__(self, in_dim, out_dim):
        self.proj = tp.Linear(in_dim, out_dim * 2)
        self.out_dim = out_dim

    def forward(self, x):
        proj = self.proj(x)
        x, gate = tp.split(proj, 2, proj.rank - 1)
        return x * tp.gelu(gate)


layer = GEGLU(in_dim=2, out_dim=1)

layer.load_state_dict(
    {"proj.weight": tp.ones((2, 2)), "proj.bias": tp.ones((2,))}
)

Compiling

We must tell the compiler which parameters are runtime inputs and provide their shapes and data types using nvtripy.InputInfo:

# GEGLU has one parameter, which needs to be a runtime input:
inp_info = tp.InputInfo(shape=(1, 2), dtype=tp.float32)
fast_geglu = tp.compile(layer, args=[inp_info])

Note

Other parameters become compile-time constants and will be folded away.
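
What "folded away" means can be illustrated with a loose analogy in plain Python. This sketch does not use nvtripy and does not reflect how its compiler is implemented; `specialize`, `scale`, and `offset` are hypothetical names. Values known ahead of time are combined once, and only the runtime input remains as an argument:

```python
def specialize(scale, offset):
    """Return a function of x alone, with the constant work done up front."""
    # Depends only on constants, so it is evaluated once, here.
    folded = scale * 2.0 + offset

    def run(x):
        # Only the runtime input is left to process on each call.
        return x * folded

    return run


fast_fn = specialize(scale=3.0, offset=1.0)
result = fast_fn(2.0)
print(result)  # 14.0
```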

The compiler returns an nvtripy.Executable, which behaves like a callable:

inp = tp.ones((1, 2))
out = fast_geglu(inp)

Local Variables

>>> inp
tensor(
    [[1, 1]],
    dtype=float32, loc=gpu:0, shape=(1, 2))

>>> out
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))
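
The printed value can be checked by hand. The sketch below is a plain-Python reference, not nvtripy code. It assumes that tp.Linear computes x @ W^T + b and that tp.gelu is the exact, erf-based GELU; the second assumption is inferred from the output above rather than stated in this guide. `gelu` and `geglu_reference` are hypothetical helper names:

```python
import math


def gelu(v):
    # Exact GELU: v * Phi(v), where Phi is the standard normal CDF.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def geglu_reference(rows, weight, bias):
    """rows: input rows; weight: one row of weights per output feature; bias: per output."""
    outputs = []
    for row in rows:
        # Linear projection: one dot product plus bias per output feature.
        proj = [
            sum(w * x for w, x in zip(w_row, row)) + b
            for w_row, b in zip(weight, bias)
        ]
        # Split the last dimension into two halves: values and gates.
        half = len(proj) // 2
        values, gates = proj[:half], proj[half:]
        outputs.append([v * gelu(g) for v, g in zip(values, gates)])
    return outputs


weight = [[1.0, 1.0], [1.0, 1.0]]  # matches tp.ones((2, 2))
bias = [1.0, 1.0]                  # matches tp.ones((2,))
result = geglu_reference([[1.0, 1.0]], weight, bias)
print(result)
```

With all-ones weights and bias, each projected value is 1 + 1 + 1 = 3, so the single output is 3 * gelu(3), which is approximately 8.98785.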

Dynamic Shapes

To enable dynamic shapes, we can specify a range for any given dimension:

inp_info = tp.InputInfo(shape=((1, 2, 4), 2), dtype=tp.float32)

Local Variables

>>> inp_info
InputInfo(min=[1, 2], opt=[2, 2], max=[4, 2], dtype=float32)

((1, 2, 4), 2) means:

The 0th dimension should support sizes from 1 to 4, optimizing for 2.
The 1st dimension should support a fixed size of 2.
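
The mapping from a shape spec to the min/opt/max values printed above can be sketched in plain Python. `expand_shape_spec` and `shape_in_range` are hypothetical helpers written for illustration; they are not part of nvtripy and only mirror the behavior described here:

```python
def expand_shape_spec(shape):
    """Split a shape spec into per-dimension (min, opt, max) lists."""
    mins, opts, maxs = [], [], []
    for dim in shape:
        # A plain int is a fixed size; a 3-tuple is (min, opt, max).
        lo, opt, hi = (dim, dim, dim) if isinstance(dim, int) else dim
        mins.append(lo)
        opts.append(opt)
        maxs.append(hi)
    return mins, opts, maxs


def shape_in_range(shape, mins, maxs):
    """Check that every dimension lies within its [min, max] bounds."""
    return len(shape) == len(mins) and all(
        lo <= d <= hi for d, lo, hi in zip(shape, mins, maxs)
    )


mins, opts, maxs = expand_shape_spec(((1, 2, 4), 2))
print(mins, opts, maxs)  # [1, 2] [2, 2] [4, 2]
print(shape_in_range((2, 2), mins, maxs))  # True
print(shape_in_range((5, 2), mins, maxs))  # False
```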

The executable will support inputs within this range of shapes:

fast_geglu = tp.compile(layer, args=[inp_info])

# Use the input created previously, of shape: (1, 2)
out0 = fast_geglu(inp)

# Now use an input with a different shape, (2, 2):
inp1 = tp.Tensor([[1.0, 2.0], [2.0, 3.0]], dtype=tp.float32)
out1 = fast_geglu(inp1)

Local Variables

>>> out0
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))

>>> inp1
tensor(
    [[1, 2],
     [2, 3]],
    dtype=float32, loc=gpu:0, shape=(2, 2))

>>> out1
tensor(
    [[15.9995],
     [36]],
    dtype=float32, loc=gpu:0, shape=(2, 1))

Saving And Loading Executables

Serialize and save:

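import os
import tempfile

# out_dir is not defined above; any writable directory works (assumption).
out_dir = tempfile.mkdtemp()
executable_file_path = os.path.join(out_dir, "executable.json")
fast_geglu.save(executable_file_path)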

Load and run:

loaded_fast_geglu = tp.Executable.load(executable_file_path)

out = loaded_fast_geglu(inp)

Local Variables

>>> out
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))