Using the Compiler¶
Modules and functions can be compiled ahead of time for better runtime performance.
Important
There are restrictions on what can be compiled; see nvtripy.compile() for details.
We’ll demonstrate using a GEGLU module:
import nvtripy as tp


class GEGLU(tp.Module):
    def __init__(self, in_dim, out_dim):
        self.proj = tp.Linear(in_dim, out_dim * 2)
        self.out_dim = out_dim

    def forward(self, x):
        proj = self.proj(x)
        # Split the projection into two halves along the last dimension:
        x, gate = tp.split(proj, 2, proj.rank - 1)
        return x * tp.gelu(gate)


layer = GEGLU(in_dim=2, out_dim=1)

layer.load_state_dict(
    {"proj.weight": tp.ones((2, 2)), "proj.bias": tp.ones((2,))}
)
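Since modules can also run eagerly, we can sanity-check the layer before compiling it. This is a minimal sketch, assuming tp.Module instances are directly callable:

# Eagerly run the uncompiled module on a small input (sketch):
x = tp.ones((1, 2))
print(layer(x))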
Compiling¶
We must inform the compiler which parameters are runtime inputs
and provide their shapes and data types using nvtripy.InputInfo:
# GEGLU has one parameter, which needs to be a runtime input:
inp_info = tp.InputInfo(shape=(1, 2), dtype=tp.float32)
fast_geglu = tp.compile(layer, args=[inp_info])
Note
Other parameters become compile-time constants and will be folded away.
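For example, anything that is not declared as a runtime input is captured when tp.compile() is called. A minimal sketch, assuming scalar literals are treated like any other constant:

# The literal `2.0` is not a runtime input, so it is folded into the executable:
def scale(x):
    return x * 2.0


fast_scale = tp.compile(scale, args=[tp.InputInfo(shape=(1, 2), dtype=tp.float32)])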
The compiler returns an nvtripy.Executable, which behaves like a callable:
inp = tp.ones((1, 2)).eval()
out = fast_geglu(inp)
Local Variables
>>> inp
tensor(
[[1, 1]],
dtype=float32, loc=gpu:0, shape=(1, 2))
>>> out
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))
Dynamic Shapes¶
To enable dynamic shapes, we can specify a range for any given dimension:
inp_info = tp.InputInfo(shape=((1, 2, 4), 2), dtype=tp.float32)
Local Variables
>>> inp_info
InputInfo<Bounds(min=(1, 2), opt=(2, 2), max=(4, 2)), dimension names: {}, dtype: float32>
((1, 2, 4), 2) means:

- The 0th dimension should support sizes from 1 to 4, optimizing for 2.
- The 1st dimension should support a fixed size of 2.
The executable will support inputs within this range of shapes:
fast_geglu = tp.compile(layer, args=[inp_info])

# Use the input created previously, of shape (1, 2):
out0 = fast_geglu(inp)

# Now use an input with a different shape, (2, 2):
inp1 = tp.Tensor([[1.0, 2.0], [2.0, 3.0]]).eval()
out1 = fast_geglu(inp1)
Local Variables
>>> out0
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))
>>> inp1
tensor(
[[1, 2],
[2, 3]],
dtype=float32, loc=gpu:0, shape=(2, 2))
>>> out1
tensor(
[[15.9995],
[36]],
dtype=float32, loc=gpu:0, shape=(2, 1))
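Inputs whose shapes fall outside the compiled bounds should be rejected at call time. A hedged sketch; a generic exception handler is used since the exact error type is not shown here:

# Dimension 0 only supports sizes 1 through 4, so a size of 8 is out of range:
try:
    fast_geglu(tp.ones((8, 2)).eval())
except Exception as err:
    print(err)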
Named Dynamic Dimensions¶
Dynamic dimensions can be named using nvtripy.NamedDimension.
Dimensions with the same name must be equal at runtime.
The compiler can exploit this equality to optimize more effectively.
def add(a, b):
    return a + b


batch = tp.NamedDimension("batch", 1, 2, 4)

# The batch dimension is dynamic but is always equal at runtime for both inputs:
inp_info0 = tp.InputInfo(shape=(batch, 2), dtype=tp.float32)
inp_info1 = tp.InputInfo(shape=(batch, 2), dtype=tp.float32)

fast_add = tp.compile(add, args=[inp_info0, inp_info1])
Local Variables
>>> batch
NamedDimension<name: 'batch', bounds: (1, 2, 4)>
>>> inp_info0
InputInfo<Bounds(min=(1, 2), opt=(2, 2), max=(4, 2)), dimension names: {0: 'batch'}, dtype: float32>
>>> inp_info1
InputInfo<Bounds(min=(1, 2), opt=(2, 2), max=(4, 2)), dimension names: {0: 'batch'}, dtype: float32>
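The compiled function is then called like any other executable. A minimal usage sketch; the input values here are illustrative:

# Both inputs must share the same runtime batch size, per the shared "batch" dimension:
a = tp.ones((3, 2)).eval()
b = tp.ones((3, 2)).eval()
out = fast_add(a, b)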
Saving And Loading Executables¶
Serialize and save:
import os

# `out_dir` is assumed to be an existing output directory defined elsewhere.
executable_file_path = os.path.join(out_dir, "executable.json")
fast_geglu.save(executable_file_path)
Load and run:
loaded_fast_geglu = tp.Executable.load(executable_file_path)

out = loaded_fast_geglu(inp)
Local Variables
>>> out
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))