Using the Compiler¶
Modules and functions can be compiled ahead of time for better runtime performance.
Important
There are restrictions on what can be compiled; see nvtripy.compile() for details.
We’ll demonstrate using a GEGLU module:
import nvtripy as tp


class GEGLU(tp.Module):
    def __init__(self, in_dim, out_dim):
        self.proj = tp.Linear(in_dim, out_dim * 2)
        self.out_dim = out_dim

    def forward(self, x):
        proj = self.proj(x)
        # Split the projection into two halves along the last dimension:
        x, gate = tp.split(proj, 2, proj.rank - 1)
        return x * tp.gelu(gate)


layer = GEGLU(in_dim=2, out_dim=1)

layer.load_state_dict(
    {"proj.weight": tp.ones((2, 2)), "proj.bias": tp.ones((2,))}
)
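Since modules can also run eagerly, we can sanity-check the layer before compiling it. This is a minimal sketch, assuming tp.Module instances are directly callable:

# Eagerly run the uncompiled module on a small input (sketch):
x = tp.ones((1, 2))
print(layer(x))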
Compiling¶
We must inform the compiler which parameters are runtime inputs
and provide their shapes and data types using nvtripy.InputInfo:
# GEGLU has one parameter, which needs to be a runtime input:
inp_info = tp.InputInfo(shape=(1, 2), dtype=tp.float32)
fast_geglu = tp.compile(layer, args=[inp_info])
Note
Other parameters become compile-time constants and will be folded away.
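For example, anything that is not declared as a runtime input is captured when tp.compile() is called. A minimal sketch, assuming scalar literals are treated like any other constant:

# The literal `2.0` is not a runtime input, so it is folded into the executable:
def scale(x):
    return x * 2.0


fast_scale = tp.compile(scale, args=[tp.InputInfo(shape=(1, 2), dtype=tp.float32)])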
The compiler returns an nvtripy.Executable, which behaves like a callable:
inp = tp.ones((1, 2)).eval()
out = fast_geglu(inp)
Local Variables
>>> inp
tensor(
[[1, 1]],
dtype=float32, loc=gpu:0, shape=(1, 2))
>>> out
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))
Dynamic Shapes¶
To enable dynamic shapes, we can specify a range for any given dimension:
inp_info = tp.InputInfo(shape=((1, 2, 4), 2), dtype=tp.float32)
Local Variables
>>> inp_info
InputInfo<Bounds(min=(1, 2), opt=(2, 2), max=(4, 2)), dimension names: {}, dtype: float32>
((1, 2, 4), 2) means:

- The 0th dimension should support sizes from 1 to 4, optimizing for 2.
- The 1st dimension should support a fixed size of 2.
The executable will support inputs within this range of shapes:
fast_geglu = tp.compile(layer, args=[inp_info])

# Use the input created previously, of shape (1, 2):
out0 = fast_geglu(inp)

# Now use an input with a different shape, (2, 2):
inp1 = tp.Tensor([[1.0, 2.0], [2.0, 3.0]]).eval()
out1 = fast_geglu(inp1)
Local Variables
>>> out0
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))
>>> inp1
tensor(
[[1, 2],
[2, 3]],
dtype=float32, loc=gpu:0, shape=(2, 2))
>>> out1
tensor(
[[15.9995],
[36]],
dtype=float32, loc=gpu:0, shape=(2, 1))
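Inputs whose shapes fall outside the compiled bounds should be rejected at call time. A hedged sketch; a generic exception handler is used since the exact error type is not shown here:

# Dimension 0 only supports sizes 1 through 4, so a size of 8 is out of range:
try:
    fast_geglu(tp.ones((8, 2)).eval())
except Exception as err:
    print(err)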
Named Dynamic Dimensions¶
Dynamic dimensions can be named using nvtripy.NamedDimension.
Dimensions with the same name must be equal at runtime.
The compiler can exploit this equality to optimize more effectively.
def add(a, b):
    return a + b


batch = tp.NamedDimension("batch", 1, 2, 4)

# The batch dimension is dynamic but is always equal at runtime for both inputs:
inp_info0 = tp.InputInfo(shape=(batch, 2), dtype=tp.float32)
inp_info1 = tp.InputInfo(shape=(batch, 2), dtype=tp.float32)

fast_add = tp.compile(add, args=[inp_info0, inp_info1])
Local Variables
>>> batch
NamedDimension<name: 'batch', bounds: (1, 2, 4)>
>>> inp_info0
InputInfo<Bounds(min=(1, 2), opt=(2, 2), max=(4, 2)), dimension names: {0: 'batch'}, dtype: float32>
>>> inp_info1
InputInfo<Bounds(min=(1, 2), opt=(2, 2), max=(4, 2)), dimension names: {0: 'batch'}, dtype: float32>
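The compiled function is then called like any other executable. A minimal usage sketch; the input values here are illustrative:

# Both inputs must share the same runtime batch size, per the shared "batch" dimension:
a = tp.ones((3, 2)).eval()
b = tp.ones((3, 2)).eval()
out = fast_add(a, b)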
Saving And Loading Executables¶
Serialize and save:
import os

# `out_dir` is assumed to be an existing output directory defined elsewhere.
executable_file_path = os.path.join(out_dir, "executable.json")
fast_geglu.save(executable_file_path)
Load and run:
loaded_fast_geglu = tp.Executable.load(executable_file_path)

out = loaded_fast_geglu(inp)
Local Variables
>>> out
tensor([[8.98785]], dtype=float32, loc=gpu:0, shape=(1, 1))