Using the Compiler
Modules and functions can be compiled for better performance.
Important
There are restrictions on what can be compiled; see nvtripy.compile().
We’ll demonstrate using a GEGLU (GELU-gated linear unit) module:
```python
import nvtripy as tp


class GEGLU(tp.Module):
    def __init__(self, in_dim, out_dim):
        # Project to twice the output width: one half is the value, the other the gate.
        self.proj = tp.Linear(in_dim, out_dim * 2)
        self.out_dim = out_dim

    def __call__(self, x):
        proj = self.proj(x)
        # Split the last dimension into two equal halves.
        x, gate = tp.split(proj, 2, proj.rank - 1)
        return x * tp.gelu(gate)


layer = GEGLU(in_dim=2, out_dim=1)
```
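To make the computation concrete, here is a minimal pure-Python sketch of what GEGLU computes for a single input row. It is not part of nvtripy: the names `gelu` and `geglu_reference` are hypothetical, the `(out_features, in_features)` weight layout with `proj = W·x + b` is an assumption, and it uses the exact erf-based GELU, which may differ slightly from whichever variant `tp.gelu` implements.

```python
import math
from typing import List, Sequence


def gelu(v: float) -> float:
    # Exact GELU: v * Phi(v), where Phi is the standard normal CDF.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def geglu_reference(
    x: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
    out_dim: int,
) -> List[float]:
    # Hypothetical reference: weight has 2 * out_dim rows of len(x) columns
    # (an assumed (out_features, in_features) layout), so proj[i] = W[i] . x + b[i].
    assert len(weight) == len(bias) == 2 * out_dim
    proj = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
    # Split the projection into two equal halves, like tp.split(proj, 2, last_dim).
    value, gate = proj[:out_dim], proj[out_dim:]
    # Gate the first half elementwise with GELU of the second half.
    return [v * gelu(g) for v, g in zip(value, gate)]


# Identity projection: value = 1.0, gate = 2.0, so the result is gelu(2.0).
print(geglu_reference([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], out_dim=1))  # approximately [1.9545]
```

Because the gate goes through GELU, a gate near zero suppresses the value half almost entirely, while a large positive gate lets it pass through nearly unchanged.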
Compiling
We must inform the compiler which parameters are runtime inputs and provide their shapes and data types using nvtripy.InputInfo:
```python
# GEGLU.__call__ has one parameter, x, which needs to be a runtime input:
inp_info = tp.InputInfo(shape=(1, 2), dtype=tp.float32)
fast_geglu = tp.compile(layer, args=[inp_info])
```
Note
Any parameters not specified as runtime inputs become compile-time constants and will be folded away.
The compiler returns an nvtripy.Executable, which behaves like a callable:
```python
inp = tp.ones((1, 2))
out = fast_geglu(inp)
```
Local Variables
>>> inp
tensor(
[[1.0000, 1.0000]],
dtype=float32, loc=gpu:0, shape=(1, 2))
>>> out
tensor([[6.0000]], dtype=float32, loc=gpu:0, shape=(1, 1))
Dynamic Shapes
To enable dynamic shapes, we can specify a (min, opt, max) range for any given dimension:
```python
inp_info = tp.InputInfo(shape=((1, 2, 4), 2), dtype=tp.float32)
```
Local Variables
>>> inp_info
InputInfo(min=[1, 2], opt=[2, 2], max=[4, 2], dtype=float32)
((1, 2, 4), 2) means:
- The 0th dimension should support sizes from 1 to 4, optimizing for 2.
- The 1st dimension should support a fixed size of 2.
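The expansion from this shorthand into the min/opt/max lists shown in the InputInfo repr above can be sketched in plain Python. `expand_shape_spec` is a hypothetical helper written for illustration, not nvtripy's implementation; it assumes a bare integer means a fixed size and a 3-tuple means (min, opt, max), which matches the example.

```python
from typing import List, Sequence, Tuple, Union

# A dimension is either a fixed size or a (min, opt, max) range.
Dim = Union[int, Tuple[int, int, int]]


def expand_shape_spec(shape: Sequence[Dim]) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: expand a shape spec into (min, opt, max) lists."""
    mins, opts, maxs = [], [], []
    for dim in shape:
        if isinstance(dim, int):
            # A bare integer is a static dimension: min == opt == max.
            lo = opt = hi = dim
        else:
            lo, opt, hi = dim
        if not lo <= opt <= hi:
            raise ValueError(f"expected min <= opt <= max, got {dim}")
        mins.append(lo)
        opts.append(opt)
        maxs.append(hi)
    return mins, opts, maxs


print(expand_shape_spec(((1, 2, 4), 2)))  # ([1, 2], [2, 2], [4, 2])
```

The printed lists line up with the min=[1, 2], opt=[2, 2], max=[4, 2] fields of the InputInfo shown above.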
The executable will support inputs within this range of shapes:
```python
fast_geglu = tp.compile(layer, args=[inp_info])

# Use the input created previously, of shape: (1, 2)
out0 = fast_geglu(inp)

# Now use an input with a different shape: (2, 2)
inp1 = tp.Tensor([[1.0, 2.0], [2.0, 3.0]], dtype=tp.float32)
out1 = fast_geglu(inp1)
```
Local Variables
>>> out0
tensor([[6.0000]], dtype=float32, loc=gpu:0, shape=(1, 1))
>>> inp1
tensor(
[[1.0000, 2.0000],
[2.0000, 3.0000]],
dtype=float32, loc=gpu:0, shape=(2, 2))
>>> out1
tensor(
[[18.0000],
[42.0000]],
dtype=float32, loc=gpu:0, shape=(2, 1))
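Whether a given input falls within the supported range comes down to a per-dimension bounds check. The sketch below uses a hypothetical helper, `shape_in_range`, that is not an nvtripy API; it only illustrates the rule and does not show what the executable itself does with an out-of-range input.

```python
from typing import Sequence


def shape_in_range(shape: Sequence[int], mins: Sequence[int], maxs: Sequence[int]) -> bool:
    """Hypothetical helper: True if every dimension lies within [min, max]."""
    # The rank must match exactly; only dimension sizes may vary.
    if not len(shape) == len(mins) == len(maxs):
        return False
    return all(lo <= size <= hi for size, lo, hi in zip(shape, mins, maxs))


# Bounds from the InputInfo above: min=[1, 2], max=[4, 2].
mins, maxs = [1, 2], [4, 2]
print(shape_in_range((1, 2), mins, maxs))  # True  (shape of inp)
print(shape_in_range((2, 2), mins, maxs))  # True  (shape of inp1)
print(shape_in_range((5, 2), mins, maxs))  # False (dim 0 exceeds 4)
print(shape_in_range((2, 3), mins, maxs))  # False (dim 1 is fixed at 2)
```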
Saving and Loading Executables
Serialize and save:
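```python
import os
import tempfile

# out_dir is assumed to be any writable directory; a temporary one is used here.
out_dir = tempfile.mkdtemp()
executable_file_path = os.path.join(out_dir, "executable.json")
fast_geglu.save(executable_file_path)
```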
Load and run:
```python
loaded_fast_geglu = tp.Executable.load(executable_file_path)

out = loaded_fast_geglu(inp)
```
Local Variables
>>> out
tensor([[6.0000]], dtype=float32, loc=gpu:0, shape=(1, 1))