Basics#
This section walks through the main steps needed to define and train a neural network model with Warp-NN: defining a neural network, defining a loss function, running the forward pass and recording the kernel launches, and calling the optimizer to update the model parameters.
See also
Visit the Examples section for working examples.
Defining a neural network#
Import the necessary modules.
import warp as wp
from warp_nn import nn
from warp_nn import optimizers
Subclass Module and assign built-in layers as attributes.
Implement the __call__() method to define the forward pass of the model.
Important
Call __post_init__() at the end of __init__ so that sub-modules
and their parameters are registered automatically.
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)
        super().__post_init__()  # must be called last

    def __call__(self, x):
        return self.fc2(self.act(self.fc1(x)))
model = MLP()
model.to("cuda")
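Conceptually, this forward pass is two matrix multiplications with a ReLU in between, mapping a 128-dimensional input to 64 hidden activations and then to 10 outputs. The following is a library-independent sketch of those shapes in plain Python (the `linear`, `relu`, and random-weight helpers are illustrative only, not the warp_nn API):

```python
import random

def linear(x, w, b):
    """y[j] = sum_i x[i] * w[i][j] + b[j], with w of shape (in, out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def relu(v):
    return [max(0.0, a) for a in v]

def rand_mat(n, m):
    return [[random.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(n)]

random.seed(0)
w1, b1 = rand_mat(128, 64), [0.0] * 64  # fc1: 128 -> 64
w2, b2 = rand_mat(64, 10), [0.0] * 10   # fc2: 64 -> 10

x = [1.0] * 128                          # one 128-dimensional input
y = linear(relu(linear(x, w1, b1)), w2, b2)
print(len(y))  # 10
```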
Defining a loss function#
Loss functions are plain Warp kernels that compute the value with respect to which differentiation will be carried out.
For example, to compute the Mean Squared Error (MSE) loss with "sum" reduction, accumulate the squared differences into
a one-element array created with requires_grad=True so that gradients can propagate through it.
@wp.kernel
def mse_loss(
    prediction: wp.array2d(dtype=float),
    target: wp.array2d(dtype=float),
    loss: wp.array1d(dtype=float),
):
    i, j = wp.tid()
    diff = prediction[i, j] - target[i, j]
    wp.atomic_add(loss, 0, diff * diff)
loss = wp.zeros((1,), dtype=wp.float32, requires_grad=True, device="cuda")
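For reference, the kernel's sum reduction is equivalent to the following plain-Python computation, where every (i, j) element contributes its squared difference to a single accumulator (a sketch for checking the math, not part of warp_nn):

```python
def mse_sum(prediction, target):
    # Mirrors the kernel: each (i, j) thread atomically adds diff^2 into loss[0].
    return sum((p - t) ** 2
               for row_p, row_t in zip(prediction, target)
               for p, t in zip(row_p, row_t))

pred = [[1.0, 2.0], [3.0, 4.0]]
targ = [[1.0, 0.0], [0.0, 4.0]]
print(mse_sum(pred, targ))  # 0 + 4 + 9 + 0 = 13.0
```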
Differentiation and optimization#
Instantiate the optimizer given the model’s parameters.
optimizer = optimizers.Adam(model.parameters(), lr=1e-3, device="cuda")
Run the learning process iteratively across the dataset for the specified number of epochs.
Important
Input arrays must be created with gradients enabled (requires_grad=True).
for epoch in range(epochs):
    for input, target in dataset:
Wrap the forward pass and the loss kernel launch inside a Tape context.
Every kernel launched inside the context block is recorded so that backward()
can propagate gradients back through the model.
        loss.zero_()  # reset loss array before each loss computation
        with wp.Tape() as tape:
            prediction = model(input)  # model's forward pass
            wp.launch(  # loss computation
                mse_loss,
                dim=prediction.shape,
                inputs=[prediction, target],
                outputs=[loss],
                device="cuda",
            )
After the backward() pass, call the optimizer’s step()
method to apply the gradient update, then call the tape’s zero() method to
reset the accumulated gradients before the next training iteration.
        tape.backward(loss)  # compute gradients
        optimizer.step()
        tape.zero()  # zero gradients for the next iteration