Basics#

This section walks through the main steps needed to define and train a neural network model with Warp-NN: defining a neural network, defining a loss function, running the forward pass and recording the kernel launches, and calling the optimizer to update the model parameters.

See also

Visit the Examples section for working examples.

Defining a neural network#

Import the necessary modules.

import warp as wp
from warp_nn import nn
from warp_nn import optimizers

Subclass nn.Module and assign built-in layers as attributes. Implement the __call__() method to define the forward pass of the model.

Important

Call super().__post_init__() at the end of __init__() so that sub-modules and their parameters are registered automatically.

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)
        super().__post_init__()  # must be called last

    def __call__(self, x):
        return self.fc2(self.act(self.fc1(x)))

model = MLP()
model.to("cuda")
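The forward pass above composes two affine layers with a ReLU in between. As a plain NumPy sketch of the same computation (the weights, batch size, and function names here are illustrative and not part of the warp_nn API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights matching the layer sizes above: 128 -> 64 -> 10.
W1, b1 = rng.standard_normal((128, 64)), np.zeros(64)
W2, b2 = rng.standard_normal((64, 10)), np.zeros(10)

def mlp_forward(x):
    # fc1 -> ReLU -> fc2, mirroring MLP.__call__ above
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU activation
    return h @ W2 + b2

x = rng.standard_normal((32, 128))  # batch of 32 inputs
out = mlp_forward(x)
print(out.shape)  # (32, 10)
```

The batch dimension passes through unchanged; each input of size 128 is mapped to 10 outputs.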

Defining a loss function#

Loss functions are plain Warp kernels that compute the value with respect to which differentiation will be carried out.

For example, to compute the Mean Squared Error (MSE) loss with "sum" reduction, accumulate the per-element squared differences into a 1-element array created with gradient support (requires_grad=True).

@wp.kernel
def mse_loss(
    prediction: wp.array2d(dtype=float),
    target: wp.array2d(dtype=float),
    loss: wp.array1d(dtype=float),
):
    i, j = wp.tid()
    diff = prediction[i, j] - target[i, j]
    wp.atomic_add(loss, 0, diff * diff)

loss = wp.zeros((1,), dtype=wp.float32, requires_grad=True, device="cuda")
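The kernel launches one thread per element of prediction, and each thread atomically adds its squared difference into loss[0], so the result is the sum-reduced MSE. A NumPy sketch of the same reduction (illustrative values only):

```python
import numpy as np

prediction = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [0.0, 4.0]])

# Equivalent of launching mse_loss over dim=prediction.shape:
# each (i, j) thread contributes diff * diff to a single accumulator.
loss = np.sum((prediction - target) ** 2)
print(loss)  # 13.0
```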

Differentiation and optimization#

Instantiate the optimizer given the model’s parameters.

optimizer = optimizers.Adam(model.parameters(), lr=1e-3, device="cuda")
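For reference, a single Adam update on one parameter array follows the standard rule below. This is a generic NumPy sketch with the usual default hyperparameters; warp_nn's internal implementation may differ in details:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased first- and second-moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    # Bias-correct the moments (t is the 1-based step count).
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Apply the update.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
grad = np.array([1.0, -1.0, 0.5])
param, m, v = adam_step(param, grad, m, v, t=1)
print(param)  # first step moves each parameter by about lr against its gradient sign
```

On the first step the bias-corrected moments cancel the gradient's magnitude, so every parameter moves by roughly lr in the direction opposite its gradient.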

Train by iterating over the dataset for the specified number of epochs.

Important

Input arrays must be created with gradients enabled (requires_grad=True) so that backward() can propagate gradients through them.

for epoch in range(epochs):
    for input, target in dataset:

Wrap the forward pass and the loss kernel launch inside a Tape context. Every kernel launched inside the context block is recorded so that backward() can propagate gradients back through the model.

        loss.zero_()  # reset loss array before each loss computation
        with wp.Tape() as tape:
            prediction = model(input)  # model's forward pass
            wp.launch(  # loss computation
                mse_loss,
                dim=prediction.shape,
                inputs=[prediction, target],
                outputs=[loss],
                device="cuda",
            )

After each backward() pass, call the optimizer’s step() method to apply the gradient update. Call the tape’s zero() method afterwards to clear the recorded operations and reset gradients before the next training iteration.

        tape.backward(loss)  # compute gradients
        optimizer.step()
        tape.zero()  # reset tape and zero gradients
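The loop above follows a fixed pattern: zero the loss, record the forward pass and loss kernel, backpropagate, step the optimizer, and reset. The same structure appears in this self-contained NumPy sketch, which fits a 1-D linear model with manually derived gradients (illustrative only; warp_nn records the kernel launches and computes these gradients for you):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
target = 3.0 * x + 1.0  # ground truth: w = 3, b = 1

w, b = 0.0, 0.0
lr = 0.1
for epoch in range(200):
    prediction = w * x + b                 # forward pass
    diff = prediction - target
    loss = np.sum(diff**2)                 # sum-reduced MSE, as in mse_loss
    grad_w = np.sum(2.0 * diff * x)        # backward: d(loss)/dw
    grad_b = np.sum(2.0 * diff)            # backward: d(loss)/db
    w -= lr / len(x) * grad_w              # optimizer step (plain gradient descent)
    b -= lr / len(x) * grad_b
print(round(w, 2), round(b, 2))  # converges toward 3.0 and 1.0
```

In warp_nn, tape.backward(loss) plays the role of the two manual gradient lines, and optimizer.step() plays the role of the two update lines.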