Highly scalable and optimized Transolver architecture
Transolver, a highlight at ICML 2024 and a leading CFD surrogate model, is now significantly enhanced within PhysicsNeMo. Originally integrated by co-author Huakun Luo, the model has received updates that empower AI-Physics developers to rapidly prototype, scale, and apply surrogate models to enterprise applications like automotive aerodynamics.
This novel approach adapts the attention mechanism to effectively learn mappings of physical states and the relationships between them, resulting in exceptional surrogate modeling capabilities. As illustrated by the original authors, Transolver's slice mechanism maps input points to physical states, while scaled dot-product attention establishes relationships between these states (image source):
Fig 1: Transolver model architecture.
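To make the slice mechanism concrete, here is a schematic sketch of the slice step in plain PyTorch. This illustrates the idea from the paper rather than the PhysicsNeMo implementation; the function name and the small epsilon are our own.

import torch
import torch.nn as nn

def physics_attention_slices(x: torch.Tensor, slice_proj: nn.Linear) -> torch.Tensor:
    # x: [batch, points, channels]; slice_proj maps channels -> M slices.
    scores = slice_proj(x)                    # [B, N, M] slice logits
    w = torch.softmax(scores, dim=-1)         # each point is distributed over M slices
    num = torch.einsum("bnm,bnc->bmc", w, x)  # weighted sum of point features per slice
    den = w.sum(dim=1).unsqueeze(-1)          # total weight per slice: [B, M, 1]
    tokens = num / (den + 1e-6)               # slice tokens, one "physical state" each
    # Scaled dot-product attention then runs over the M tokens, and the
    # result is scattered back to the points using the same weights w.
    return tokens

tokens = physics_attention_slices(torch.randn(2, 1024, 64), nn.Linear(64, 32))
# tokens shape: [2, 32, 64]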
In PhysicsNeMo, you can use Transolver on 2D or 3D regular data, or on an unstructured mesh. For an unstructured mesh, construct the Transolver with structured_shape=None; for 2D or 3D regular data, pass the spatial shape instead.
import torch
from physicsnemo.models.transolver import Transolver

model = Transolver(
    functional_dim = 5,      # How many features are in your input feature space?
    out_dim = 1,             # How many features should the model predict, per point?
    embedding_dim = 3,       # How many features are in the spatial embedding?
    n_layers = 12,           # How many layers will the model use?
    n_hidden = 256,          # What's the output size of hidden layers?
    n_head = 1,              # How many heads in the attention layers?
    mlp_ratio = 4,           # How much should MLP hidden layers expand the feature size?
    slice_num = 512,         # How many "slices" should the model use in each layer?
    structured_shape = None, # None for unstructured data
    use_te = False,          # Use TransformerEngine? (set True when it is installed)
    time_input = False,      # Use time embeddings?
).cuda()
# Generate synthetic data:
feature_data = torch.randn(2, 128000, 5, device="cuda")
pos_embedding = torch.randn(2, 128000, 3, device="cuda")
# Run the model:
output = model(feature_data, pos_embedding)
# Output shape will be [2, 128000, 1]
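For regular 2D or 3D data, the same model takes the grid shape at construction time. The snippet below is a hedged sketch: we assume structured_shape accepts the grid dimensions and that the forward pass keeps the same flattened [batch, points, features] layout as the unstructured case; check the PhysicsNeMo API reference for the exact convention.

# Hedged sketch: assumes structured_shape takes the grid dimensions and
# the input stays flattened to [batch, points, features].
model_2d = Transolver(
    functional_dim = 5,
    out_dim = 1,
    embedding_dim = 3,
    n_layers = 12,
    n_hidden = 256,
    n_head = 1,
    mlp_ratio = 4,
    slice_num = 512,
    structured_shape = [64, 64],  # assumed: a 64 x 64 regular grid
    use_te = False,
    time_input = False,
).cuda()

features_2d = torch.randn(2, 64 * 64, 5, device="cuda")
embedding_2d = torch.randn(2, 64 * 64, 3, device="cuda")
output_2d = model_2d(features_2d, embedding_2d)  # expected shape: [2, 4096, 1]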
Stabilizing Reduced Precision
The physics attention mechanism in Transolver relies on a Linear layer to compute a score that maps input points to slices. The slice weights are normalized over all of the input points, and that reduction can overflow in half precision when used to produce the slice tokens. In PhysicsNeMo, we've enhanced the computation of the weights and tokens to remain stable even in lower precision, opening the door to reduced-precision training even with massive input sizes.
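One common way to achieve this kind of stability, sketched below as an assumption about the approach rather than the exact PhysicsNeMo code, is to perform the normalization and reduction in fp32: a sum of weights over hundreds of thousands of points can easily exceed the fp16 maximum of 65504.

import torch

def stable_slice_tokens(w: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # w: [B, N, M] slice weights, x: [B, N, C] point features.
    w32, x32 = w.float(), x.float()
    num = torch.einsum("bnm,bnc->bmc", w32, x32)  # sum over N points, in fp32
    den = w32.sum(dim=1).unsqueeze(-1)            # can exceed the fp16 max for large N
    return (num / (den + 1e-6)).to(x.dtype)       # cast back to the compute dtype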
Accelerating Transolver with TransformerEngine
When it comes to transformer architectures and performance, it's advantageous to use the optimized layers from NVIDIA's TransformerEngine package. In PhysicsNeMo, starting in the 25.08 release, we've enabled TransformerEngine as the backbone of Transolver. Just set use_te=True in the model constructor to do so. For larger models and datasets, you should see reduced inference and end-to-end training latency.
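For example, you can toggle the flag based on whether the package is importable, so the same script runs with or without TransformerEngine installed. This fallback pattern is a convenience of ours, not a PhysicsNeMo requirement:

try:
    import transformer_engine  # noqa: F401
    use_te = True
except ImportError:
    use_te = False

model = Transolver(
    functional_dim = 5,
    out_dim = 1,
    embedding_dim = 3,
    n_layers = 12,
    n_hidden = 256,
    n_head = 1,
    mlp_ratio = 4,
    slice_num = 512,
    structured_shape = None,
    use_te = use_te,  # optimized TE layers when available
).cuda()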
Fig 2: Chart showing the improved training latency of Transolver in PhysicsNeMo - lower latency is better.
Fig 3: Chart showing the improved inference latency of Transolver in PhysicsNeMo - lower latency is better.
Enabling fp8 Computation
Unique to PhysicsNeMo is the ability to train and run inference with the Transolver model in fp8 precision, leveraging the hardware capabilities of Hopper and Blackwell architectures. Using TransformerEngine as the computational backend, Transolver can automatically cast linear, normalization, and attention layers to fp8 computations.
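As a hedged sketch of what an fp8 forward and backward pass can look like with TransformerEngine's autocast around a model built with use_te=True (the exact recipe PhysicsNeMo uses internally may differ):

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)  # e4m3 forward, e5m2 backward

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    output = model(feature_data, pos_embedding)

loss = output.square().mean()  # placeholder loss, for illustration only
loss.backward()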
DrivAerML Surface Training
For large surface meshes, Transolver excels as a surrogate model for CFD simulations. In PhysicsNeMo, you can explore our training and inference reference recipe, which solves the surface components of DrivAerML with an irregular-mesh Transolver. Future updates to the example may include extensions of Transolver to volumetric meshes, as well as optimized, domain-parallel training examples.
Takeaways
Transolver is one of the strongest state-of-the-art surrogate models for PDEs available today. We welcome you to try the implementation in PhysicsNeMo, which out of the box provides better reduced-precision stability, fp8 training, TransformerEngine support, and high-resolution training examples on mesh data.