Highly scalable and optimized Transolver architecture
Transolver, a highlight at ICML 2024 and a leading CFD surrogate model, is now significantly enhanced within PhysicsNeMo. Originally integrated by co-author Huakun Luo, the model has received updates that empower AI-Physics developers to rapidly prototype, scale, and apply surrogate models to enterprise applications like automotive aerodynamics.
This novel approach adapts the attention mechanism to effectively learn mappings of physical states and the relationships between them, resulting in exceptional surrogate modeling capabilities. As illustrated by the original authors, Transolver's slice mechanism maps input points to physical states, while scaled dot-product attention establishes relationships between these states (image source):
Fig 1: Transolver model architecture.
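To make the slice mechanism concrete, here is a schematic sketch of the slice step in plain PyTorch. This illustrates the idea from the paper rather than the PhysicsNeMo implementation; the function name and the small epsilon are our own.

import torch
import torch.nn as nn

def physics_attention_slices(x: torch.Tensor, slice_proj: nn.Linear) -> torch.Tensor:
    # x: [batch, points, channels]; slice_proj maps channels -> M slices.
    scores = slice_proj(x)                    # [B, N, M] slice logits
    w = torch.softmax(scores, dim=-1)         # each point is distributed over M slices
    num = torch.einsum("bnm,bnc->bmc", w, x)  # weighted sum of point features per slice
    den = w.sum(dim=1).unsqueeze(-1)          # total weight per slice: [B, M, 1]
    tokens = num / (den + 1e-6)               # slice tokens, one "physical state" each
    # Scaled dot-product attention then runs over the M tokens, and the
    # result is scattered back to the points using the same weights w.
    return tokens

tokens = physics_attention_slices(torch.randn(2, 1024, 64), nn.Linear(64, 32))
# tokens shape: [2, 32, 64]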
In PhysicsNeMo, you can use Transolver on 2D or 3D regular data, or on an unstructured mesh. For an unstructured mesh, construct the Transolver with structured_shape=None; for 2D or 3D regular data, pass the spatial shape instead.
import torch
from physicsnemo.models.transolver import Transolver

model = Transolver(
    functional_dim = 5,      # How many features are in your input feature space?
    out_dim = 1,             # How many features should the model predict, per point?
    embedding_dim = 3,       # How many features are in the spatial embedding?
    n_layers = 12,           # How many layers will the model use?
    n_hidden = 256,          # What's the output size of hidden layers?
    n_head = 1,              # How many heads in the attention layers?
    mlp_ratio = 4,           # How much should MLP hidden layers expand the feature size?
    slice_num = 512,         # How many "slices" should the model use in each layer?
    structured_shape = None, # None for unstructured data
    use_te = False,          # Use TransformerEngine? (set True when it is installed)
    time_input = False,      # Use time embeddings?
).cuda()
# Generate synthetic data:
feature_data = torch.randn(2, 128000, 5, device="cuda")
pos_embedding = torch.randn(2, 128000, 3, device="cuda")
# Run the model:
output = model(feature_data, pos_embedding)
# Output shape will be [2, 128000, 1]
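For regular 2D or 3D data, the same model takes the grid shape at construction time. The snippet below is a hedged sketch: we assume structured_shape accepts the grid dimensions and that the forward pass keeps the same flattened [batch, points, features] layout as the unstructured case; check the PhysicsNeMo API reference for the exact convention.

# Hedged sketch: assumes structured_shape takes the grid dimensions and
# the input stays flattened to [batch, points, features].
model_2d = Transolver(
    functional_dim = 5,
    out_dim = 1,
    embedding_dim = 3,
    n_layers = 12,
    n_hidden = 256,
    n_head = 1,
    mlp_ratio = 4,
    slice_num = 512,
    structured_shape = [64, 64],  # assumed: a 64 x 64 regular grid
    use_te = False,
    time_input = False,
).cuda()

features_2d = torch.randn(2, 64 * 64, 5, device="cuda")
embedding_2d = torch.randn(2, 64 * 64, 3, device="cuda")
output_2d = model_2d(features_2d, embedding_2d)  # expected shape: [2, 4096, 1]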
Stabilizing Reduced Precision
The physics attention mechanism in Transolver relies on a Linear layer to compute a score that maps input points to slices. The slice weights are normalized over all of the input points, and that reduction can overflow in half precision when used to produce the slice tokens. In PhysicsNeMo, we've enhanced the computation of the weights and tokens to remain stable even in lower precision, opening the door to reduced-precision training even with massive input sizes.
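One common way to achieve this kind of stability, sketched below as an assumption about the approach rather than the exact PhysicsNeMo code, is to perform the normalization and reduction in fp32: a sum of weights over hundreds of thousands of points can easily exceed the fp16 maximum of 65504.

import torch

def stable_slice_tokens(w: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # w: [B, N, M] slice weights, x: [B, N, C] point features.
    w32, x32 = w.float(), x.float()
    num = torch.einsum("bnm,bnc->bmc", w32, x32)  # sum over N points, in fp32
    den = w32.sum(dim=1).unsqueeze(-1)            # can exceed the fp16 max for large N
    return (num / (den + 1e-6)).to(x.dtype)       # cast back to the compute dtype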
Accelerating Transolver with TransformerEngine
When it comes to transformer architectures and performance, it's advantageous to use the optimized layers from NVIDIA's TransformerEngine package. In PhysicsNeMo, starting in the 25.08 release, we've enabled TransformerEngine as the backbone of Transolver. Just set use_te=True in the model constructor to do so. For larger models and datasets, you should see reduced inference and end-to-end training latency.
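For example, you can toggle the flag based on whether the package is importable, so the same script runs with or without TransformerEngine installed. This fallback pattern is a convenience of ours, not a PhysicsNeMo requirement:

try:
    import transformer_engine  # noqa: F401
    use_te = True
except ImportError:
    use_te = False

model = Transolver(
    functional_dim = 5,
    out_dim = 1,
    embedding_dim = 3,
    n_layers = 12,
    n_hidden = 256,
    n_head = 1,
    mlp_ratio = 4,
    slice_num = 512,
    structured_shape = None,
    use_te = use_te,  # optimized TE layers when available
).cuda()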
Fig 2: Chart showing the improved training latency of Transolver in PhysicsNeMo - lower latency is better.
Fig 3: Chart showing the improved inference latency of Transolver in PhysicsNeMo - lower latency is better.
Enabling fp8 Computation
Unique to PhysicsNeMo is the ability to train and run inference with the Transolver model in fp8 precision, leveraging the hardware capabilities of Hopper and Blackwell architectures. Using TransformerEngine as the computational backend, Transolver can automatically cast linear, normalization, and attention layers to fp8 computations.
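As a hedged sketch of what an fp8 forward and backward pass can look like with TransformerEngine's autocast around a model built with use_te=True (the exact recipe PhysicsNeMo uses internally may differ):

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)  # e4m3 forward, e5m2 backward

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    output = model(feature_data, pos_embedding)

loss = output.square().mean()  # placeholder loss, for illustration only
loss.backward()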
DrivAerML Surface Training
For large surface meshes, Transolver excels as a surrogate model for CFD simulations. In PhysicsNeMo, you can explore our training and inference reference recipe, which solves the surface components of DrivAerML with an irregular-mesh Transolver. Future updates to the example may include extensions of Transolver to volumetric meshes, as well as optimized, domain-parallel training examples.
Takeaways
Transolver is one of the strongest state-of-the-art surrogate models for PDEs available today. We welcome you to try the implementation in PhysicsNeMo, which out of the box provides better reduced-precision stability, fp8 training, TransformerEngine support, and high-resolution training examples on mesh data.