Mesh Submodule
The curator.mesh submodule provides pipeline components for reading,
transforming, and writing physicsnemo.mesh.Mesh objects.
Installation
pip install physicsnemo-curator[mesh]
# or
uv sync --group mesh
This installs: physicsnemo, pyvista, pyarrow, torch, and their transitive dependencies.
Components
VTKSource
VTKSource reads VTK files from a
local directory and converts each to a
physicsnemo.mesh.Mesh using physicsnemo.mesh.io.from_pyvista().
The constructor takes a path string directly:
from physicsnemo_curator.domains.mesh.sources.vtk import VTKSource
# Local directory (discovers VTK files automatically)
source = VTKSource("./data/", manifold_dim=2)
# With a custom glob pattern
source = VTKSource("./data/", file_pattern="**", manifold_dim="auto")
# Cell centroid mode for CFD polyhedral meshes
source = VTKSource("./cfd/", point_source="cell_centroids")
For remote datasets (HuggingFace Hub), use purpose-built dataset sources
such as DrivAerMLSource:
from physicsnemo_curator.domains.mesh.sources.drivaerml import DrivAerMLSource
source = DrivAerMLSource(mesh_type="boundary")
Supported formats: .vtk, .vtp, .vtu, .vts, .vtm
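Discovery of files by glob pattern and extension can be pictured roughly as follows. This is a minimal sketch using only the standard library: discover_vtk_files is a hypothetical helper, not part of the curator API, and the default pattern "**/*" is an assumption. The actual discovery logic in VTKSource may differ.

```python
from pathlib import Path

# Extensions listed under "Supported formats" above.
VTK_EXTENSIONS = {".vtk", ".vtp", ".vtu", ".vts", ".vtm"}


def discover_vtk_files(root: str, file_pattern: str = "**/*") -> list[Path]:
    """Hypothetical helper: sorted VTK files under root that match a glob."""
    return sorted(
        p
        for p in Path(root).glob(file_pattern)
        if p.is_file() and p.suffix.lower() in VTK_EXTENSIONS
    )
```

Sorting the result keeps item indices stable between runs, which matters when output names are derived from the index.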
Constructor parameters:
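| Parameter | Type | Default | Description |
|---|---|---|---|
| | | required | Path to local directory containing VTK files |
| file_pattern | | | Glob pattern for filtering files |
| manifold_dim | | | Target manifold dimension (0–3) |
| point_source | | "vertices" | Point source mode: "vertices" or "cell_centroids" |
| warn_on_lost_data | | | Warn when data arrays are discarded |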
Manifold dimensions:
| Dim | Topology | Cell Shape | Notes |
|---|---|---|---|
| 0 | Point cloud | None | Vertices only, no connectivity |
| 1 | Line mesh | Edges (2 vertices) | Extracted from mesh topology |
| 2 | Surface mesh | Triangles (3 vertices) | Auto-triangulated if needed |
| 3 | Volume mesh | Tetrahedra (4 vertices) | Auto-tetrahedralized if needed |
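For dimension 1, the table says edges are extracted from the mesh topology. One common way to do this for a triangle mesh is to collect the unique undirected edges of every cell. The sketch below illustrates that idea in plain Python; it is an assumption about the general technique, not the physicsnemo implementation, and unique_edges is a hypothetical name.

```python
def unique_edges(triangles):
    """Return the sorted unique undirected edges of a list of triangles."""
    edges = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with the smaller vertex index first so that
            # (1, 2) and (2, 1) count as the same edge.
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


# Two triangles that share the edge (1, 2).
triangles = [(0, 1, 2), (1, 3, 2)]
edges = unique_edges(triangles)
```

The shared edge appears only once in the result, so the two triangles yield five edges rather than six.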
Point source modes:
- "vertices" (default): Mesh vertices become points. point_data is preserved. Cell topology is determined by manifold_dim.
- "cell_centroids": Cell centroids become points. cell_data is mapped to point_data. Only manifold_dim 0 and 1 are valid. Avoids expensive tetrahedralization for large polyhedral CFD meshes.
Examples:
from physicsnemo_curator.domains.mesh.sources.vtk import VTKSource
# Local directory (auto-discovers VTK files)
source = VTKSource("./data/")
# Read as volume mesh
source = VTKSource("./volumes/", manifold_dim=3)
# Use cell centroids for CFD polyhedral meshes
source = VTKSource(
    "./cfd/",
    point_source="cell_centroids",
    warn_on_lost_data=False,
)
# Custom glob pattern to select a subset of files
source = VTKSource("./data/", file_pattern="timestep_*")
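The "cell_centroids" mode described under Point source modes can be pictured with a small pure-Python sketch. It assumes the centroid is the plain average of a cell's vertex coordinates, which may not match the definition VTKSource uses for polyhedral cells; cell_centroids here is a hypothetical function and the field name "pressure" is made up for illustration.

```python
def cell_centroids(points, cells):
    """Average the vertex coordinates of each cell (3D points assumed)."""
    centroids = []
    for cell in cells:
        n = len(cell)
        centroids.append(
            tuple(sum(points[i][k] for i in cell) / n for k in range(3))
        )
    return centroids


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
cells = [(0, 1, 2, 3)]             # a single quad cell
cell_data = {"pressure": [101.3]}  # one value per cell

new_points = cell_centroids(points, cells)
# Each centroid inherits the data of the cell it came from, so the
# per-cell arrays can be reused as per-point arrays without resampling.
point_data = dict(cell_data)
```

Because every cell contributes exactly one point, no cell has to be split into tetrahedra, which is where the cost saving for large polyhedral meshes comes from.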
For remote datasets from HuggingFace Hub, use the dedicated dataset sources:
from physicsnemo_curator.domains.mesh.sources.drivaerml import DrivAerMLSource
source = DrivAerMLSource(mesh_type="boundary")
MeanFilter
MeanFilter computes the spatial mean
of every field in point_data and cell_data for each mesh, accumulates
the results in memory, and writes them to a Parquet file on flush().
The mesh is yielded unchanged (pass-through).
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| output | | required | Parquet file path for statistics |
Output columns:
| Column | Description |
|---|---|
| | Number of points in the mesh |
| | Number of cells in the mesh |
| | Mean of each point data field |
| | Mean of each cell data field |
Meshes with different field names are handled gracefully — missing columns
are filled with NULL in the Parquet output.
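The accumulate-then-align behaviour can be sketched in plain Python. This is an illustration under stated assumptions, not the MeanFilter implementation: the column names (point_<field>_mean, cell_<field>_mean), the helper names, and the restriction to scalar fields are all hypothetical, and None stands in for a Parquet NULL.

```python
from statistics import fmean


def mesh_row(point_data, cell_data):
    """Build one statistics row: the mean of every scalar field."""
    row = {}
    for prefix, data in (("point", point_data), ("cell", cell_data)):
        for name, values in data.items():
            row[f"{prefix}_{name}_mean"] = fmean(values)
    return row


def align_rows(rows):
    """Give every row the same columns, using None where a field is absent."""
    columns = sorted({key for row in rows for key in row})
    return [{col: row.get(col) for col in columns} for row in rows]


# The second mesh has a cell field that the first one lacks.
rows = [
    mesh_row({"pressure": [1.0, 3.0]}, {}),
    mesh_row({"pressure": [2.0, 4.0]}, {"wall_shear": [0.5, 1.5]}),
]
table = align_rows(rows)
```

Aligning only at the end means the set of columns does not need to be known before the first mesh is processed.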
Example:
from physicsnemo_curator.domains.mesh.filters.mean import MeanFilter
filt = MeanFilter(output="stats.parquet")
# Use in a pipeline (source and sink are defined elsewhere)
pipeline = source.filter(filt).write(sink)
for i in range(len(pipeline)):
    pipeline[i]  # indexing an item runs the pipeline for it
# Write accumulated statistics
filt.flush()
# Read results
import pyarrow.parquet as pq
table = pq.read_table("stats.parquet")
print(table.to_pandas())
MeshSink
MeshSink saves
physicsnemo.mesh.Mesh objects using the native tensordict
memory-mapped format (Mesh.save()).
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| output_dir | | required | Directory for output files |
| naming_template | | | Format string for output names (see below) |
Output naming: By default each mesh is saved to
{output_dir}/mesh_{index:04d}_{seq} where index is the source item
index and seq is the sequence number within that item (for sources that
yield multiple meshes).
A custom naming_template can be provided using Python format-string
syntax with the following placeholders:
| Placeholder | Description |
|---|---|
| {index} | Source item index |
| {seq} | Sequence number within that item (for multi-mesh sources) |
| | Relative path of the source file (from the source directory) |
| | File stem (name without extension) of the source file |
Standard format specs are supported (e.g. {index:04d}). The template
is used literally — include any file extension you need.
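How a template expands can be checked with Python's built-in str.format, which is the format-string syntax referred to above. The sketch uses only the index and seq placeholders and assumes MeshSink fills them in an equivalent way; the variable names are illustrative.

```python
default_template = "mesh_{index:04d}_{seq}"
custom_template = "boundary_{index}.vtp.pmsh"

# str.format ignores keyword arguments that the template does not use,
# so the same values can be passed to every template.
default_name = default_template.format(index=0, seq=0)
custom_name = custom_template.format(index=0, seq=0)
later_name = default_template.format(index=12, seq=3)
```

The :04d spec zero-pads the index to four digits, which keeps the default output names in item order when they are sorted alphabetically.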
Saved meshes can be loaded back with:
from physicsnemo.mesh import Mesh
mesh = Mesh.load("./output/mesh_0000_0")
Examples:
from physicsnemo_curator.domains.mesh.sinks.mesh_writer import MeshSink
# Default naming
sink = MeshSink(output_dir="./output/")
pipeline = source.filter(filt).write(sink)
paths = pipeline[0] # ['./output/mesh_0000_0']
# Custom naming for MeshReader compatibility
sink = MeshSink(
    output_dir="./output/",
    naming_template="boundary_{index}.vtp.pmsh",
)
# Rebuild the pipeline so that it writes through the new sink
pipeline = source.filter(filt).write(sink)
paths = pipeline[0] # ['./output/boundary_0.vtp.pmsh']
Full Pipeline Example
from physicsnemo_curator.domains.mesh.sources.vtk import VTKSource
from physicsnemo_curator.domains.mesh.sources.drivaerml import DrivAerMLSource
from physicsnemo_curator.domains.mesh.filters.mean import MeanFilter
from physicsnemo_curator.domains.mesh.sinks.mesh_writer import MeshSink
# Local data
pipeline = (
    VTKSource("./cfd_results/", manifold_dim=2)
    .filter(MeanFilter(output="stats.parquet"))
    .write(MeshSink(output_dir="./output/"))
)
# Alternatively, remote data from HuggingFace (DrivAerML dataset).
# This assignment replaces the local pipeline above; keep only one of the two.
pipeline = (
    DrivAerMLSource(mesh_type="boundary")
    .filter(MeanFilter(output="stats.parquet"))
    .write(MeshSink(output_dir="./output/"))
)
# Execute
for i in range(len(pipeline)):
    paths = pipeline[i]
    print(f"Item {i}: {paths}")
# Finalize statistics
pipeline.filters[0].flush()