Mesh Submodule#

The curator.mesh submodule provides pipeline components for reading, transforming, and writing physicsnemo.mesh.Mesh objects.

Installation#

pip install physicsnemo-curator[mesh]
# or
uv sync --group mesh

This installs: physicsnemo, pyvista, pyarrow, torch, and their transitive dependencies.

Components#

VTKSource#

VTKSource reads VTK files from a local directory and converts each to a physicsnemo.mesh.Mesh using physicsnemo.mesh.io.from_pyvista().

The constructor takes a path string directly:

from physicsnemo_curator.domains.mesh.sources.vtk import VTKSource

# Local directory (discovers VTK files automatically)
source = VTKSource("./data/", manifold_dim=2)

# With a custom glob pattern
source = VTKSource("./data/", file_pattern="**", manifold_dim="auto")

# Cell centroid mode for CFD polyhedral meshes
source = VTKSource("./cfd/", point_source="cell_centroids")

For remote datasets (HuggingFace Hub), use purpose-built dataset sources such as DrivAerMLSource:

from physicsnemo_curator.domains.mesh.sources.drivaerml import DrivAerMLSource

source = DrivAerMLSource(mesh_type="boundary")

Supported formats: .vtk, .vtp, .vtu, .vts, .vtm, .stl

Constructor parameters:

Parameter

Type

Default

Description

input_path

str

required

Path to local directory containing VTK files

file_pattern

str

"**"

Glob pattern for filtering files

manifold_dim

int | "auto" | list[dict]

"auto"

Target manifold dimension (0–3), or per-path rule list

point_source

str | list[dict]

"vertices"

"vertices" or "cell_centroids" (or per-path rule list)

warn_on_lost_data

bool

True

Warn when data arrays are discarded

backend

str

"pyvista"

Reading backend: "pyvista" or "rust" (faster VTU/VTP)

key_filters

list[dict] | None

None

Per-path data-array include/exclude rules (reader-level)

volume_pattern

str | None

None

Filename glob for volume files (domain-mesh mode)

boundary_pattern

str | None

None

Filename glob for boundary files (domain-mesh mode)

boundary_name

str

"vehicle"

Boundary key for the paired DomainMesh boundary

boundary_generator

object | None

None

Optional BC generator applied to each paired DomainMesh

Reading backends:

  • "pyvista" (default): full-featured reading via from_pyvista.

  • "rust": native reader for .vtu / .vtp (much faster I/O), with a transparent fallback to PyVista for unsupported files/configs. Builds the Mesh (points, cells, point/cell data, or cell centroids) directly from raw arrays.

Per-path conversion and array filtering:

manifold_dim and point_source accept either a scalar (applied to every file) or a list of {"pattern": glob, "value": ...} rules selected per file (longest matching pattern wins). key_filters drops/keeps named data arrays at the reader level (so filtered fields are never materialised — critical for very large volume .vtu files):

source = VTKSource(
    "./dataset/",
    backend="rust",
    manifold_dim=[
        {"pattern": "**/volume_*", "value": 0},     # volumes -> point cloud
        {"pattern": "**/boundary_*", "value": 2},   # surfaces -> triangulated
    ],
    point_source=[{"pattern": "**/volume_*", "value": "cell_centroids"}],
    key_filters=[
        {"path_pattern": "**/volume_*.vtu", "mode": "exclude", "keys": ["NodeID"]},
    ],
)

Domain-mesh mode (volume + boundary -> DomainMesh):

When both volume_pattern and boundary_pattern are set, files are paired by parent directory into a physicsnemo.mesh.domain_mesh.DomainMesh per index (one volume interior + one boundary). Unpaired files (e.g. STLs) fall back to standalone Mesh:

source = VTKSource(
    "./dataset/",
    volume_pattern="volume_*.vtu",
    boundary_pattern="boundary_*.vtp",
    boundary_name="vehicle",
)
domain = next(source[0])  # DomainMesh(interior=..., boundaries={"vehicle": ...})

Manifold dimensions:

Dim

Topology

Cell Shape

Notes

0

Point cloud

None

Vertices only, no connectivity

1

Line mesh

Edges (2 vertices)

Extracted from mesh topology

2

Surface mesh

Triangles (3 vertices)

Auto-triangulated if needed

3

Volume mesh

Tetrahedra (4 vertices)

Auto-tetrahedralized if needed

Point source modes:

  • "vertices" (default): Mesh vertices become points. point_data is preserved. Cell topology is determined by manifold_dim.

  • "cell_centroids": Cell centroids become points. cell_data is mapped to point_data. Only manifold_dim 0 and 1 are valid. Avoids expensive tetrahedralization for large polyhedral CFD meshes.

Examples:

from physicsnemo_curator.domains.mesh.sources.vtk import VTKSource

# Local directory (auto-discovers VTK files)
source = VTKSource("./data/")

# Read as volume mesh
source = VTKSource("./volumes/", manifold_dim=3)

# Use cell centroids for CFD polyhedral meshes
source = VTKSource(
    "./cfd/",
    point_source="cell_centroids",
    warn_on_lost_data=False,
)

# Custom glob pattern to select a subset of files
source = VTKSource("./data/", file_pattern="timestep_*")

For remote datasets from HuggingFace Hub, use the dedicated dataset sources:

from physicsnemo_curator.domains.mesh.sources.drivaerml import DrivAerMLSource

source = DrivAerMLSource(mesh_type="boundary")

MeanFilter#

MeanFilter computes the spatial mean of every field in point_data and cell_data for each mesh, accumulates the results in memory, and writes them to a Parquet file on flush().

The mesh is yielded unchanged (pass-through).

Parameters:

Parameter

Type

Default

Description

output

str

required

Parquet file path for statistics

Output columns:

Column

Description

n_points

Number of points in the mesh

n_cells

Number of cells in the mesh

point_data/{field}

Mean of each point data field

cell_data/{field}

Mean of each cell data field

Meshes with different field names are handled gracefully — missing columns are filled with NULL in the Parquet output.

Example:

from physicsnemo_curator.domains.mesh.filters.mean import MeanFilter

filt = MeanFilter(output="stats.parquet")

# Use in a pipeline
pipeline = source.filter(filt).write(sink)
for i in range(len(pipeline)):
    pipeline[i]

# Write accumulated statistics
filt.flush()

# Read results
import pyarrow.parquet as pq
table = pq.read_table("stats.parquet")
print(table.to_pandas())

MeshSink#

MeshSink saves physicsnemo.mesh.Mesh objects using the native tensordict memory-mapped format (Mesh.save()).

Parameters:

Parameter

Type

Default

Description

output_dir

str

required

Directory for output files

naming_template

str | None

None

Format string for output names (see below)

group_readable

bool

False

chmod g+r the written output

Writes are atomic: each mesh is written to a temporary directory in the same parent and then renamed into place, so an interrupted run never leaves a partial/corrupt output directory behind. Mesh objects are saved as .pmsh and DomainMesh objects as .pdmsh.

Output naming: By default each mesh is saved to {output_dir}/mesh_{index:04d}_{seq} where index is the source item index and seq is the sequence number within that item (for sources that yield multiple meshes).

A custom naming_template can be provided using Python format-string syntax with the following placeholders:

Placeholder

Description

{index}

Source item index

{seq}

Sequence number within that item (for multi-mesh sources)

{relpath}

Relative path of the source file (from the source directory)

{stem}

File stem (name without extension) of the source file

Standard format specs are supported (e.g. {index:04d}). The template is used literally — include any file extension you need.

Saved meshes can be loaded back with:

from physicsnemo.mesh import Mesh
mesh = Mesh.load("./output/mesh_0000_0")

Examples:

from physicsnemo_curator.domains.mesh.sinks.mesh_writer import MeshSink

# Default naming
sink = MeshSink(output_dir="./output/")
pipeline = source.filter(filt).write(sink)
paths = pipeline[0]  # ['./output/mesh_0000_0']

# Custom naming for MeshReader compatibility
sink = MeshSink(
    output_dir="./output/",
    naming_template="boundary_{index}.vtp.pmsh",
)
paths = pipeline[0]  # ['./output/boundary_0.vtp.pmsh']

Transform filters#

Several generic, in-pipeline transform filters complement the statistics filters above. All are importable from physicsnemo_curator.domains.mesh.filters and apply to both Mesh and DomainMesh (interior + every boundary).

Filter

Purpose

PrecisionFilter

Convert field precision (e.g. fp64 → fp32 / fp16 / bf16)

CleanFilter

Mesh.clean: merge duplicate points, drop duplicate cells / unused points

PointDataToCellDataFilter

Move point_data onto cells by per-cell averaging

GlobalDataFilter

Inject constant global_data (e.g. U_inf, rho_inf, …)

RandomPermutationFilter

Shuffle point/cell ordering reproducibly

FieldSelectFilter

Keep/drop fields after conversion (post-hoc)

BoundaryInjectionFilter

Synthesize + inject CFD-domain boundaries (see below)

from physicsnemo_curator.domains.mesh.filters import (
    CleanFilter, GlobalDataFilter, PointDataToCellDataFilter,
)

pipeline = (
    source
    .filter(CleanFilter())
    .filter(PointDataToCellDataFilter())  # surface point_data -> cell_data
    .filter(GlobalDataFilter(values={"U_inf": [30.0, 0.0, 0.0], "rho_inf": 1.225}))
    .write(sink)
)

VTISource and GridSidecarSink (structured grids)#

VTK ImageData (.vti) describes a uniform rectilinear grid (origin + spacing + dimensions) and does not fit the unstructured Mesh model. VTISource reads each .vti file into a tensordict.TensorDict of dense N-D field tensors instead:

  • point_data — sub-TensorDict with batch_size = [Nz, Ny, Nx]; scalar fields are (Nz, Ny, Nx), vector fields (Nz, Ny, Nx, C) (VTK x-fastest ordering).

  • cell_data — sub-TensorDict with batch_size = [Cz, Cy, Cx].

  • grid — non-batched metadata: origin, spacing, dimensions, direction.

GridSidecarSink writes the grid as a tensordict memmap sidecar beside the mesh outputs (default {relpath}/{stem}.grid), reloadable with TensorDict.load_memmap:

from physicsnemo_curator.domains.mesh.sources.vti import VTISource
from physicsnemo_curator.domains.mesh.sinks.grid_sidecar import GridSidecarSink

pipeline = VTISource("./grids/").write(GridSidecarSink(output_dir="./out/"))

Boundary-condition injection#

Curated DomainMesh files often carry only the geometry surface (vehicle) plus interior + global data, lacking the CFD-domain outer boundaries (inlet / outlet / walls / symmetry). The physicsnemo_curator.domains.mesh.boundaries subsystem synthesizes those boundaries from the known domain geometry and injects them, preserving interior / vehicle / global_data.

Datasets are specialized purely by choosing a BoundaryGenerator and its constants:

  • BoxTunnelBoundaries — rectangular wind tunnel (inlet/outlet/slip/no_slip); z_floor inferred per sample from the geometry boundary.

  • HemisphereBoundaries — hemispherical open-road domain (inlet/outlet split by freestream direction + a constrained-Delaunay symmetry disk).

Use it either as a standalone filter on any DomainMesh stream (including reading existing .pdmsh), or as the boundary_generator hook on VTKSource domain-mesh mode:

from physicsnemo_curator.domains.mesh.boundaries import HemisphereBoundaries
from physicsnemo_curator.domains.mesh.filters import BoundaryInjectionFilter

gen = HemisphereBoundaries(freestream_key="U_inf")
pipeline = source.filter(BoundaryInjectionFilter(gen, check_watertight=True)).write(sink)

Dependencies#

The mesh domain depends on:

Package

Purpose

physicsnemo

Mesh tensorclass and I/O utilities

pyvista

VTK file reading and mesh manipulation

pyarrow

Parquet I/O for statistics and metadata

torch

Tensor operations (required by physicsnemo)