VTK Backend Profiling: PyVista vs Rust#

This example demonstrates the built-in ProfiledPipeline utility and compares the two VTK reading backends available in VTKSource:

PyVista (default): full-featured Python reader supporting all VTK formats, manifold dimensions, and point-source modes.
Rust: native reader built with PyO3 for faster I/O on ASCII VTU/VTP files.

We download DrivAerML boundary files, then run the same pipeline twice — once per backend — and print the per-stage timing breakdown so you can see exactly where time is spent.

Note

The Rust backend requires the native extension to be built (maturin develop). It currently supports ASCII VTU/VTP files only and does not apply manifold_dim or point_source conversion.

Imports#

We use DrivAerMLSource to download DrivAerML boundary files, then VTKSource (which exposes the backend parameter) for local VTK reading, a filter, a sink, and the ProfiledPipeline wrapper.

from physicsnemo_curator.core.profiling import ProfiledPipeline

from physicsnemo_curator.domains.mesh.filters.precision import PrecisionFilter
from physicsnemo_curator.domains.mesh.sinks.mesh_writer import MeshSink
from physicsnemo_curator.domains.mesh.sources.vtk import VTKSource
from physicsnemo_curator.run import run_pipeline

Download DrivAerML Files#

We use fsspec to download a small subset of DrivAerML boundary VTP files from HuggingFace Hub to a local cache directory. Subsequent runs read from cache.

N_RUNS = 3
N_JOBS = 1  # Sequential for fair comparison (no scheduling noise)

DRIVAERML_URL = "hf://datasets/neashton/drivaerml"
CACHE_DIR = "outputs/profiling/drivaerml_cache"

import pathlib

import fsspec

fs, root_path = fsspec.core.url_to_fs(DRIVAERML_URL)
glob_expr = f"{root_path}/**/boundary*.vtp"
all_files = fs.glob(glob_expr)
files = sorted(f for f in all_files if f.endswith(".vtp") and not fs.isdir(f))

# Force download of the files we need
for remote_path in files[:N_RUNS]:
    local_path = pathlib.Path(CACHE_DIR) / remote_path.lstrip("/")
    if not local_path.exists():
        local_path.parent.mkdir(parents=True, exist_ok=True)
        fs.get(remote_path, str(local_path))

print(f"VTP files cached: {len(files)}")

PyVista Backend#

The default backend reads VTK files through PyVista and converts to Mesh via from_pyvista().

pyvista_source = VTKSource(CACHE_DIR, backend="pyvista")

print(f"VTK files discovered locally: {len(pyvista_source)}")

pyvista_pipeline = pyvista_source.filter(PrecisionFilter(target_dtype="float32")).write(
    MeshSink(output_dir="outputs/profiling/pyvista_meshes/")
)

Wrap with ProfiledPipeline#

ProfiledPipeline is a transparent proxy — it passes through to the real pipeline while recording wall-clock time, memory usage, and optional GPU metrics for every stage.

profiled_pyvista = ProfiledPipeline(pyvista_pipeline)

results_pyvista = run_pipeline(
    profiled_pyvista,
    n_jobs=N_JOBS,
    backend="sequential",
    indices=range(N_RUNS),
)

print("=== PyVista Backend ===")
profiled_pyvista.metrics.to_console()

Rust Backend#

The Rust backend parses VTK XML directly using quick-xml and returns raw NumPy arrays. It skips the PyVista/VTK library stack entirely, trading feature completeness for speed.

We point to the same cached files, so the comparison measures only parse time — not download time.

Note

The Rust backend only supports ASCII VTU/VTP files and does not apply manifold_dim or point_source conversion.

rust_source = VTKSource(CACHE_DIR, backend="rust")

rust_pipeline = rust_source.filter(PrecisionFilter(target_dtype="float32")).write(
    MeshSink(output_dir="outputs/profiling/rust_meshes/")
)

profiled_rust = ProfiledPipeline(rust_pipeline)

results_rust = run_pipeline(
    profiled_rust,
    n_jobs=N_JOBS,
    backend="sequential",
    indices=range(N_RUNS),
)

print("=== Rust Backend ===")
profiled_rust.metrics.to_console()

Compare Results#

Print a side-by-side summary of the two backends. The table shows total wall-clock time and mean per-index time for each backend.

pyvista_metrics = profiled_pyvista.metrics
rust_metrics = profiled_rust.metrics

pyvista_total_ms = pyvista_metrics.total_wall_time_ns / 1e6
rust_total_ms = rust_metrics.total_wall_time_ns / 1e6

print("\n=== Comparison ===\n")
print(f"  {'Backend':<12s} {'Total (ms)':>12s} {'Mean/index (ms)':>17s}")
print("  " + "-" * 43)
print(f"  {'PyVista':<12s} {pyvista_total_ms:>12,.2f} {pyvista_metrics.mean_index_time_ns / 1e6:>17,.2f}")
print(f"  {'Rust':<12s} {rust_total_ms:>12,.2f} {rust_metrics.mean_index_time_ns / 1e6:>17,.2f}")

if rust_total_ms > 0:
    speedup = pyvista_total_ms / rust_total_ms
    print(f"\n  Rust speedup: {speedup:.1f}x")

Export Metrics#

PipelineMetrics supports three export formats for further analysis:

JSON: full per-index, per-stage breakdown.
CSV: tabular format for spreadsheets or plotting.
Console: human-readable summary (shown above).

pyvista_metrics.to_json("outputs/profiling/pyvista_profile.json")
pyvista_metrics.to_csv("outputs/profiling/pyvista_profile.csv")
rust_metrics.to_json("outputs/profiling/rust_profile.json")
rust_metrics.to_csv("outputs/profiling/rust_profile.csv")

print("\nMetrics exported to outputs/profiling/")

Cleanup#

Remove the temporary metric directories created by each ProfiledPipeline instance.

profiled_pyvista.cleanup()
profiled_rust.cleanup()

Note

Typical results on an x86-64 workstation show the Rust backend reads VTK files 2–5x faster than PyVista for ASCII VTU/VTP files. The speedup comes from direct XML parsing (quick-xml) and zero-copy NumPy conversion, bypassing the VTK C++ library and PyVista wrapper layers.

Actual results depend on file size, disk speed, and CPU. Use this example as a starting point for profiling your own pipelines.

Gallery generated by Sphinx-Gallery