Pipeline Wizard

PhysicsNeMo Curator includes an interactive command-line wizard that guides you through building and executing a pipeline without writing code.

Installation

pip install "physicsnemo-curator[wiz]"

This installs the required dependencies:

  • click — CLI framework

  • questionary — interactive prompts

  • rich — colored output and progress bars

You also need the extra for the domain submodule you plan to curate, e.g. pip install "physicsnemo-curator[mesh]".

Usage

Start the wizard with:

psnc

The wizard displays a styled welcome banner and walks you through the pipeline configuration with colored output and progress indicators. You can either build a new pipeline interactively or load a previously saved one from a YAML or JSON file.

1. Select Submodule

The CLI discovers registered submodules and shows which have their dependencies installed:

╭─────────────────────────────────────╮
│   PhysicsNeMo Curator               │
│   Interactive ETL Pipeline Wizard   │
╰─────────────────────────────────────╯

Step 1/5: Select Submodule
? Select a submodule:
  ▸ mesh — Mesh data curation (physicsnemo.mesh.Mesh)
    da — DataArray data curation (xarray.DataArray) (not installed)
    atm — Atomic data curation (nvalchemi.data.AtomicData) (not installed)
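The "(not installed)" markers come from checking whether each submodule's dependencies can be found. A minimal sketch of such a check, assuming a hypothetical SUBMODULE_DEPS mapping from submodule name to the top-level packages it needs (the real registry and package names may differ):

```python
from __future__ import annotations

import importlib.util

# Hypothetical mapping: submodule name -> top-level packages it needs.
# The real CLI reads this from its submodule registry; these names are assumptions.
SUBMODULE_DEPS = {
    "mesh": ["physicsnemo"],
    "da": ["xarray"],
    "atm": ["nvalchemi"],
}


def installed_submodules(deps: dict[str, list[str]]) -> dict[str, bool]:
    """Return {submodule: True if every required package can be found}."""
    status = {}
    for name, packages in deps.items():
        # find_spec returns None for a missing top-level package,
        # so this checks availability without importing anything heavy.
        status[name] = all(importlib.util.find_spec(pkg) is not None for pkg in packages)
    return status


if __name__ == "__main__":
    for name, ok in installed_submodules(SUBMODULE_DEPS).items():
        print(name if ok else f"{name} (not installed)")
```

Using find_spec rather than a bare import keeps the menu fast, because heavy domain packages are only located, not loaded.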

2. Select Source

Choose from the registered sources for the selected submodule:

Step 2/5: Select Source/Reader
? Select a source/reader:
  ▸ VTK Reader — Read VTK files (.vtk, .vtp, .vtu, .vts, .vtm)

You are then prompted for source-specific parameters (data location, conversion options):

  Configure VTK Reader:
  ? input_path (Path to file or directory): ./cfd_results/
  ? manifold_dim (Target manifold dimension) [auto]: auto
  ? point_source (Point source mode) [vertices]: vertices
  ? warn_on_lost_data (Warn when data arrays are discarded) [True]:
  ✓ Found 42 item(s) in source
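The item count reported by the source is the number of inputs it discovered under input_path. A minimal sketch of one way to enumerate them, assuming a directory is scanned non-recursively for the VTK extensions listed in the menu above; discover_items is a hypothetical helper, and VTKSource's real discovery rules (recursion, ordering) may differ:

```python
from __future__ import annotations

from pathlib import Path

# Extensions listed for VTK Reader in the wizard menu.
VTK_EXTENSIONS = {".vtk", ".vtp", ".vtu", ".vts", ".vtm"}


def discover_items(input_path: str | Path) -> list[Path]:
    """Expand a file or directory path into a sorted list of VTK files.

    Assumption (hypothetical helper): directories are scanned non-recursively
    and extensions are matched case-insensitively.
    """
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() in VTK_EXTENSIONS else []
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in VTK_EXTENSIONS
        )
    raise FileNotFoundError(f"input_path does not exist: {path}")
```

Sorting makes the item order, and therefore any per-item output numbering, reproducible across runs.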

3. Select Filters

Choose zero or more filters (multi-select with checkboxes):

Step 3/5: Select Filters
? Select filters (space to toggle, enter to confirm):
  ▸ ☑ Mean Statistics — Compute spatial means and save to Parquet
  ✓ Selected 1 filter(s)

You are then prompted for each selected filter's parameters in turn.

4. Select Sink

Choose the output writer:

Step 4/5: Select Sink/Writer
? Select a sink:
  ▸ PhysicsNeMo Mesh Writer — Save in native tensordict format
  ✓ Configured sink: PhysicsNeMo Mesh Writer

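5. Execute

The CLI builds the pipeline, displays a summary, and processes all items with an animated progress bar. Stateful filters such as Mean Statistics are flushed automatically after the last item, which is when they write their accumulated results (here, stats.parquet).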

Step 5/5: Execute Pipeline

╭──────────────────── Pipeline ────────────────────╮
│ VTK Reader → Mean Statistics → Mesh Writer       │
╰──────────────────────────────────────────────────╯

⠋ Processing... ━━━━━━━━━━━━━━━━━━━━ 100% 42/42 ./output/mesh_0041_0

• Statistics saved to stats.parquet

╭─────────────── ✓ Complete ───────────────╮
│ Source items processed:        42        │
│ Outputs written:               42        │
│ Database path:     ~/.cache/psnc/a1b2.db │
╰──────────────────────────────────────────╯
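A stateful filter accumulates information across items and only produces its result when flushed, which is why flushing happens after the last item. The toy sketch below illustrates that pattern with a running mean; RunningMeanFilter and run are hypothetical names, not the curator's filter API, and the statistic is returned rather than written to Parquet:

```python
from __future__ import annotations

from typing import Iterable, Iterator


class RunningMeanFilter:
    """Toy stateful filter (hypothetical, not the curator API).

    Passes every item through unchanged while accumulating a running
    mean of one numeric field; the statistic is only produced on flush().
    """

    def __init__(self, field: str) -> None:
        self.field = field
        self._total = 0.0
        self._count = 0

    def __call__(self, items: Iterable[dict]) -> Iterator[dict]:
        for item in items:
            self._total += item[self.field]
            self._count += 1
            yield item  # downstream stages still see every item

    def flush(self) -> dict:
        """Return the accumulated statistic and reset the state."""
        mean = self._total / self._count if self._count else float("nan")
        result = {"count": self._count, f"mean_{self.field}": mean}
        self._total, self._count = 0.0, 0
        return result


def run(source: Iterable[dict], filters: list, sink: list) -> list[dict]:
    """Stream items through the filters into the sink, then flush."""
    stream: Iterable[dict] = source
    for f in filters:
        stream = f(stream)  # chain lazily: nothing runs until consumed
    for item in stream:
        sink.append(item)
    # Accumulated state is complete only once the stream is exhausted.
    return [f.flush() for f in filters if hasattr(f, "flush")]
```

For example, run([{"pressure": 1.0}, {"pressure": 2.0}, {"pressure": 6.0}], [RunningMeanFilter("pressure")], out) appends all three items to out and returns [{"count": 3, "mean_pressure": 3.0}]. If items were processed in separate worker processes, each worker would hold its own partial state, which is presumably why flushing is tied to sequential runs in the Python API.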

Cache Management

Pipeline databases are stored in ~/.cache/psnc/ by default (see Checkpointing Pipelines for how to change the location). The psnc cache command group provides tools for inspecting and managing these databases.

Show cache directory

psnc cache path
# ~/.cache/psnc

List databases

psnc cache list

Displays a table of all databases with their hash prefix, creation time, pipeline components, progress, and file size.

Inspect a database

psnc cache info a1b2

Shows detailed metadata for a single database, identified by its hash prefix.
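Hash prefixes work like abbreviated Git commit hashes: any prefix that matches exactly one database is enough. A minimal sketch of that lookup, assuming databases are stored as <hash>.db files directly in the cache directory (as the a1b2.db path in the completion summary suggests); resolve_prefix is a hypothetical helper, and the real command may resolve prefixes differently:

```python
from __future__ import annotations

from pathlib import Path


def resolve_prefix(cache_dir: Path, prefix: str) -> Path:
    """Return the single <hash>.db file whose hash starts with `prefix`.

    Assumption (hypothetical helper): databases live directly in cache_dir
    and are named <hash>.db.
    """
    matches = sorted(p for p in cache_dir.glob("*.db") if p.stem.startswith(prefix))
    if not matches:
        raise LookupError(f"no database matches prefix {prefix!r}")
    if len(matches) > 1:
        names = ", ".join(p.stem for p in matches)
        raise LookupError(f"prefix {prefix!r} is ambiguous: {names}")
    return matches[0]
```

Raising on an ambiguous prefix, instead of picking the first match, matters most for psnc cache rm, where guessing could delete the wrong database.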

Remove databases

# Remove by hash prefix
psnc cache rm a1b2

# Remove databases older than 7 days
psnc cache rm --older-than 7d

# Remove all databases (with confirmation)
psnc cache rm --all

# Skip confirmation prompt
psnc cache rm --all --yes

Duration format

The --older-than flag accepts human-readable durations:

Suffix   Meaning
s        seconds
m        minutes
h        hours
d        days
w        weeks

Examples: 30m, 12h, 7d, 2w.
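A minimal sketch of parsing these durations into a datetime.timedelta, assuming the value is a single integer followed by one suffix (so compound forms such as 1h30m are rejected); parse_duration is a hypothetical helper, not the CLI's actual parser:

```python
import re
from datetime import timedelta

# Suffix -> timedelta keyword argument, following the suffix table.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
# Assumption: exactly one integer followed by exactly one suffix.
_PATTERN = re.compile(r"(\d+)([smhdw])")


def parse_duration(text: str) -> timedelta:
    """Parse strings like '30m', '12h', '7d' or '2w' into a timedelta."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid duration {text!r}; expected e.g. 30m, 12h, 7d, 2w")
    value, suffix = match.groups()
    return timedelta(**{_UNITS[suffix]: int(value)})
```

A removal cutoff would then presumably be datetime.now() - parse_duration("7d"), with databases created before that moment selected by psnc cache rm --older-than 7d.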

Color Scheme

The CLI uses a consistent color scheme throughout:

Element         Color
Branding        NVIDIA green
Step headers    Blue
Highlights      Cyan
Success (✓)     Green
Warnings (⚠)    Yellow
Errors (✗)      Red

Programmatic Equivalent

Everything the wizard does can also be done in Python. The script below builds the same pipeline as the walkthrough above:

from physicsnemo_curator import run_pipeline
from physicsnemo_curator.domains.mesh.sources.vtk import VTKSource
from physicsnemo_curator.domains.mesh.filters.mean import MeanFilter
from physicsnemo_curator.domains.mesh.sinks.mesh_writer import MeshSink

# Same pipeline as the wizard walkthrough: VTK Reader -> Mean Statistics -> Mesh Writer
pipeline = (
    VTKSource("./cfd_results/")
    .filter(MeanFilter(output="stats.parquet"))
    .write(MeshSink(output_dir="./output/"))
)

# Sequential with progress bar (equivalent to CLI behaviour)
results = run_pipeline(pipeline)

# Unlike the CLI, Python does not flush stateful filters for you;
# flush after the sequential run so the Mean Statistics output is written
pipeline.filters[0].flush()

# Alternatively, run in parallel across multiple cores (use instead of the
# sequential call above; flush() applies to sequential runs only)
# results = run_pipeline(pipeline, n_jobs=-1, backend="process_pool")

See Parallel Execution for details on run_pipeline and available backends.