Metrics Dashboard#

PhysicsNeMo Curator includes an interactive web dashboard for inspecting pipeline run metrics. It reads the SQLite database produced by the pipeline’s profiling system and presents timing, memory, stage-level breakdowns, and filter artifact previews in a browser-based interface.

Installation#

pip install physicsnemo-curator[dashboard]

This installs the required dependencies:

  • panel — reactive web application framework

  • holoviews — declarative data visualization

  • bokeh — interactive plotting backend

  • pandas — DataFrame manipulation

  • pyarrow — Parquet file reading

Launch#

From the command line#

psnc dashboard pipeline.db

You can also pass a serialized pipeline file (.yaml or .json) — the dashboard computes the config hash and locates the matching database automatically:

psnc dashboard my_pipeline.yaml

Or pass a hash prefix instead of a full path. The dashboard will look up the matching database in the cache directory:

psnc dashboard a1b2
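The prefix is matched against the start of the database filename (the config hash). A minimal sketch of that resolution logic, assuming a cache directory of `.db` files named by hash — the helper and its error handling are illustrative, not the library’s actual implementation:

```python
from pathlib import Path


def resolve_db(prefix: str, cache_dir: str) -> Path:
    """Find the unique .db file whose name starts with the given hash prefix.

    Illustrative sketch: the real dashboard's lookup may differ.
    """
    matches = [p for p in Path(cache_dir).glob("*.db") if p.stem.startswith(prefix)]
    if not matches:
        raise FileNotFoundError(f"no database matches prefix {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous prefix {prefix!r}: {len(matches)} matches")
    return matches[0]
```

As with `git`’s abbreviated hashes, a prefix only resolves when it is unambiguous within the cache directory.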

Options:

Flag          Default   Description
--port        5006      Server port
--no-browser  off       Don’t open a browser window on launch

From Python#

from physicsnemo_curator.dashboard import launch

launch("pipeline.db", port=5006)

Or for more control:

from physicsnemo_curator.dashboard import DashboardApp

app = DashboardApp("pipeline.db")
tabs = app.servable()  # for embedding in a notebook

Prerequisites#

The dashboard reads the SQLite database that the pipeline creates when track_metrics=True (the default). Make sure your pipeline was configured to collect metrics:

from physicsnemo_curator.core.base import Pipeline

pipeline = Pipeline(
    source=source,
    filters=[...],
    sink=sink,
    track_metrics=True,   # default
    track_memory=True,    # default
    db_dir="./runs/",     # directory for the .db file
)

After run_pipeline() completes, the database is at <db_dir>/<config_hash>.db.
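If you don’t want to compute the config hash yourself, one pragmatic way to find the database for the run you just finished is to take the most recently modified `.db` file in `db_dir`. This is a convenience sketch, not part of the library API:

```python
from pathlib import Path


def latest_db(db_dir: str) -> Path:
    """Return the most recently modified .db file in db_dir.

    Convenience sketch for locating the database of the latest run;
    not part of the physicsnemo-curator API.
    """
    dbs = sorted(Path(db_dir).glob("*.db"), key=lambda p: p.stat().st_mtime)
    if not dbs:
        raise FileNotFoundError(f"no .db files in {db_dir}")
    return dbs[-1]
```

The returned path can be passed straight to `psnc dashboard` or to `launch()`.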

Tabs#

Overview#

Summary of the pipeline run:

  • Progress cards — completed, failed, remaining counts with elapsed time

  • Workers — table of registered workers with heartbeat status (useful when monitoring a running pipeline)

  • Pipeline structure — source → filters → sink chain

  • Recent output files — last 20 files produced by the sink

  • Error log — indices that failed with error messages

Pipeline#

Inspect the pipeline structure and drill into individual indices:

  • Structure flow — visual cards for each pipeline component with parameters

  • Index query — filter by index range (e.g. 10-20, 1,5,10) or status (completed, error)

  • Artifact inspection — click an index to see its output files and filter artifacts. If a WidgetProvider is registered for a filter, a rich visualization is shown inline (e.g. a bar chart for MeanFilter Parquet files)

  • Aggregate view — when no index is selected, browse all artifacts grouped by filter name with Parquet previews
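The index query field accepts ranges like `10-20` and comma lists like `1,5,10`. A sketch of how such a query string could be expanded into concrete indices — the parsing rules here are inferred from the examples above, not taken from the dashboard source:

```python
def parse_index_query(query: str) -> list[int]:
    """Expand an index query like '10-20' or '1,5,10' into a sorted index list.

    Inferred from the documented examples; the dashboard's parser may differ.
    """
    indices: set[int] = set()
    for part in query.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)
            indices.update(range(int(lo), int(hi) + 1))  # inclusive range
        elif part:
            indices.add(int(part))
    return sorted(indices)
```

Ranges and comma lists can be combined, e.g. `1, 3-5` selects indices 1, 3, 4, and 5.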

Performance#

Timing and resource analysis:

  • Timeline scatter — wall time per index, colored by status. Click a point to select it in the Pipeline tab. Toggle a memory overlay

  • Stage breakdown — stacked bar chart of per-stage time for each index. Filter by stage name. Summary statistics table (mean, median, p95, max)

  • Resource summary — memory distribution histogram, GPU memory histogram (if tracked), and a table of the 10 slowest indices
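The summary statistics shown in the stage breakdown (mean, median, p95, max) can be reproduced from raw per-stage timings. A pure-Python sketch, independent of the dashboard’s own implementation, using the nearest-rank definition of the 95th percentile:

```python
import math
import statistics


def stage_summary(times_s: list[float]) -> dict[str, float]:
    """Mean/median/p95/max over one stage's per-index wall times (seconds).

    Illustrative: the dashboard may use a different percentile convention.
    """
    if not times_s:
        raise ValueError("no samples")
    ordered = sorted(times_s)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed nearest-rank 95th percentile
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }
```

For 100 samples of 1.0 s through 100.0 s this yields a mean and median of 50.5 s, a p95 of 95.0 s, and a max of 100.0 s.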

Widget Extension System#

Filters that produce artifacts (Parquet, Zarr, etc.) can have custom visualizations in the Pipeline tab. A built-in widget is provided for MeanFilter.

Writing a custom widget#

Implement the WidgetProvider protocol:

import panel as pn


class MyFilterWidget:
    """Widget for visualizing MyFilter artifacts."""

    name = "My Filter Stats"
    filter_name = "MyFilter"  # must match the filter class name

    def panel(
        self,
        artifact_paths: list[str],
        selected_index: int | None = None,
    ) -> pn.viewable.Viewable:
        # Read artifacts, build visualization
        ...
        return pn.Column(...)

Registering a widget#

Register at runtime before launching the dashboard:

from physicsnemo_curator.dashboard import DashboardApp

app = DashboardApp("pipeline.db")
app.widget_registry.register(MyFilterWidget())
app.serve()

Or add auto-discovery to physicsnemo_curator/dashboard/widgets/__init__.py following the pattern of the built-in MeanFilterWidget.

Live Monitoring#

The dashboard can be launched while a pipeline is still running. The Overview tab shows worker heartbeats and progress updates. Use the refresh mechanism to poll the database for new results:

from physicsnemo_curator.dashboard import DashboardApp

app = DashboardApp("pipeline.db")
# The store auto-refreshes on parameter events
app.serve()

The PipelineStore uses WAL-mode SQLite, so concurrent reads from the dashboard and writes from the pipeline are safe.
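WAL (write-ahead logging) is what lets the pipeline’s writer and the dashboard’s readers coexist: readers see a consistent snapshot while the writer appends. A minimal stdlib demonstration of the mode itself, independent of PipelineStore (the table schema is made up for the example):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Writer connection: switch the database to WAL mode.
writer = sqlite3.connect(db_path)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE results (idx INTEGER, wall_time_ns INTEGER)")
writer.execute("INSERT INTO results VALUES (0, 1500000000)")
writer.commit()

# A second connection reads concurrently without blocking the writer.
reader = sqlite3.connect(db_path)
count = reader.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

In the default rollback-journal mode, a long-running write transaction would block readers; WAL removes that contention, which is why polling the database from the dashboard during a run is safe.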

Programmatic Access#

If you need the data without the web UI, use PipelineStore directly:

from physicsnemo_curator.core.pipeline_store import PipelineStore

store = PipelineStore.from_db("pipeline.db")

# Summary
print(store.summary(total=100))

# Per-index metrics
metrics = store.metrics()
for im in metrics.indices:
    print(f"Index {im.index}: {im.wall_time_ns / 1e9:.2f}s")

# Artifacts
artifacts = store.all_filter_artifacts()
for filter_name, paths in artifacts.items():
    print(f"{filter_name}: {len(paths)} files")

See Profiling Pipelines for details on the metrics system.