data

Data layer wrapping PipelineStore for the dashboard.

DashboardStore is a param.Parameterized adapter that queries the SQLite database and exposes results as pandas DataFrames suitable for Panel reactive updates.

Classes

DashboardStore

Reactive wrapper around PipelineStore.

Module Contents

class physicsnemo_curator.dashboard.data.DashboardStore(db_path: str, **kwargs: Any)

Bases: param.Parameterized

Reactive wrapper around PipelineStore.

Provides pandas DataFrame views of pipeline metrics. When the refresh event fires, cached results are invalidated, so the next property access re-queries the database.

Initialize the dashboard store.

Parameters:

db_path (str) – Path to an existing PipelineStore SQLite database.
**kwargs (Any) – Additional param keyword arguments.
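
The invalidate-on-refresh behaviour described above can be pictured with a small stand-in. This is a minimal sketch, not the DashboardStore implementation: `CachedStore`, its `refresh()` method, the `index_rows` property, and the `results` table are all hypothetical names, and only the standard library is used. It shows the general pattern of caching a query result, dropping it on refresh, and re-querying on the next access.

```python
import sqlite3


class CachedStore:
    """Hypothetical stand-in illustrating invalidate-on-refresh caching."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._cache = {}
        self.query_count = 0  # counts real database queries, for illustration

    def refresh(self) -> None:
        """Drop cached results so the next property access re-queries."""
        self._cache.clear()

    @property
    def index_rows(self):
        # Query only when nothing is cached for this view.
        if "index_rows" not in self._cache:
            self.query_count += 1
            self._cache["index_rows"] = self._conn.execute(
                "SELECT idx, status FROM results ORDER BY idx"
            ).fetchall()
        return self._cache["index_rows"]


# Assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (idx INTEGER, status TEXT)")
conn.execute("INSERT INTO results VALUES (0, 'completed')")

store = CachedStore(conn)
first = store.index_rows   # queries the database
conn.execute("INSERT INTO results VALUES (1, 'failed')")
stale = store.index_rows   # served from the cache: the new row is not visible
store.refresh()
fresh = store.index_rows   # re-queried after the refresh
```

In this sketch the second access returns the cached rows even though the table has changed, and only the call to `refresh()` makes the new row visible.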

all_artifacts() → dict[str, list[str]]

Return all filter artifacts across all indices, resolved to absolute paths.

Returns: Mapping of filter name to list of all resolved artifact paths.
Return type: dict[str, list[str]]

artifacts(index: int) → dict[str, list[str]]

Return filter artifacts for a given index, resolved to absolute paths.

Parameters: index (int) – Pipeline source index.
Returns: Mapping of filter name to list of resolved artifact paths.
Return type: dict[str, list[str]]
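
To make the two return shapes concrete, here is a rough sketch of how per-index mappings could be resolved and combined. Everything in it is an assumption for illustration: `resolve_artifacts` and `merge_artifacts` are hypothetical helpers, the filter names and file names are made up, and resolving against the current working directory is a guess, since the reference does not say which base directory DashboardStore uses.

```python
from pathlib import Path


def resolve_artifacts(raw, base_dir):
    """Hypothetical helper: resolve each stored path against base_dir."""
    return {
        name: [str((Path(base_dir) / p).resolve()) for p in paths]
        for name, paths in raw.items()
    }


def merge_artifacts(per_index):
    """Hypothetical helper: combine per-index mappings into one mapping."""
    merged = {}
    for mapping in per_index:
        for name, paths in mapping.items():
            merged.setdefault(name, []).extend(paths)
    return merged


base = Path.cwd()  # assumed base directory
idx0 = resolve_artifacts({"stats": ["out/stats_0.json"]}, base)
idx1 = resolve_artifacts(
    {"stats": ["out/stats_1.json"], "hist": ["out/hist_1.png"]}, base
)
combined = merge_artifacts([idx0, idx1])
```

Here `idx0` and `idx1` have the shape documented for `artifacts(index)`, and `combined` has the shape documented for `all_artifacts()`.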

log_worker_ids() → list[str]

Return unique worker IDs from logs.

Returns: Sorted list of unique worker IDs (including “Main”).
Return type: list[str]

logs_df(limit: int = 500, min_level: int = 0) → pandas.DataFrame

DataFrame of log entries from the pipeline run.

Parameters:

limit (int) – Maximum number of log entries to retrieve (default: 500).
min_level (int) – Minimum log level to include (10=DEBUG, 20=INFO, 30=WARNING, 40=ERROR). The default of 0 is below DEBUG, so no entries are excluded by level.

Returns:

Log entries with columns: timestamp, level_name, worker_id, idx, message.

Return type:

pd.DataFrame
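
The numeric values 10 through 40 match the level constants in Python's standard `logging` module. The sketch below shows one way `limit` and `min_level` might combine. `filter_logs` and the sample rows are hypothetical, and two details are assumptions: that the level filter is applied before the limit, and the order in which rows are returned.

```python
import logging

# Level names mapped to the standard library's numeric constants.
LEVELS = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "ERROR")
}


def filter_logs(rows, limit=500, min_level=0):
    """Hypothetical helper: keep rows at or above min_level, then cap at limit."""
    kept = [r for r in rows if LEVELS[r["level_name"]] >= min_level]
    return kept[:limit]


# Made-up rows using the documented column names.
rows = [
    {"timestamp": 1.0, "level_name": "DEBUG", "worker_id": "Main", "idx": None, "message": "starting"},
    {"timestamp": 2.0, "level_name": "INFO", "worker_id": "w1", "idx": 0, "message": "processing"},
    {"timestamp": 3.0, "level_name": "WARNING", "worker_id": "w1", "idx": 0, "message": "slow stage"},
    {"timestamp": 4.0, "level_name": "ERROR", "worker_id": "w2", "idx": 1, "message": "failed"},
]

everything = filter_logs(rows)                               # min_level=0 keeps all rows
warnings_up = filter_logs(rows, min_level=logging.WARNING)   # WARNING and ERROR only
capped = filter_logs(rows, limit=2)                          # first two rows
```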

output_paths(index: int) → list[str]

Return output file paths for a given index.

Parameters: index (int) – Pipeline source index.
Returns: Ordered list of output file paths.
Return type: list[str]

worker_indices(worker_id: str) → dict[str, list[int]]

Return indices processed by a specific worker.

Parameters: worker_id (str) – The worker ID to query.
Returns: Dictionary with keys ‘completed’ and ‘failed’, each containing a sorted list of indices processed by this worker.
Return type: dict[str, list[int]]
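
The return shape can be illustrated with a small stand-alone sketch. `group_by_status`, the row layout, and the worker IDs are hypothetical. The empty lists returned for an unknown worker are likewise an assumption, not documented behaviour.

```python
def group_by_status(rows, worker_id):
    """Hypothetical helper: bucket one worker's indices by outcome."""
    out = {"completed": [], "failed": []}
    for idx, wid, status in rows:
        if wid == worker_id and status in out:
            out[status].append(idx)
    # Sort each bucket so the result matches the documented shape.
    return {key: sorted(values) for key, values in out.items()}


# Made-up (index, worker_id, status) rows.
rows = [
    (3, "w1", "completed"),
    (0, "w1", "completed"),
    (2, "w1", "failed"),
    (1, "w2", "completed"),
]

result = group_by_status(rows, "w1")
unknown = group_by_status(rows, "w9")
```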

property index_df: pandas.DataFrame

DataFrame of per-index results.

Columns: index, status, wall_time_s, peak_memory_mb, gpu_memory_mb, error.

Returns: One row per processed index.
Return type: pd.DataFrame

property pipeline_config: dict

Return the pipeline configuration dictionary.

Returns: Pipeline configuration as stored in the database.
Return type: dict

refresh

selected_index

property stage_df: pandas.DataFrame

DataFrame of per-stage timing for all indices.

Columns: index, stage_name, stage_order, wall_time_s.

Returns: One row per (index, stage) combination.
Return type: pd.DataFrame
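
Because there is one row per (index, stage) pair, per-stage totals need an aggregation step. The sketch below shows the idea on plain dictionaries that mirror the documented columns, so it runs without pandas. The stage names and timings are made up for illustration.

```python
from collections import defaultdict

# Made-up rows mirroring the documented columns.
rows = [
    {"index": 0, "stage_name": "read", "stage_order": 0, "wall_time_s": 0.5},
    {"index": 0, "stage_name": "filter", "stage_order": 1, "wall_time_s": 2.0},
    {"index": 1, "stage_name": "read", "stage_order": 0, "wall_time_s": 1.5},
    {"index": 1, "stage_name": "filter", "stage_order": 1, "wall_time_s": 1.0},
]

# Sum wall time per stage, keyed by (stage_order, stage_name).
totals = defaultdict(float)
for row in rows:
    totals[(row["stage_order"], row["stage_name"])] += row["wall_time_s"]

# Sorting the keys orders the stages by stage_order.
per_stage = [(name, total) for (_, name), total in sorted(totals.items())]
```

On the actual DataFrame, a comparable view could likely be obtained with `stage_df.groupby("stage_name")["wall_time_s"].sum()`.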

property summary: dict[str, Any]

Summary of the pipeline run state.

Returns: Keys: total, completed, failed, remaining, elapsed_s, config_hash, db_path, workers.
Return type: dict[str, Any]

property workers_df: pandas.DataFrame

DataFrame of registered workers.

Columns: worker_id, pid, hostname, started_at, last_heartbeat, current_index, completed, failed.

Returns: One row per worker.
Return type: pd.DataFrame