cache

Cache directory management and introspection for pipeline SQLite databases.

Provides utilities to locate, list, inspect, and clean up .db files produced by pipeline runs. The default cache location follows the XDG Base Directory Specification and can be overridden with the PSNC_CACHE_DIR environment variable.

Usage

>>> from physicsnemo_curator.core.cache import default_cache_dir, list_databases
>>> cache = default_cache_dir()
>>> for info in list_databases(cache):
...     print(info.hash_prefix, info.source_name, info.completed)

Attributes

logger

Classes

DBInfo

Metadata about a single pipeline database file.

Functions

cache_size(→ int)

Return the total size in bytes of all .db files in the cache.

clear_cache(→ int)

Remove all .db files from the cache directory.

default_cache_dir(→ pathlib.Path)

Return the default cache directory for pipeline databases.

list_databases(→ list[DBInfo])

List all pipeline databases in the cache directory.

remove_databases(→ int)

Remove pipeline databases matching the given hash prefixes.

remove_older_than(→ int)

Remove pipeline databases older than max_age (by file mtime).

Module Contents

class physicsnemo_curator.core.cache.DBInfo

Metadata about a single pipeline database file.

Parameters:
  • hash_prefix (str) – Filename stem (the config hash prefix used as the DB name).

  • path (pathlib.Path) – Absolute path to the .db file.

  • size_bytes (int) – File size in bytes.

  • created (datetime) – Pipeline run start timestamp (from pipeline_runs.started_at).

  • source_name (str) – Registered source name extracted from the stored config JSON.

  • sink_name (str) – Registered sink name extracted from the stored config JSON.

  • filter_names (list[str]) – Registered filter names extracted from the stored config JSON.

  • total (int) – Total number of index_results rows (completed + failed).

  • completed (int) – Number of completed index results.

  • failed (int) – Number of failed index results.

completed: int = 0
created: datetime.datetime
failed: int = 0
filter_names: list[str] = []
hash_prefix: str
path: pathlib.Path
sink_name: str
size_bytes: int
source_name: str
total: int = 0
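A minimal sketch of an equivalent record type, assuming DBInfo is a plain `dataclasses.dataclass`. The rendered default `filter_names: list[str] = []` suggests, but does not confirm, a `default_factory=list`. The class name `DBInfoSketch` and the example values are hypothetical.

```python
import datetime
import pathlib
from dataclasses import dataclass, field


@dataclass
class DBInfoSketch:
    """Hypothetical stand-in mirroring the documented DBInfo fields."""

    # Fields without defaults must precede fields with defaults in a dataclass.
    hash_prefix: str
    path: pathlib.Path
    size_bytes: int
    created: datetime.datetime
    source_name: str
    sink_name: str
    # A mutable default needs default_factory so instances do not share one list.
    filter_names: list[str] = field(default_factory=list)
    total: int = 0
    completed: int = 0
    failed: int = 0


# Hypothetical example values: total is completed + failed, as documented.
info = DBInfoSketch(
    hash_prefix="a1b2c3d4",
    path=pathlib.Path("/tmp/psnc/a1b2c3d4.db"),
    size_bytes=4096,
    created=datetime.datetime(2024, 1, 1, 12, 0),
    source_name="example_source",
    sink_name="example_sink",
    total=10,
    completed=8,
    failed=2,
)
```

Under this assumption each instance gets its own `filter_names` list, so appending to one record never affects another.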

physicsnemo_curator.core.cache.cache_size(*, cache_dir: pathlib.Path | None = None) → int

Return the total size in bytes of all .db files in the cache.

Parameters:

cache_dir (pathlib.Path | None, optional) – Directory to measure. Defaults to default_cache_dir().

Returns:

Total bytes occupied by .db files, or 0 if the directory is empty or does not exist.

Return type:

int
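A minimal sketch of the documented behavior, assuming the scan is non-recursive (only `*.db` files directly inside the directory are counted). `cache_size_sketch` is a hypothetical name, and it takes the directory explicitly rather than falling back to `default_cache_dir()`.

```python
import pathlib


def cache_size_sketch(cache_dir: pathlib.Path) -> int:
    """Sum the sizes of the *.db files directly inside cache_dir."""
    if not cache_dir.is_dir():
        # A missing directory occupies no space.
        return 0
    # glob("*.db") does not descend into subdirectories.
    return sum(p.stat().st_size for p in cache_dir.glob("*.db") if p.is_file())
```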

physicsnemo_curator.core.cache.clear_cache(*, cache_dir: pathlib.Path | None = None) → int

Remove all .db files from the cache directory.

Parameters:

cache_dir (pathlib.Path | None, optional) – Directory to clear. Defaults to default_cache_dir().

Returns:

Number of database files removed.

Return type:

int
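A sketch of the same operation under the same assumptions as above (non-recursive, `*.db` only, explicit directory argument). `clear_cache_sketch` is hypothetical; the glob results are materialized into a list before deleting so the directory is not modified while it is being iterated.

```python
import pathlib


def clear_cache_sketch(cache_dir: pathlib.Path) -> int:
    """Delete every *.db file in cache_dir and return how many were removed."""
    if not cache_dir.is_dir():
        return 0
    removed = 0
    # list() snapshots the matches before any file is unlinked.
    for db_file in list(cache_dir.glob("*.db")):
        if db_file.is_file():
            db_file.unlink()
            removed += 1
    return removed
```

Files with other extensions are left untouched, and a second call on the same directory returns 0.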

physicsnemo_curator.core.cache.default_cache_dir() → pathlib.Path

Return the default cache directory for pipeline databases.

Resolution order (highest priority first):

  1. PSNC_CACHE_DIR environment variable

  2. $XDG_CACHE_HOME/psnc/

  3. ~/.cache/psnc/

Returns:

Absolute path to the cache directory (may not exist yet).

Return type:

pathlib.Path

Examples

>>> import os
>>> os.environ["PSNC_CACHE_DIR"] = "/tmp/my_cache"
>>> default_cache_dir()
PosixPath('/tmp/my_cache')
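The resolution order above can be sketched as follows. This is an illustration, not the library's code: `default_cache_dir_sketch` is hypothetical, treating an empty `PSNC_CACHE_DIR` as unset is an assumption, and the handling of `XDG_CACHE_HOME` follows the XDG Base Directory Specification (unset or empty means use the default; a relative path is invalid and ignored), which the real function may or may not replicate.

```python
import os
import pathlib


def default_cache_dir_sketch() -> pathlib.Path:
    """Resolve PSNC_CACHE_DIR, then $XDG_CACHE_HOME/psnc, then ~/.cache/psnc."""
    override = os.environ.get("PSNC_CACHE_DIR")
    if override:  # assumption: an empty value counts as unset
        return pathlib.Path(override).expanduser().absolute()
    xdg = os.environ.get("XDG_CACHE_HOME")
    # Per the XDG spec, ignore empty or relative values of XDG_CACHE_HOME.
    if xdg and os.path.isabs(xdg):
        return pathlib.Path(xdg) / "psnc"
    return pathlib.Path.home() / ".cache" / "psnc"
```

As in the documented function, the returned directory is not created; callers that write to it would need `path.mkdir(parents=True, exist_ok=True)` first.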

physicsnemo_curator.core.cache.list_databases(cache_dir: pathlib.Path | None = None) → list[DBInfo]

List all pipeline databases in the cache directory.

Opens each .db file, reads the pipeline_runs and index_results tables, and returns metadata sorted newest first (by started_at timestamp). Corrupt or unreadable databases are silently skipped.

Parameters:

cache_dir (pathlib.Path | None, optional) – Directory to scan. Defaults to default_cache_dir().

Returns:

Metadata for each valid database, sorted newest first.

Return type:

list[DBInfo]
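A simplified sketch of the scan, skip, and sort pattern described above, using the standard `sqlite3` module. The schema is an assumption: the documentation names `pipeline_runs.started_at` but not the layout of `index_results`, so the `status` column holding `'completed'` or `'failed'` is hypothetical, as are the ISO-8601 timestamp format and one run per file. `list_databases_sketch` returns plain tuples instead of DBInfo and does not parse the stored config JSON.

```python
import datetime
import pathlib
import sqlite3


def list_databases_sketch(
    cache_dir: pathlib.Path,
) -> list[tuple[str, datetime.datetime, int, int]]:
    """Return (stem, started_at, completed, failed) per readable DB, newest first.

    ASSUMED schema (hypothetical): pipeline_runs(started_at TEXT, ISO-8601)
    and index_results(status TEXT) with values 'completed' or 'failed'.
    """
    rows: list[tuple[str, datetime.datetime, int, int]] = []
    if not cache_dir.is_dir():
        return rows
    for db_path in cache_dir.glob("*.db"):
        try:
            # Read-only URI: scanning never creates or modifies a database.
            uri = db_path.resolve().as_uri() + "?mode=ro"
            conn = sqlite3.connect(uri, uri=True)
            try:
                (started,) = conn.execute(
                    "SELECT started_at FROM pipeline_runs LIMIT 1"
                ).fetchone()
                counts = dict(
                    conn.execute(
                        "SELECT status, COUNT(*) FROM index_results GROUP BY status"
                    ).fetchall()
                )
            finally:
                conn.close()
            created = datetime.datetime.fromisoformat(started)
        except (sqlite3.DatabaseError, TypeError, ValueError):
            # Not a database, missing table, no run row, or bad timestamp: skip.
            continue
        rows.append(
            (db_path.stem, created, counts.get("completed", 0), counts.get("failed", 0))
        )
    # Newest first by run start time.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

`sqlite3.OperationalError` (for example, a missing table) is a subclass of `sqlite3.DatabaseError`, so one `except` clause covers both corrupt files and unexpected schemas.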

physicsnemo_curator.core.cache.remove_databases(hash_prefixes: list[str], *, cache_dir: pathlib.Path | None = None) → int

Remove pipeline databases matching the given hash prefixes.

Each prefix is matched against the stems of the .db filenames in the cache directory. A prefix that matches more than one file raises ValueError, which prevents accidental deletion.

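Parameters:
  • hash_prefixes (list[str]) – Config hash prefixes to match against .db filename stems.

  • cache_dir (pathlib.Path | None, optional) – Directory to scan. Defaults to default_cache_dir().
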
Returns:

Number of database files removed.

Return type:

int

Raises:

ValueError – If a prefix is ambiguous (matches more than one .db file).
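A sketch of prefix resolution with the ambiguity check, assuming a file matches when its stem starts with the prefix and that a prefix matching nothing is silently ignored (the documentation does not say either way). `remove_databases_sketch` is hypothetical. It resolves every prefix before deleting anything, so one ambiguous prefix aborts the whole call with no files removed; whether the real function orders its work this way is not stated.

```python
import pathlib


def remove_databases_sketch(hash_prefixes: list[str], cache_dir: pathlib.Path) -> int:
    """Delete *.db files whose stem starts with one of hash_prefixes."""
    stems = {p.stem: p for p in cache_dir.glob("*.db")} if cache_dir.is_dir() else {}
    to_delete: set[pathlib.Path] = set()
    for prefix in hash_prefixes:
        matches = [path for stem, path in stems.items() if stem.startswith(prefix)]
        if len(matches) > 1:
            names = sorted(m.name for m in matches)
            raise ValueError(f"Prefix {prefix!r} is ambiguous: matches {names}")
        # Assumption: a prefix with no match is ignored rather than an error.
        to_delete.update(matches)
    # Only reached when every prefix resolved to at most one file.
    for path in to_delete:
        path.unlink()
    return len(to_delete)
```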

physicsnemo_curator.core.cache.remove_older_than(max_age: datetime.timedelta, *, cache_dir: pathlib.Path | None = None) → int

Remove pipeline databases older than max_age (by file mtime).

Parameters:
  • max_age (timedelta) – Maximum age. Files with an mtime older than now - max_age are removed.

  • cache_dir (pathlib.Path | None, optional) – Directory to scan. Defaults to default_cache_dir().

Returns:

Number of database files removed.

Return type:

int
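A minimal sketch of the mtime cutoff, assuming the same non-recursive `*.db` scan as the other functions. `remove_older_than_sketch` is hypothetical. Both `time.time()` and `st_mtime` are seconds since the epoch, so the cutoff is a single subtraction.

```python
import datetime
import pathlib
import time


def remove_older_than_sketch(max_age: datetime.timedelta, cache_dir: pathlib.Path) -> int:
    """Delete *.db files whose mtime is earlier than now - max_age."""
    if not cache_dir.is_dir():
        return 0
    cutoff = time.time() - max_age.total_seconds()
    removed = 0
    for db_file in list(cache_dir.glob("*.db")):
        if db_file.is_file() and db_file.stat().st_mtime < cutoff:
            db_file.unlink()
            removed += 1
    return removed
```

Because the check uses file mtime rather than `pipeline_runs.started_at`, a database that a later run wrote to counts as recent even if its first run started long ago.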

physicsnemo_curator.core.cache.logger