Writing Custom Sources#
A source reads data from local or remote storage and yields domain objects (e.g. meshes, xarray datasets). Each source handles its own file discovery and caching internally. This page walks through the interface contract, implementation patterns, and a worked example.
Interface Contract#
Subclass Source and implement three methods:

| Method | Signature | Purpose |
|---|---|---|
| __len__ | (self) -> int | Number of items the source contains |
| __getitem__ | (self, index: int) -> Generator[T] | Yield one or more domain objects for the given index |
| params | (cls) -> list[Param] (classmethod) | Declare constructor parameters for CLI discovery |

Key rules:
- Each source handles its own file discovery and caching internally (using pathlib, fsspec, or other appropriate libraries).
- __getitem__ is a generator (uses yield). It can yield multiple items per index if a single file contains several samples.
- params() drives the CLI prompts and documents the constructor interface.
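To make the generator rule concrete, here is a minimal sketch using a stand-in class. ToySource is hypothetical and does not subclass the real Source; it only mimics the shape of the contract with in-memory data. It assumes the caller iterates over source[i] rather than treating the result as a single object.

```python
from collections.abc import Generator


class ToySource:
    """Hypothetical stand-in that mimics the Source contract with in-memory data."""

    def __init__(self, batches: list[list[str]]) -> None:
        # Each inner list plays the role of one file holding several samples.
        self._batches = batches

    def __len__(self) -> int:
        # Number of indices (here: batches), not the total number of samples.
        return len(self._batches)

    def __getitem__(self, index: int) -> Generator[str, None, None]:
        # A generator function: the body runs only once the caller iterates.
        for sample in self._batches[index]:
            yield sample


source = ToySource([["a"], ["b", "c"]])
first = list(source[0])
all_items = [item for i in range(len(source)) for item in source[i]]
```

Because indexing returns a generator, a caller that wants a flat stream has to loop over both the indices and the items yielded for each index, as the last line does.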
Minimal Example#
from __future__ import annotations

import pathlib
from typing import ClassVar, TYPE_CHECKING

from physicsnemo_curator.core.base import Source, Param

if TYPE_CHECKING:
    from collections.abc import Generator

    from physicsnemo.mesh import Mesh


class MySource(Source["Mesh"]):
    name: ClassVar[str] = "My Reader"
    description: ClassVar[str] = "Reads data from my custom format"

    @classmethod
    def params(cls) -> list[Param]:
        return [
            Param(name="input_path", description="Path to data directory", type=str),
            Param(name="option", description="Processing option", type=str, default="default"),
        ]

    def __init__(self, input_path: str, option: str = "default") -> None:
        self._option = option
        root = pathlib.Path(input_path)
        self._files = sorted(root.glob("**/*.vtk"))

    def __len__(self) -> int:
        return len(self._files)

    def __getitem__(self, index: int) -> Generator[Mesh]:
        path = str(self._files[index])
        mesh = self._load(path)
        yield mesh

    def _load(self, path: str) -> Mesh:
        """Load a single item from a local file path."""
        ...
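The constructor above discovers files with a recursive glob and sorts the result, so each index maps to the same file on every run. The self-contained sketch below shows that behavior on a temporary directory. discover_files is a hypothetical helper introduced for illustration, not part of physicsnemo_curator.

```python
import pathlib
import tempfile


def discover_files(root: str, pattern: str = "**/*.vtk") -> list[pathlib.Path]:
    """Hypothetical helper: recursive glob, sorted so index order is stable."""
    return sorted(pathlib.Path(root).glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "run_b").mkdir()
    (base / "run_a").mkdir()
    # Create files in non-alphabetical order; sorting fixes the final order.
    for rel in ["run_b/case.vtk", "run_a/case.vtk", "top.vtk", "notes.txt"]:
        (base / rel).touch()
    found = [p.relative_to(base).as_posix() for p in discover_files(tmp)]
```

Files that do not match the pattern (notes.txt here) are skipped, and the "**" component also matches files directly under the root.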
Implementation Patterns#
Single-item sources#
In the most common pattern, each file maps to exactly one domain object:
def __getitem__(self, index: int) -> Generator[Mesh]:
    path = str(self._files[index])
    mesh = read_vtk(path)
    yield mesh
Multi-item sources#
When a single file contains multiple samples (e.g. time steps in an HDF5 file):
def __getitem__(self, index: int) -> Generator[Mesh]:
    path = str(self._files[index])
    with h5py.File(path) as f:
        for timestep in f["timesteps"]:
            yield self._build_mesh(timestep)
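One consequence of this pattern, assuming __len__ counts files as in the minimal example, is that len(source) can be smaller than the total number of yielded items. The sketch below illustrates this with JSON files standing in for HDF5 so that it runs on the standard library alone. JsonStepSource and its file layout are hypothetical.

```python
import json
import pathlib
import tempfile
from collections.abc import Generator


class JsonStepSource:
    """Hypothetical stand-in: each JSON file holds a list of time steps."""

    def __init__(self, input_path: str) -> None:
        self._files = sorted(pathlib.Path(input_path).glob("*.json"))

    def __len__(self) -> int:
        return len(self._files)  # counts files, not time steps

    def __getitem__(self, index: int) -> Generator[dict, None, None]:
        with self._files[index].open() as f:
            steps = json.load(f)["timesteps"]
        for step in steps:
            yield step  # one item per time step in the file


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "a.json").write_text(json.dumps({"timesteps": [{"t": 0}, {"t": 1}]}))
    (root / "b.json").write_text(json.dumps({"timesteps": [{"t": 0}]}))
    source = JsonStepSource(tmp)
    n_indices = len(source)
    per_index = [len(list(source[i])) for i in range(n_indices)]
```

Here the source reports two indices but yields three items in total, so code that needs the total sample count cannot rely on len(source) alone.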
Eager metadata, lazy data#
Load lightweight metadata up-front in __init__ and defer heavy data loading
to __getitem__:
def __init__(self, input_path: str) -> None:
    root = pathlib.Path(input_path)
    self._files = sorted(root.glob("**/*.vtk"))
    # Lightweight: just read headers
    self._metadata = [read_header(str(f)) for f in self._files]

def __getitem__(self, index: int) -> Generator[Mesh]:
    path = str(self._files[index])
    meta = self._metadata[index]
    mesh = read_full(path, meta)
    yield mesh
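The split between eager and lazy work can be checked with a small runnable sketch. It assumes a toy text format in which the first line of each file is a header. HeaderFirstSource and its loads counter are hypothetical and exist only to show when the full read happens.

```python
import pathlib
import tempfile
from collections.abc import Generator


class HeaderFirstSource:
    """Hypothetical stand-in: line 1 of each file is a header, the rest is data."""

    def __init__(self, input_path: str) -> None:
        self._files = sorted(pathlib.Path(input_path).glob("*.txt"))
        self.loads = 0  # counts full-file reads, for illustration only
        # Eager: read only the first line of every file.
        self._metadata = [self._read_header(f) for f in self._files]

    @staticmethod
    def _read_header(path: pathlib.Path) -> str:
        with path.open() as f:
            return f.readline().strip()

    def __len__(self) -> int:
        return len(self._files)

    def __getitem__(self, index: int) -> Generator[tuple[str, list[str]], None, None]:
        # Lazy: the full file is read only when this generator is iterated.
        self.loads += 1
        lines = self._files[index].read_text().splitlines()
        yield self._metadata[index], lines[1:]


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "a.txt").write_text("name=a\n1\n2\n")
    (root / "b.txt").write_text("name=b\n3\n")
    source = HeaderFirstSource(tmp)
    loads_after_init = source.loads
    item = next(source[1])
    loads_after_read = source.loads
```

Constructing the source touches every file but reads only one line from each; the full read happens once, for the single index that was actually consumed.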
Registration#
Register your source in the submodule’s __init__.py so the CLI can discover it:
from physicsnemo_curator.core.registry import registry
from .sources.my_source import MySource
registry.register_source("mymodule", MySource)
Gallery Example#
For a complete worked example, see Creating a Custom Source.