dask#
Dask execution backend.
Uses dask.bag for parallel and distributed execution.
Supports local execution and can scale to clusters.
Classes#
Execute pipeline items using Dask bags. |
Module Contents#
- class physicsnemo_curator.run.dask.DaskBackend[source]#
Bases:
physicsnemo_curator.run.base.RunBackendExecute pipeline items using Dask bags.
Dask provides parallel execution that can scale from a single machine to a distributed cluster. This backend uses
dask.bagfor task-parallel execution.Warning
Stateful filters accumulate per-worker state that is not merged back. Design a post-hoc merge strategy if needed.
Backend Options#
- schedulerstr
Dask scheduler (“synchronous”, “threads”, “processes”, “distributed”).
- num_workersint | None
Number of workers (for local schedulers).
- clientdistributed.Client | None
Pre-configured Dask distributed client.
- run(
- pipeline: physicsnemo_curator.core.base.Pipeline[Any],
- config: physicsnemo_curator.run.base.RunConfig,
Execute pipeline indices using Dask.