dask#

Dask execution backend.

Uses dask.bag for parallel and distributed execution. Supports local execution and can scale to clusters.

Classes#

DaskBackend

Execute pipeline items using Dask bags.

Module Contents#

class physicsnemo_curator.run.dask.DaskBackend[source]#

Bases: physicsnemo_curator.run.base.RunBackend

Execute pipeline items using Dask bags.

Dask provides parallel execution that can scale from a single machine to a distributed cluster. This backend uses dask.bag for task-parallel execution.

Warning

Stateful filters accumulate state separately on each worker, and that state is not merged back after the run. If you need the combined state, design a post-hoc merge strategy.
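As an illustration of the post-hoc merge mentioned in the warning, here is a minimal sketch in plain Python. It assumes the per-worker state can be represented as a `collections.Counter`. The `count_partition` and `merge_states` helpers and the `"kind"` key are hypothetical and not part of physicsnemo_curator. The partitions stand in for the items each worker would see.

```python
from collections import Counter


def count_partition(items):
    """Hypothetical per-worker pass: build local state for one partition."""
    return Counter(item["kind"] for item in items)


def merge_states(states):
    """Combine per-worker Counters into a single Counter after the run."""
    merged = Counter()
    for state in states:
        merged.update(state)  # adds counts key by key
    return merged


# Each inner list stands in for the items seen by one worker.
partitions = [
    [{"kind": "mesh"}, {"kind": "field"}],
    [{"kind": "mesh"}],
]

per_worker = [count_partition(p) for p in partitions]
merged = merge_states(per_worker)
```

This only works directly when the state is associative and commutative to combine (counts, sums, sets). Order-dependent state would need a different strategy.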

Backend Options#

scheduler : str

Dask scheduler (“synchronous”, “threads”, “processes”, “distributed”).

num_workers : int | None

Number of workers (for local schedulers).

client : distributed.Client | None

Pre-configured Dask distributed client.
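To show how the scheduler and worker-count options relate to dask.bag itself, here is a rough sketch that uses only public Dask calls (`dask.bag.from_sequence`, `Bag.map`, `Bag.compute`). It is not the backend's actual implementation. `run_indices` and `fake_sink` are hypothetical names, and the sketch covers only the local schedulers, not a distributed client.

```python
from typing import Callable, Optional


def run_indices(
    indices: list[int],
    process: Callable[[int], list[str]],
    scheduler: str = "threads",
    num_workers: Optional[int] = None,
) -> list[list[str]]:
    """Hypothetical helper: map `process` over indices with dask.bag."""
    try:
        import dask.bag as db
    except ImportError as exc:
        raise ImportError("dask is required for this sketch") from exc

    if not indices:
        return []

    # One partition per index, so each index becomes its own task.
    bag = db.from_sequence(indices, npartitions=len(indices))

    # Forward num_workers only when it was set explicitly.
    compute_kwargs = {"scheduler": scheduler}
    if num_workers is not None:
        compute_kwargs["num_workers"] = num_workers

    # Results come back in input order, one list of strings per index.
    return bag.map(process).compute(**compute_kwargs)


def fake_sink(index: int) -> list[str]:
    """Stand-in for a pipeline item; returns made-up output paths."""
    return [f"out/item_{index}.txt"]
```

For example, `run_indices([0, 1, 2], fake_sink, scheduler="threads", num_workers=2)` returns one single-element list per index. Passing `scheduler="synchronous"` runs everything in the calling thread, which is often easier to debug.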

run(
pipeline: physicsnemo_curator.core.base.Pipeline[Any],
config: physicsnemo_curator.run.base.RunConfig,
) → list[list[str]][source]#

Execute pipeline indices using Dask.

Parameters:
  • pipeline (Pipeline) – The pipeline to execute.

  • config (RunConfig) – Execution configuration.

Returns:

Sink outputs, one list per index.

Return type:

list[list[str]]

Raises:

ImportError – If dask is not installed.

description: ClassVar[str] = 'Dask bags for parallel/distributed execution'#
name: ClassVar[str] = 'dask'#
requires: ClassVar[tuple[str, ...]] = ('dask',)#
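A `requires` tuple like the one above can be checked before running, so that a missing optional dependency is reported up front. The sketch below shows one possible way to do this with the standard library. `missing_requirements` is a hypothetical helper and is not claimed to be how physicsnemo_curator performs the check.

```python
import importlib.util


def missing_requirements(requires: tuple[str, ...]) -> list[str]:
    """Return the top-level module names in `requires` that cannot be found."""
    return [name for name in requires if importlib.util.find_spec(name) is None]


def ensure_requirements(requires: tuple[str, ...]) -> None:
    """Raise ImportError listing any missing modules; do nothing otherwise."""
    missing = missing_requirements(requires)
    if missing:
        raise ImportError(f"Missing required packages: {', '.join(missing)}")
```

Calling `ensure_requirements(("dask",))` before building any bags would raise an `ImportError` early if Dask is not installed.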