process_pool#

Process pool execution backend.

Uses concurrent.futures.ProcessPoolExecutor for parallel execution. Suitable for CPU-bound workloads that benefit from true parallelism.

Classes#

ProcessPoolBackend

Execute pipeline items using a process pool.

Module Contents#

class physicsnemo_curator.run.process_pool.ProcessPoolBackend[source]#

Bases: physicsnemo_curator.run.base.RunBackend

Execute pipeline items using a process pool.

This backend uses Python’s concurrent.futures.ProcessPoolExecutor. It provides true parallelism for CPU-bound workloads by spawning separate processes that bypass the GIL.

Warning

Stateful filters accumulate per-process state that is not merged back into the parent process. Use sequential execution when filter side-effects must be aggregated.

Backend Options#

max_workersint | None

Maximum number of processes. Defaults to config.resolved_n_jobs.

mp_contextstr | None

Multiprocessing context (“spawn”, “fork”, “forkserver”).

initializerCallable | None

Callable to run at the start of each worker process.

initargstuple

Arguments for the initializer.

run(
pipeline: physicsnemo_curator.core.base.Pipeline[Any],
config: physicsnemo_curator.run.base.RunConfig,
) list[list[str]][source]#

Execute pipeline indices using a process pool.

Parameters:
  • pipeline (Pipeline) – The pipeline to execute.

  • config (RunConfig) – Execution configuration.

Returns:

Sink outputs, one list per index.

Return type:

list[list[str]]

description: ClassVar[str] = 'Process pool executor (true parallelism for CPU-bound tasks)'#
name: ClassVar[str] = 'process_pool'#
requires: ClassVar[tuple[str, Ellipsis]] = ()#