loky
Loky (joblib) execution backend.
Uses joblib.Parallel with the loky backend for robust parallel execution.
Loky provides better process management than the standard multiprocessing module,
including automatic worker restart on crashes and better memory cleanup.
Classes
LokyBackend | Execute pipeline items using joblib with the loky backend.
Module Contents
- class physicsnemo_curator.run.loky.LokyBackend
Bases: physicsnemo_curator.run.base.RunBackend
Execute pipeline items using joblib with the loky backend.
Loky is a robust process executor that handles worker crashes gracefully and provides better memory management than standard multiprocessing.
Key advantages over ProcessPoolExecutor:
- Automatic worker restart: If a worker crashes or is killed, loky automatically restarts it without failing the entire job.
- Memory cleanup: Workers are recycled after processing a configurable number of tasks to prevent memory leaks.
- Robust serialization: Uses cloudpickle for better serialization of complex objects including lambdas and closures.
- Timeout handling: Built-in timeout support per task.
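To illustrate why the serialization point matters, the following minimal sketch uses only the standard library to show which callables the default pickle module can and cannot serialize. The helper names make_scaler and can_pickle are hypothetical and exist only for this illustration; they are not part of physicsnemo_curator, joblib, or loky.

```python
import pickle


def make_scaler(factor):
    # A closure: the returned function captures `factor` from this scope.
    def scale(x):
        return x * factor

    return scale


def can_pickle(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Built-in functions are pickled by reference to an importable name.
print("builtin len:", can_pickle(len))
# Lambdas and nested closures have no importable name, so pickle rejects them.
print("lambda:", can_pickle(lambda x: x + 1))
print("closure:", can_pickle(make_scaler(3)))
```

Standard pickle stores functions by reference to a module-level name, which lambdas and nested functions lack. cloudpickle serializes such functions by value instead, which is why they can be shipped to loky workers.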
Warning
Stateful filters accumulate per-process state that is not merged back into the parent process.
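The effect described in this warning can be sketched without starting any processes. In the example below, CountingFilter is a hypothetical stateful filter, and a deep copy stands in for the serialized copy that a worker process would receive; neither reflects the actual physicsnemo_curator implementation.

```python
import copy


class CountingFilter:
    """Hypothetical stateful filter that counts the items it has seen."""

    def __init__(self):
        self.seen = 0

    def __call__(self, item):
        self.seen += 1
        return item


parent_filter = CountingFilter()

# A worker process operates on its own copy of the filter.
# copy.deepcopy is used here as a stand-in for that copy.
worker_filter = copy.deepcopy(parent_filter)
for item in range(5):
    worker_filter(item)

print("worker copy saw:", worker_filter.seen)
print("parent instance saw:", parent_filter.seen)
```

The counter advances only on the copy, so the parent's instance still reports zero items. One possible workaround, under the same assumptions, is to return any accumulated state alongside each result and merge it explicitly in the parent process.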
Backend Options
- prefer (str)
Soft hint for parallelization (“processes” or “threads”).
- require (str)
Hard constraint for parallelization.
- verbose (int)
Verbosity level (0-50). If not set, uses 0 (quiet).
- batch_size (int | str)
Number of tasks per batch (“auto” or int).
- pre_dispatch (str | int)
Number of batches to pre-dispatch.
- temp_folder (str | None)
Folder for memmapping large arrays.
- timeout (float | None)
Timeout in seconds for retrieving results.
- max_nbytes (str | int | None)
Threshold for automatic memmapping (e.g., “1M”).
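The option names above match keyword arguments of joblib.Parallel. As a rough sketch of how such options might be validated and assembled before being forwarded, the helper below builds a keyword dictionary. build_parallel_kwargs and SUPPORTED_OPTIONS are hypothetical names, and how LokyBackend actually receives and forwards its options is an assumption not confirmed by this page.

```python
from typing import Any

# Option names documented above; each is also a joblib.Parallel keyword.
SUPPORTED_OPTIONS = frozenset(
    {
        "prefer",
        "require",
        "verbose",
        "batch_size",
        "pre_dispatch",
        "temp_folder",
        "timeout",
        "max_nbytes",
    }
)


def build_parallel_kwargs(n_jobs: int, options: dict[str, Any]) -> dict[str, Any]:
    """Validate backend options and merge them into Parallel keyword arguments."""
    unknown = set(options) - SUPPORTED_OPTIONS
    if unknown:
        raise ValueError(f"Unsupported backend options: {sorted(unknown)}")
    kwargs: dict[str, Any] = {"n_jobs": n_jobs, "backend": "loky"}
    kwargs.update(options)
    # Mirror the documented default: quiet output when verbose is not set.
    kwargs.setdefault("verbose", 0)
    return kwargs


kwargs = build_parallel_kwargs(
    4, {"batch_size": "auto", "timeout": 60.0, "max_nbytes": "1M"}
)
print(kwargs)
# With joblib installed, these could be passed on as joblib.Parallel(**kwargs).
```

Rejecting unknown keys early keeps a misspelled option from being silently ignored.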
- run(
- pipeline: physicsnemo_curator.core.base.Pipeline[Any],
- config: physicsnemo_curator.run.base.RunConfig,
Execute pipeline indices using joblib/loky.