nv_dfm_core.targets.local.FederationRunner
- class nv_dfm_core.targets.local.FederationRunner(max_concurrent_jobs, sites, federation_name, homesite, logger=None)
Manages a pool of N JobRunner instances, each of which runs one process per site. Includes health monitoring and recovery.
- Parameters:
max_concurrent_jobs (int)
sites (list[str])
federation_name (str)
homesite (str)
logger (Logger | None)
- attach_client(job_id)
Attach a consumer for buffered/live tokens.
Returns (consumer_id, queue) if job_id is known, else None.
- Parameters:
job_id (str)
- Return type:
tuple[str, Queue[TokenPackage]] | None
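Because the return value is optional, callers need to check for None before unpacking. The following is a minimal sketch of that pattern, not the library's implementation. `FakeRunner` and `drain` are hypothetical stand-ins, plain strings stand in for `TokenPackage`, and the standard-library `queue.Queue` is assumed for the queue type, which may differ from the real one.

```python
import queue
import uuid
from typing import Optional


class FakeRunner:
    """Hypothetical stand-in that mimics only the documented return shape."""

    def __init__(self) -> None:
        # Tokens already buffered per job id; strings stand in for TokenPackage.
        self._buffered: dict[str, list[str]] = {"job-1": ["token-a", "token-b"]}

    def attach_client(self, job_id: str) -> Optional[tuple[str, queue.Queue]]:
        if job_id not in self._buffered:
            return None  # unknown job id
        q: queue.Queue = queue.Queue()
        for token in self._buffered[job_id]:
            q.put(token)  # replay buffered tokens to the new consumer
        return uuid.uuid4().hex, q


def drain(runner: FakeRunner, job_id: str) -> list[str]:
    """Collect whatever is currently queued for job_id, or [] if unknown."""
    attached = runner.attach_client(job_id)
    if attached is None:
        return []
    _consumer_id, q = attached
    tokens: list[str] = []
    while True:
        try:
            tokens.append(q.get_nowait())
        except queue.Empty:
            break
    return tokens
```

With this stand-in, `drain(FakeRunner(), "job-1")` collects the two buffered tokens, and an unknown job id yields an empty list instead of raising.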
- find_job_handle(job_id)
Find job execution info by ID.
Returns JobHandle (not LocalJob) to avoid keeping client objects alive.
- Returns:
JobHandle if found in running jobs, None otherwise.
- Parameters:
job_id (str)
Note
Only returns jobs currently running. Finished jobs are not tracked.
- get_job_handle_any(job_id)
Return the JobHandle for a known job_id, even if it is no longer running.
- Parameters:
job_id (str)
- Return type:
object | None
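One way to picture the difference between find_job_handle and get_job_handle_any is a registry with two lookups: one limited to running jobs and one covering every job it has seen. The sketch below is an assumption about how such a split could work, not a description of the library's internals. `HandleRegistry`, `start`, and `finish` are hypothetical names, and plain objects stand in for JobHandle.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HandleRegistry:
    """Hypothetical registry; only the two lookup names come from the docs."""

    running: dict[str, object] = field(default_factory=dict)
    known: dict[str, object] = field(default_factory=dict)

    def start(self, job_id: str, handle: object) -> None:
        # A started job is both running and known.
        self.running[job_id] = handle
        self.known[job_id] = handle

    def finish(self, job_id: str) -> None:
        # A finished job leaves the running set but stays known.
        self.running.pop(job_id, None)

    def find_job_handle(self, job_id: str) -> Optional[object]:
        return self.running.get(job_id)

    def get_job_handle_any(self, job_id: str) -> Optional[object]:
        return self.known.get(job_id)
```

Under this assumption, a job that has finished is invisible to `find_job_handle` but still reachable through `get_job_handle_any`.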
- release_execution(execution, *, abort=True)
Return the execution to the free pool.
- Parameters:
execution (object)
abort (bool)
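The free-pool idea can be sketched as a fixed set of slots that are acquired for a job and released afterwards. This is an illustrative sketch only: `ExecutionPool`, `acquire`, and `release` are hypothetical names, strings stand in for execution objects, and the effect of the `abort` flag is not described in this reference, so the sketch leaves it out.

```python
from collections import deque
from typing import Optional


class ExecutionPool:
    """Hypothetical fixed-size pool of reusable execution slots."""

    def __init__(self, max_concurrent_jobs: int) -> None:
        # Strings stand in for the real execution objects.
        self._free = deque(f"execution-{i}" for i in range(max_concurrent_jobs))
        self._busy: set[str] = set()

    def acquire(self) -> Optional[str]:
        """Take a free slot, or return None when all slots are busy."""
        if not self._free:
            return None
        execution = self._free.popleft()
        self._busy.add(execution)
        return execution

    def release(self, execution: str) -> None:
        """Return a busy slot to the free pool so another job can use it."""
        self._busy.remove(execution)
        self._free.append(execution)
```

With a pool of size one, a second `acquire()` returns None until the first slot is released, which mirrors the role of max_concurrent_jobs as an upper bound on concurrent jobs.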
- submit(pipeline, next_frame, netirs, force_modgen=False, callback_runner=None)
Executes a job by finding a free JobExecution object (which has one JobRunner process per site).
- Parameters:
pipeline (PreparedPipeline) – The prepared pipeline to execute.
next_frame (Frame) – The next frame for the job.
netirs (dict[str, BoundNetIR]) – Dictionary of bound net IRs for each site.
force_modgen (bool) – Whether to force module regeneration.
callback_runner (CallbackRunner | None) – The callback runner for dispatching results.
- Returns:
The created and started job.
- Return type: