nv_dfm_core.targets.local.FederationRunner

class nv_dfm_core.targets.local.FederationRunner(max_concurrent_jobs, sites, federation_name, homesite, logger=None)

Manages a pool of N JobRunner instances, each of which runs one process per site. Includes health monitoring and recovery.

Parameters:
  • max_concurrent_jobs (int)

  • sites (list[str])

  • federation_name (str)

  • homesite (str)

  • logger (Logger | None)

attach_client(job_id)

Attach a consumer for buffered/live tokens.

Returns (consumer_id, queue) if job_id is known, else None.

Parameters:

job_id (str)

Return type:

tuple[str, Queue[TokenPackage]] | None

find_job_handle(job_id)

Find job execution info by ID.

Returns JobHandle (not LocalJob) to avoid keeping client objects alive.

Returns:

JobHandle if found in running jobs, None otherwise.

Parameters:

job_id (str)

Note

Only returns jobs currently running. Finished jobs are not tracked.

get_job_handle_any(job_id)

Return the JobHandle for a known job_id, even if it is no longer running.

Parameters:

job_id (str)

Return type:

object | None

release_execution(execution, *, abort=True)

Return the execution to the free pool.

Parameters:
  • execution (object)

  • abort (bool)

shutdown()

Shuts down all worker processes, attempting a graceful shutdown.

submit(pipeline, next_frame, netirs, force_modgen=False, callback_runner=None)

Executes a job by finding a free JobExecution object (which has one JobRunner process per site).

Parameters:
  • pipeline (PreparedPipeline) – The prepared pipeline to execute.

  • next_frame (Frame) – The next frame for the job.

  • netirs (dict[str, BoundNetIR]) – Dictionary of bound net IRs for each site.

  • force_modgen (bool) – Whether to force module regeneration.

  • callback_runner (CallbackRunner | None) – The callback runner for dispatching results.

Returns:

The created and started job.

Return type:

LocalJob