Timeouts

A workflow can have two types of timeouts: exec_timeout and queue_timeout. The default values for both are shown in the pool information in the UI.

Field

Description

exec_timeout

Maximum execution time for each group in the workflow. The clock starts when a group’s status transitions to RUNNING and applies independently per group, so a long-running group does not affect the budget of other groups in the same workflow.

queue_timeout

Maximum queue time for each group, measured from when the group enters SCHEDULING (submitted to the backend Kubernetes queue) until it is assigned a node and enters INITIALIZING. A group still in SCHEDULING past this window is marked FAILED_QUEUE_TIMEOUT. Image pull and preflight time spent in INITIALIZING is governed separately by the start timeout, not by queue_timeout. Each group has its own clock.

Note

The default timeout values can be changed, but doing so requires service-level configuration. If you have administrative access, you can change them directly. Otherwise, contact someone with workflow administration privileges.

For example, to set both timeouts in a workflow:

workflow:
  name: my_workflow
  timeout:
    exec_timeout: 8h
    queue_timeout: 6h
  ...

If a running group exceeds exec_timeout, that group is marked FAILED_EXEC_TIMEOUT and its downstream groups cascade to FAILED_UPSTREAM. Sibling groups that are still within their own exec_timeout window continue running. The workflow status aggregates to FAILED_EXEC_TIMEOUT once all groups have finished and at least one timed out.
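The cascade described above can be sketched in a few lines of Python. This is only an illustration, not the service's implementation: the function name apply_exec_timeout, the downstream mapping, the group names, and the NOT_STARTED label are all hypothetical, and the sketch assumes that the cascade is transitive (downstream groups of downstream groups are also marked FAILED_UPSTREAM).

```python
from collections import deque


def apply_exec_timeout(statuses, downstream, timed_out_group):
    """Return a copy of `statuses` after one group exceeds exec_timeout."""
    result = dict(statuses)
    result[timed_out_group] = "FAILED_EXEC_TIMEOUT"

    # Breadth-first walk over every group reachable from the timed-out group.
    # Assumption: the cascade is transitive.
    pending = deque(downstream.get(timed_out_group, []))
    visited = set()
    while pending:
        name = pending.popleft()
        if name in visited:
            continue
        visited.add(name)
        result[name] = "FAILED_UPSTREAM"
        pending.extend(downstream.get(name, []))
    return result


# Hypothetical workflow: "train" and "augment" are siblings; "evaluate" and
# "report" sit downstream of "train". "NOT_STARTED" is a placeholder label,
# not a documented status.
downstream = {"train": ["evaluate"], "evaluate": ["report"]}
statuses = {
    "train": "RUNNING",
    "augment": "RUNNING",
    "evaluate": "NOT_STARTED",
    "report": "NOT_STARTED",
}

after = apply_exec_timeout(statuses, downstream, "train")
print(after)
```

In this sketch the sibling group augment keeps its RUNNING status, because it is not reachable from the group that timed out.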

If a group stays in SCHEDULING (waiting for a node assignment) longer than queue_timeout, that group is marked FAILED_QUEUE_TIMEOUT. The workflow status aggregates to FAILED_QUEUE_TIMEOUT once all groups have finished and at least one hit the queue timeout.
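To make the per-group clocks concrete, the following Python sketch checks a single group against both timeouts. It is a simplified model under stated assumptions, not the scheduler's actual logic: the Group class, its entered_at field, and the timeout_status function are hypothetical names, and the sketch assumes each clock starts when the group enters the relevant status.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Group:
    name: str
    status: str           # e.g. "SCHEDULING", "INITIALIZING", "RUNNING"
    entered_at: datetime  # hypothetical: when the group entered `status`


def timeout_status(
    group: Group,
    now: datetime,
    exec_timeout: timedelta,
    queue_timeout: timedelta,
) -> Optional[str]:
    """Return the failure status this group should get, or None."""
    elapsed = now - group.entered_at
    if group.status == "RUNNING" and elapsed > exec_timeout:
        return "FAILED_EXEC_TIMEOUT"
    if group.status == "SCHEDULING" and elapsed > queue_timeout:
        return "FAILED_QUEUE_TIMEOUT"
    # INITIALIZING is covered by the start timeout, which is not modeled here.
    return None


# Values taken from the example workflow: exec_timeout 8h, queue_timeout 6h.
exec_timeout = timedelta(hours=8)
queue_timeout = timedelta(hours=6)
start = datetime(2024, 1, 1, 0, 0, 0)
now = start + timedelta(hours=9)

long_runner = Group("long_runner", "RUNNING", start)
late_starter = Group("late_starter", "RUNNING", start + timedelta(hours=5))

print(timeout_status(long_runner, now, exec_timeout, queue_timeout))
print(timeout_status(late_starter, now, exec_timeout, queue_timeout))
```

Because every group carries its own entered_at timestamp, a group that started running later is evaluated only against its own elapsed time, regardless of how long other groups have been running.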

Timeout values use the format <integer><unit>, for example 8h. The supported units are:

  • s (seconds)

  • m (minutes)

  • h (hours)

  • d (days)

Note

Timeout values do not support mixed units such as 10h5m. Express the duration in a single unit instead (for example, 605m).
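If you generate workflow specs programmatically, it can help to validate timeout strings before submitting. The following Python sketch follows the format rules above; parse_timeout is a hypothetical helper, not part of any documented API, and the service's own validation may differ in details this page does not specify (such as whitespace or uppercase units).

```python
import re
from datetime import timedelta

# Map each documented unit suffix to the matching timedelta keyword.
_UNIT_TO_KWARG = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

# Exactly one integer followed by exactly one unit, nothing else.
_TIMEOUT_RE = re.compile(r"([0-9]+)([smhd])")


def parse_timeout(value: str) -> timedelta:
    """Parse a '<integer><unit>' timeout string into a timedelta."""
    match = _TIMEOUT_RE.fullmatch(value)
    if match is None:
        raise ValueError(
            f"invalid timeout {value!r}: expected <integer><unit> "
            "with a single unit of s, m, h, or d"
        )
    amount, unit = match.groups()
    return timedelta(**{_UNIT_TO_KWARG[unit]: int(amount)})


print(parse_timeout("8h"))
print(parse_timeout("605m") == timedelta(hours=10, minutes=5))

try:
    parse_timeout("10h5m")
except ValueError as err:
    print(err)
```

Using fullmatch rather than match is what rejects mixed values: the pattern must consume the entire string, so a second integer and unit after the first causes the parse to fail.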