Quick Reference

All sflow.yaml config fields at a glance. The Required column indicates mandatory fields.

For detailed explanations and examples, see Configuration.

Root-Level

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| version | Yes | string | | Schema version. Must be "0.1". |
| variables | | dict / list | | Global variables available to expressions and task env. |
| artifacts | | dict / list | | Named resources referenced by URI. |
| backends | | dict / list | | Compute backends (local, slurm). |
| operators | | dict / list | | Task execution operators (bash, srun, docker, ssh, python). |
| workflow | Yes | object | | Workflow definition containing name and tasks. |
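
As a rough illustration of the two required root fields, a minimal file might look like the sketch below. The workflow name, task name, and script line are hypothetical, and it assumes a task without an explicit operator or backend falls back to defaults.

```yaml
# Minimal sflow.yaml sketch: only the fields marked Required are set.
version: "0.1"

workflow:
  name: hello                # hypothetical workflow name
  tasks:
    - name: say_hello        # hypothetical task name
      script:
        - echo "hello from sflow"
```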

Variables

YAML path: variables.<name>

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| value | Yes | any | | Variable value (int, float, bool, string, or list). |
| description | | string | null | Human-readable description. |
| domain | | list | null | Allowed values; enables replica variable sweeps. value must be in domain if set. |
| type | | string | "string" | Type hint (string, integer, etc.). |
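
A variable that uses all four fields could be written roughly as follows. This sketch assumes the dict form keyed by variable name (matching the YAML path above); BATCH_SIZE and its values are made up for illustration.

```yaml
variables:
  BATCH_SIZE:                  # hypothetical variable name
    value: 8                   # must be one of the domain entries
    type: integer
    description: Per-replica batch size.
    domain: [8, 16, 32]        # allowed values, usable for replica sweeps
```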

Artifacts

YAML path: artifacts.<name>

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| uri | Yes | string | | Resource URI with scheme (fs://, file://, http://, s3://). |
| description | | string | null | Human-readable description. |
| content | | string | null | Inline file content. Only valid with file:// URI. |
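
The fragment below sketches one artifact per style: a plain reference and a file:// artifact with inline content. Both artifact names and both paths are hypothetical, and the exact path syntax accepted after each scheme is an assumption.

```yaml
artifacts:
  model_dir:                              # hypothetical name
    uri: fs:///models/example             # hypothetical path
    description: Directory holding model weights.
  run_config:                             # hypothetical name
    uri: file:///tmp/sflow/run_config.json
    description: Small config file written from inline content.
    content: |
      {"max_tokens": 128}
```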

Backends — Common Fields

YAML path: backends.<name>

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| type | Yes | string | | local or slurm. |
| default | | bool | false | Mark as the default backend (only one allowed). |
| gpus_per_node | | int / expr | null | GPUs per node for allocation / packing. |

Backends — Local

Additional fields when type: local

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| nodes | | int / expr | 1 | Number of synthetic local nodes. |

Backends — Slurm

Additional fields when type: slurm

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| account | Yes | string / expr | | Slurm account. |
| partition | Yes | string / expr | | Slurm partition. |
| time | Yes | string / expr | | Time limit (e.g. 00:30:00). |
| nodes | Yes | int / expr | | Number of nodes. |
| gpus_per_node | Yes | int / expr | | GPUs per node. |
| extra_args | | list[string] | null | Extra salloc arguments (e.g. --exclusive). |
| job_name | | string | null | Job name; defaults to workflow name. |
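
For comparison, here is a sketch that defines one backend of each type side by side. The backend names, account, and partition are placeholders, and the dict form keyed by backend name is assumed from the YAML path above.

```yaml
backends:
  dev:                         # hypothetical local backend for dry runs
    type: local
    nodes: 2                   # two synthetic local nodes
    gpus_per_node: 4
  slurm_cluster:               # hypothetical Slurm backend
    type: slurm
    default: true              # only one backend may set this
    account: my_account        # placeholder
    partition: gpu             # placeholder
    time: "00:30:00"           # quoted so it is always read as a string
    nodes: 2
    gpus_per_node: 8
    extra_args:
      - "--exclusive"
```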

Operators — Common Fields

YAML path: operators.<name>

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| type | Yes | string | | Operator type: bash, srun, docker, ssh, or python. |

Operators — srun

Additional fields when type: srun

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| job_id | | string | null | Existing Slurm job ID. |
| nodes | | int / string | null | Node count. |
| nodelist | | list[string] | [] | Node list. |
| partition | | string | null | Slurm partition. |
| account | | string | null | Slurm account. |
| qos | | string | null | QOS. |
| reservation | | string | null | Reservation. |
| time | | string | null | Time limit. |
| constraint | | string | null | Slurm constraint. |
| exclusive | | bool | false | Exclusive node allocation. |
| chdir | | string | null | Working directory. |
| cpus_per_task | | int / string | null | CPUs per task. |
| gpus | | string | null | GPU spec (e.g. all, 1, device=0). |
| gpus_per_task | | string | null | GPUs per task. |
| gres | | string | null | Generic resource spec. |
| mem | | string | null | Memory. |
| mem_per_cpu | | string | null | Memory per CPU. |
| ntasks | | int / string | null | Number of tasks. |
| ntasks_per_node | | int / string | null | Tasks per node. |
| export | | string | "ALL" | Environment export setting. |
| label | | bool | true | Prefix output with task label. |
| unbuffered | | bool | true | Unbuffered output. |
| kill_on_bad_exit | | bool | false | Kill job on non-zero task exit. |
| overlap | | bool | true | Allow step overlap. |
| wait | | int / string | null | Wait time. |
| container_image | | string | null | Container image (Pyxis). Mutually exclusive with container_name. |
| container_name | | string | null | Existing container name (Pyxis). Mutually exclusive with container_image. |
| container_mount_home | | bool | false | Mount home directory in container. |
| container_writable | | bool | true | Writable container filesystem. |
| container_mounts | | list[string] | [] | Bind mounts (e.g. "/host:/ctr:rw"). |
| container_workdir | | string | null | Container working directory. |
| container_remap_root | | bool | false | Remap root inside container. |
| mpi | | string | null | MPI type (e.g. pmix, ucx). |
| extra_args | | list[string] | [] | Extra CLI arguments. |
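
A containerized srun operator might be declared along these lines. The operator name, image reference, and mount paths are placeholders; the image string format in particular depends on the site's Pyxis setup. Only container_image is set, since it is mutually exclusive with container_name.

```yaml
operators:
  train_srun:                                    # hypothetical operator name
    type: srun
    ntasks_per_node: 8
    mpi: pmix
    container_image: registry.example.com/team/train:latest   # placeholder image
    container_workdir: /workspace
    container_mounts:
      - "/data:/data:rw"                         # hypothetical host path
```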

Operators — Docker

Additional fields when type: docker

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| image | Yes | string | | Docker image. |
| workdir | | string | null | Working directory inside container. |
| mounts | | list[string] | [] | Bind mounts (e.g. "/host:/ctr:rw"). |
| gpus | | string | null | GPU spec (e.g. all, device=0). |
| extra_args | | list[string] | [] | Extra docker run arguments. |
| pass_envs | | bool | true | Forward host environment variables. |
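
A docker operator using most of these fields could look like the sketch below; the operator name and host path are illustrative only.

```yaml
operators:
  gpu_box:                     # hypothetical operator name
    type: docker
    image: ubuntu:22.04
    workdir: /workspace
    mounts:
      - "/tmp/sflow-data:/workspace/data:rw"   # hypothetical host path
    gpus: all
    pass_envs: true            # the default, shown for clarity
```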

Operators — SSH

Additional fields when type: ssh

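| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| host | Yes | string | | SSH host. |
| user | | string | null | SSH user. |
| port | | int | null | SSH port. |
| identity_file | | string | null | Path to identity file. |
| extra_args | | list[string] | [] | Extra SSH arguments. |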
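
An ssh operator sketch, with a placeholder host, user, and key path:

```yaml
operators:
  login_node:                  # hypothetical operator name
    type: ssh
    host: login.example.com    # placeholder host
    user: alice                # placeholder user
    port: 22
    identity_file: "~/.ssh/id_ed25519"
```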

Operators — Python

Additional fields when type: python

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| python_exec | | string | "python" | Python executable. |
| extra_args | | list[string] | [] | Extra Python arguments. |

Workflow

YAML path: workflow

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| name | Yes | string | | Workflow name. |
| tasks | Yes | list | | List of task definitions (must be non-empty). |
| timeout | | string / int | null | Workflow-level timeout (e.g. 1h, 115m). |
| variables | | dict / list | null | Workflow-scoped variables (same format as root variables). |

Tasks

YAML path: workflow.tasks[]

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| name | Yes | string | | Task name (must be unique). |
| script | Yes | list[string] | | Script lines to execute (non-empty). |
| operator | | string / object | null | Operator name, or inline operator override object. |
| backend | | string / dict | null | Backend name, or inline backend override. |
| depends_on | | list[string] | null | Names of tasks this task depends on. |
| timeout | | int / string | null | Task-level timeout. |
| variables | | dict / list | null | Task-scoped variables. |
| resources | | object | null | Node / GPU resource requirements. |
| replicas | | object | null | Replication configuration. |
| retries | | object | null | Retry configuration. |
| probes | | object | null | Readiness and failure probes. |
| outputs | | list | null | Output parsing configuration. |
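
To show how tasks reference operators and each other, here is a two-task sketch. The task names, the train_srun operator name, and the commands are hypothetical, and the task-level timeout is assumed to accept the same duration strings as the workflow-level one.

```yaml
workflow:
  name: serve_and_check        # hypothetical workflow name
  timeout: 1h
  tasks:
    - name: server
      operator: train_srun     # name of an operator defined under operators
      script:
        - python -m http.server 8000
    - name: client
      depends_on: [server]     # listed as a dependency of this task
      timeout: 15m             # assumed format
      script:
        - curl -s http://localhost:8000/
```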

Task Resources

YAML path: workflow.tasks[].resources

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| nodes.indices | | list[int / expr] | null | Specific node indices (e.g. [0]). |
| nodes.count | | int / expr | null | Number of nodes. |
| gpus.count | Yes | int / expr | | Number of GPUs (sets CUDA_VISIBLE_DEVICES). |
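
The dotted field names above presumably map to nested keys. Under that assumption, a task pinned to the first node with two GPUs might be written like this (task name and command are illustrative):

```yaml
workflow:
  name: resources_demo         # hypothetical
  tasks:
    - name: worker             # hypothetical
      resources:
        nodes:
          indices: [0]         # run on the first node of the backend
        gpus:
          count: 2             # sflow sets CUDA_VISIBLE_DEVICES accordingly
      script:
        - echo "visible GPUs ${CUDA_VISIBLE_DEVICES}"
```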

Task Replicas

YAML path: workflow.tasks[].replicas

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| count | | int / expr | null | Number of replicas. |
| policy | | string / expr | "parallel" | "parallel" or "sequential". |
| variables | | list[string] | null | Variable names for sweeps (Cartesian product of domains). |
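
A sweep over two variables could be sketched as below. The variable names and domains are made up; since the sweep is described as the Cartesian product of the domains, this configuration would be expected to yield 3 × 2 = 6 replicas.

```yaml
variables:
  BATCH_SIZE:                  # hypothetical
    value: 8
    domain: [8, 16, 32]
  PRECISION:                   # hypothetical
    value: fp16
    domain: [fp16, bf16]

workflow:
  name: sweep                  # hypothetical
  tasks:
    - name: bench
      replicas:
        policy: sequential     # run the replicas one after another
        variables: [BATCH_SIZE, PRECISION]
      script:
        - echo "replica ${SFLOW_REPLICA_INDEX} batch=${BATCH_SIZE} precision=${PRECISION}"
```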

Task Retries

YAML path: workflow.tasks[].retries

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| count | Yes | int / expr | | Number of retries. |
| interval | Yes | int / expr | | Delay between retries (seconds). |
| backoff | | int / expr | 1 | Backoff multiplier. |
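
A retry block might look like the following sketch. The exact backoff formula is not spelled out here; if the multiplier is applied to the interval after each failed attempt, the waits below would be roughly 10 s, 20 s, and 40 s. The task name and script path are hypothetical.

```yaml
workflow:
  name: retry_demo             # hypothetical
  tasks:
    - name: flaky_download     # hypothetical
      retries:
        count: 3               # up to three retries
        interval: 10           # seconds before the first retry
        backoff: 2             # assumed to scale the interval on each retry
      script:
        - ./download.sh        # hypothetical script
```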

Task Probes (Readiness / Failure)

YAML path: workflow.tasks[].probes.readiness or workflow.tasks[].probes.failure

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| delay | | int / expr | 0 | Initial delay before probing (seconds). |
| timeout | | int / expr | 60 | Max wait time (seconds). |
| interval | | int / expr | 5 | Check interval (seconds). |
| success_threshold | | int / expr | 1 | Consecutive successes required. |
| failure_threshold | | int / expr | 3 | Consecutive failures before failing. |

Exactly one probe type must be set per probe:

| Probe Type | Required Fields | Optional Fields | Description |
| --- | --- | --- | --- |
| tcp_port | port | host, on_node ("first" / "each") | TCP connection check. |
| http_get | url | headers | HTTP GET health check. |
| http_post | url | headers, body | HTTP POST health check. |
| log_watch | regex_pattern | logger, match_count | Match pattern in task logs. |
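
The sketch below pairs a TCP readiness probe with a log-based failure probe. It assumes the probe type is a nested key that sits alongside the timing fields; that nesting is a guess based on the two tables above, and the task name, port, and pattern are illustrative.

```yaml
workflow:
  name: probe_demo             # hypothetical
  tasks:
    - name: server             # hypothetical
      probes:
        readiness:
          delay: 5
          timeout: 120
          tcp_port:            # exactly one probe type per probe
            port: 8000
            on_node: first
        failure:
          log_watch:
            regex_pattern: "CUDA out of memory"
      script:
        - python -m http.server 8000
```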

Task Outputs

YAML path: workflow.tasks[].outputs[]

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| pattern | Yes | string | | Parse pattern (e.g. "TTFT: {ttft:f} ms"). |
| source | | string | "stdout" | Log source: stdout or stderr. |
| metrics.<key>.description | | string | null | Metric description. |
| metrics.<key>.type | | string | null | Metric type. |
| metrics.<key>.aggregate | | string | null | Aggregation hint. |
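
Reusing the pattern from the table, an output entry might be sketched as follows. The metric key is assumed to match the field name in the pattern, and the type and aggregate values are placeholders because the accepted values are not listed here.

```yaml
workflow:
  name: outputs_demo           # hypothetical
  tasks:
    - name: bench              # hypothetical
      outputs:
        - pattern: "TTFT: {ttft:f} ms"
          source: stdout
          metrics:
            ttft:              # assumed to match the {ttft:f} field
              description: Time to first token in milliseconds.
              type: float      # placeholder value
              aggregate: mean  # placeholder value
      script:
        - echo "TTFT 12.5 ms measured"
```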

Expression Syntax

Fields marked int / expr or string / expr support ${{ ... }} expressions:

| Expression | Example |
| --- | --- |
| Variable | ${{ variables.MY_VAR }} |
| Backend node IP | ${{ backends.slurm_cluster.nodes[0].ip_address }} |
| Artifact path | ${{ artifacts.model_dir.path }} |
| Task node IP | ${{ task.server.nodes[0].ip_address }} |
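
As an illustration, the fragment below feeds one variable into a field typed int / expr. The names and values are hypothetical, and the expression is quoted on the assumption that sflow evaluates it after YAML parsing.

```yaml
variables:
  NUM_NODES:                   # hypothetical
    value: 2
    type: integer

backends:
  slurm_cluster:
    type: slurm
    account: my_account        # placeholder
    partition: gpu             # placeholder
    time: "00:30:00"
    nodes: "${{ variables.NUM_NODES }}"   # int / expr field
    gpus_per_node: 8
```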

Reserved Environment Variables

Injected by sflow into task environments

These are automatically set by sflow and available in every task script.

| Variable | Description |
| --- | --- |
| SFLOW_WORKSPACE_DIR | Absolute path to the project workspace root. |
| SFLOW_OUTPUT_DIR | Global output root directory (default ./sflow_output). |
| SFLOW_WORKFLOW_OUTPUT_DIR | Output directory for the current workflow run (e.g. sflow_output/<run-id>). |
| SFLOW_TASK_OUTPUT_DIR | Output directory for the current task replica (e.g. sflow_output/<run-id>/my_task_0). |
| SFLOW_REPLICA_INDEX | Zero-based replica index (0, 1, 2, ...). |
| SFLOW_TASK_ASSIGNED_NODE_NAMES | Comma-separated hostnames of nodes assigned to this task. |
| SFLOW_TASK_ASSIGNED_NODE_IPS | Comma-separated IP addresses of nodes assigned to this task. |
| CUDA_VISIBLE_DEVICES | Comma-separated GPU indices allocated to this task (set when resources.gpus.count is used). |

In addition, all resolved variables and artifact paths are injected as environment variables, accessible via ${VAR_NAME} in scripts.
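
A task script can read these like any other environment variable. The sketch below is illustrative: the task name is hypothetical, and MODEL_NAME stands in for a user-defined variable assumed to be declared under variables.

```yaml
workflow:
  name: env_demo               # hypothetical
  tasks:
    - name: report             # hypothetical
      script:
        - echo "replica ${SFLOW_REPLICA_INDEX} on ${SFLOW_TASK_ASSIGNED_NODE_NAMES}"
        - hostname > "${SFLOW_TASK_OUTPUT_DIR}/host.txt"
        - echo "model is ${MODEL_NAME}"   # hypothetical user variable
```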

Read by sflow from the host environment

sflow reads these to detect an existing Slurm allocation and skip salloc.

| Variable | Description |
| --- | --- |
| SLURM_JOB_ID / SLURM_JOBID | Current Slurm job ID. Used to detect an existing allocation. |
| SLURM_JOB_NODELIST / SLURM_NODELIST | Node list for the current Slurm allocation. |

Provided by Slurm at runtime

These are set by Slurm (not by sflow) and commonly used in task scripts.

| Variable | Description |
| --- | --- |
| SLURM_NODEID | Node index within the allocation (useful for NODE_RANK). |
| SLURMD_NODENAME | Hostname of the node running the task. |
| SLURM_SUBMIT_DIR | Directory from which the job was submitted. |
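
For example, a multi-node task could derive a per-node rank from SLURM_NODEID, as in this sketch. It assumes the task runs under an srun operator (here the hypothetical train_srun) so that Slurm sets these variables for each node.

```yaml
workflow:
  name: rank_demo              # hypothetical
  tasks:
    - name: train              # hypothetical
      operator: train_srun     # hypothetical srun operator
      script:
        - export NODE_RANK="${SLURM_NODEID}"
        - echo "node ${SLURMD_NODENAME} has rank ${NODE_RANK}"
```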