# Variables

Variables are named values you can reference in two places:

- YAML expressions: `${{ variables.NAME }}`
- Task scripts (as environment variables): `${NAME}`

All resolved variables are injected into every task environment by default.
## Why sflow Variables vs Regular Environment Variables?

sflow variables offer significant advantages over traditional environment variables:
| Feature | sflow Variables ${{ }} | Shell Env Vars ${} |
|---|---|---|
| Resolution time | Before workflow execution (plan time) | At runtime (shell expansion) |
| Scope | Across entire YAML (backends, operators, resources, scripts) | Only within shell scripts |
| Dynamic values | Can reference backends, artifacts, tasks, other variables | Static values only |
| Type safety | Supports int, float, bool, string, list with validation | Strings only |
| Override | --set VAR=value from CLI | Requires manual export |
| Visibility | Shown in dry-run plan | Hidden until execution |
## Key Benefits
- **Cross-section references**: Use the same value in backends, operators, and scripts without duplication:

  ```yaml
  variables:
    NUM_GPUS:
      value: 4
  backends:
    - name: cluster
      gpus_per_node: ${{ NUM_GPUS }}  # Used in backend config
  workflow:
    tasks:
      - name: train
        resources:
          gpus:
            count: ${{ NUM_GPUS }}  # Used in resource allocation
        script:
          - echo "Training with ${NUM_GPUS} GPUs"  # Used in script
  ```
- **Dynamic resolution**: Access runtime information like node IPs and task assignments:

  ```yaml
  workflow:
    variables:
      HEAD_NODE:
        value: "${{ backends.cluster.nodes[0].ip_address }}"  # Resolved after allocation
    tasks:
      - name: worker
        script:
          - echo "Connecting to ${{ task.server.nodes[0].ip_address }}"  # Task-aware
  ```
- **Parameter sweeps**: Define domains for automated replica generation:

  ```yaml
  variables:
    BATCH_SIZE:
      value: 32
      domain: [32, 64, 128]  # Creates 3 replicas automatically
  ```
- **CLI overrides**: Change values without editing YAML:

  ```shell
  sflow run -f workflow.yaml --set NUM_GPUS=8 --set 'BATCH_SIZE=[16,32,64]'
  ```
## Available Expression Contexts

When using `${{ ... }}` expressions, you have access to these contexts:

| Context | Example | Description |
|---|---|---|
| `variables` | `${{ variables.MY_VAR }}` | Global and workflow variables |
| `backends` | `${{ backends.slurm_cluster.nodes[0].ip_address }}` | Backend allocation info (nodes, IPs) |
| `artifacts` | `${{ artifacts.model.path }}` | Artifact paths |
| `workflow` | `${{ workflow.name }}` | Workflow metadata |
| `task` | `${{ task.my_task.nodes[0].ip_address }}` | Task-specific node and GPU info (scripts only) |
## Backend Node Access

After Slurm allocation, you can access node information:

```yaml
workflow:
  variables:
    HEAD_NODE_IP:
      value: "${{ backends.slurm_cluster.nodes[0].ip_address }}"
    SECOND_NODE:
      value: "${{ backends.slurm_cluster.nodes[1].name }}"
```
Available node properties:

- `nodes[i].name` - Hostname of the node
- `nodes[i].ip_address` - IP address of the node
- `nodes[i].index` - Index of the node in the allocation
- `nodes[i].num_gpus` - Number of GPUs on the node
## Variable Shorthand

For convenience, variables can be accessed directly without the `variables.` prefix:

```yaml
# Both are equivalent:
value: "${{ variables.MY_VAR }}"
value: "${{ MY_VAR }}"
```
## Variable Domains in Expressions

When a variable declares a domain, the current value still renders normally, and the domain list is available as metadata:

```yaml
variables:
  CONCURRENCY:
    value: 16
    type: integer
    domain: [1, 4, 16, 64]
workflow:
  tasks:
    - name: show_domain
      script:
        - echo "value=${{ variables.CONCURRENCY }}"
        - echo "domain=${{ variables.CONCURRENCY.domain }}"
        - echo "max=${{ variables.CONCURRENCY.domain | max }}"
```
This also works in places that resolve expressions before execution, including `sflow compose --resolve` and `sflow batch -e/--sbatch-extra-args`.

For replica sweeps, `${{ variables.CONCURRENCY }}` resolves to each replica's row value, while `${{ variables.CONCURRENCY.domain }}` remains the full domain list for every replica.
## Task Node and GPU Access (Scripts Only)

Inside task scripts, you can reference other tasks' assigned nodes and GPUs using the `task` context:

```yaml
workflow:
  tasks:
    - name: prefill_server
      resources:
        gpus:
          count: 2
        nodes:
          indices: [0]
      script:
        - echo "Starting prefill server"
        - start_server
    - name: decode_server
      resources:
        gpus:
          count: 2
        nodes:
          indices: [1]
      script:
        # Reference prefill_server's node IP
        - echo "Connecting to prefill at ${{ task.prefill_server.nodes[0].ip_address }}"
        - start_decoder --prefill-host=${{ task.prefill_server.nodes[0].ip_address }}
      depends_on:
        - prefill_server
```
Available task properties:

- `task.<name>.nodes` - List of nodes assigned to the task
- `task.<name>.nodes[i].name` - Hostname of the i-th assigned node
- `task.<name>.nodes[i].ip_address` - IP address of the i-th assigned node
- `task.<name>.nodes[i].index` - Index of the node within the task's assignment
- `task.<name>.nodes[i].num_gpus` - Number of GPUs on the node
- `task.<name>.gpus` - List of GPU indices assigned to the task (from `CUDA_VISIBLE_DEVICES`)
- `task.<name>.backend` - Name of the backend used by the task
- `task.<name>.operator` - Name of the operator used by the task
## sflow Reserved Environment Variables

sflow automatically injects these environment variables into every task script:

| Variable | Description | Example |
|---|---|---|
| `SFLOW_WORKSPACE_DIR` | Workspace root directory | `/home/user/project` |
| `SFLOW_OUTPUT_DIR` | Output root directory | `/home/user/project/sflow_output` |
| `SFLOW_WORKFLOW_OUTPUT_DIR` | Workflow-specific output directory | `sflow_output/12345-wf-20260315-abcdef` |
| `SFLOW_TASK_OUTPUT_DIR` | Task-specific output directory | `sflow_output/12345-wf-20260315-abcdef/task_name` |
| `SFLOW_REPLICA_INDEX` | Replica index (0-based) for replicated tasks | `0`, `1`, `2` |
| `SFLOW_TASK_ASSIGNED_NODE_NAMES` | Comma-separated hostnames assigned to this task | `node0,node1` |
| `SFLOW_TASK_ASSIGNED_NODE_IPS` | Comma-separated IPs assigned to this task | `10.0.0.1,10.0.0.2` |
In addition, all user-defined variables are available as environment variables under their own names (e.g. `${SLURM_NODES}`, `${MODEL_NAME}`).
### Output directory variables

Use these to write output files to the correct location:

```yaml
script:
  - echo "results" > ${SFLOW_TASK_OUTPUT_DIR}/results.txt
  - cp model.pt ${SFLOW_WORKFLOW_OUTPUT_DIR}/final_model.pt
```
### Replica index

For replicated tasks, `SFLOW_REPLICA_INDEX` identifies which replica is running (0-based). Use it to differentiate replicas:

```yaml
tasks:
  - name: server
    replicas:
      count: 3
      policy: parallel
    script:
      - echo "I am replica ${SFLOW_REPLICA_INDEX}"
      - export MY_PORT=$((8000 + ${SFLOW_REPLICA_INDEX}))
      - start_server --port ${MY_PORT}
```
When a task uses `replicas.variables` for domain sweeps, the sweep variable values are also injected as environment variables:

```yaml
tasks:
  - name: benchmark
    replicas:
      variables:
        - CONCURRENCY  # each value from domain: [64, 128, 256]
      policy: sequential
    script:
      - echo "Running with concurrency=${CONCURRENCY}"
      - benchmark --concurrency ${CONCURRENCY}
```
## Task-Assigned Node Environment Variables

Each task automatically receives environment variables with its assigned node information:

| Variable | Description | Example |
|---|---|---|
| `SFLOW_TASK_ASSIGNED_NODE_NAMES` | Comma-separated list of assigned node hostnames | `node0,node1` |
| `SFLOW_TASK_ASSIGNED_NODE_IPS` | Comma-separated list of assigned node IP addresses | `10.0.0.1,10.0.0.2` |
These are useful in scripts when you need to iterate over assigned nodes or use them for distributed computing:

```yaml
workflow:
  tasks:
    - name: distributed_train
      resources:
        nodes:
          count: 2
        gpus:
          count: 8  # 4 per node
      script:
        - echo "My nodes: ${SFLOW_TASK_ASSIGNED_NODE_NAMES}"
        - echo "My IPs: ${SFLOW_TASK_ASSIGNED_NODE_IPS}"
        # Use in distributed training (folded YAML scalar keeps it one command)
        - >-
          torchrun --nnodes=2 --node_rank=${SLURM_NODEID}
          --master_addr=$(echo ${SFLOW_TASK_ASSIGNED_NODE_IPS} | cut -d',' -f1)
          train.py
```
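When a script needs to loop over all of its assigned nodes rather than just take the first one, the comma-separated lists can be split into a bash array. A minimal sketch; the IP list below is a placeholder for the value sflow injects at runtime, and `NODE_IPS`/`MASTER_ADDR` are illustrative names:

```shell
#!/usr/bin/env bash
# Placeholder; at runtime sflow injects the task's real assigned IPs.
SFLOW_TASK_ASSIGNED_NODE_IPS="10.0.0.1,10.0.0.2"

# Split the comma-separated list into a bash array.
IFS=',' read -ra NODE_IPS <<< "${SFLOW_TASK_ASSIGNED_NODE_IPS}"

# The first entry is a natural choice for a master/head address.
MASTER_ADDR="${NODE_IPS[0]}"
echo "master: ${MASTER_ADDR}"

# Iterate over every assigned node.
for ip in "${NODE_IPS[@]}"; do
  echo "node: ${ip}"
done
```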
For replicated tasks, each replica gets its own assigned nodes:

- `worker_0` might have `SFLOW_TASK_ASSIGNED_NODE_NAMES=n1` and `SFLOW_TASK_ASSIGNED_NODE_IPS=10.0.0.1`
- `worker_1` might have `SFLOW_TASK_ASSIGNED_NODE_NAMES=n2` and `SFLOW_TASK_ASSIGNED_NODE_IPS=10.0.0.2`
## Accessing Replicated Tasks

For tasks with replicas, you can access each replica in two ways:

- By full replica name: `${{ task.prefill_server_0.nodes[0].ip_address }}`
- By base name with index: `${{ task.prefill_server[0].nodes[0].ip_address }}`
Example with replicated tasks:

```yaml
workflow:
  tasks:
    - name: prefill_server
      replicas:
        count: 2
        policy: parallel
      resources:
        gpus:
          count: 2
        nodes:
          count: 1
      script:
        - echo "Starting prefill server"
    - name: client
      script:
        # Access each replica by index
        - echo "Prefill 0 IP: ${{ task.prefill_server[0].nodes[0].ip_address }}"
        - echo "Prefill 1 IP: ${{ task.prefill_server[1].nodes[0].ip_address }}"
        - echo "Prefill 0 GPUs: ${{ task.prefill_server[0].gpus }}"
        - echo "Prefill 1 GPUs: ${{ task.prefill_server[1].gpus }}"
        # Or by full replica name
        - echo "Full name access: ${{ task.prefill_server_0.nodes[0].ip_address }}"
      depends_on:
        - prefill_server
```
**Note:** The `task` context is only available in task scripts, not in global/workflow variables or other configuration sections. This is because task resource assignments are computed after variable resolution.
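To make this scoping rule concrete, a sketch of what does and does not work (the task and variable names here are illustrative):

```yaml
variables:
  # Invalid: the task context does not exist yet at variable-resolution time
  SERVER_IP:
    value: "${{ task.server.nodes[0].ip_address }}"
workflow:
  tasks:
    - name: client
      script:
        # Valid: task scripts are rendered after resource assignment
        - echo "server at ${{ task.server.nodes[0].ip_address }}"
```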
## Minimal example

```yaml
version: "0.1"
variables:
  - name: MSG
    type: string
    value: hello
workflow:
  name: wf
  tasks:
    - name: t1
      script:
        - echo "jinja=${{ variables.MSG }}" > ${SFLOW_WORKFLOW_OUTPUT_DIR}/msg.txt
        - echo "shell=${MSG}" >> ${SFLOW_WORKFLOW_OUTPUT_DIR}/msg.txt
```
## Declare variables (dict vs list)

You can write `variables` as a dict (recommended) or as a list; both are supported.

Dict form:

```yaml
variables:
  SLURM_PARTITION:
    description: "Slurm partition"
    type: string
    value: debug
```

List form:

```yaml
variables:
  - name: SLURM_PARTITION
    description: "Slurm partition"
    type: string
    value: debug
```
## Override variables at runtime (--set)

```shell
sflow run --file sflow.yaml --set SLURM_PARTITION=debug --set NUM_GPUS=4
```

Notes:

- `--set` can only override variables that already exist in `variables:` (otherwise it errors).
- Values use simple type inference (int/float/bool/list/string).
- JSON-style list values update the variable's `domain`; the variable's `value` becomes the first element of the list.
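The type inference described above can be illustrated with a small sketch. This mimics the documented behavior; it is not sflow's actual parser, and `infer` is a hypothetical name:

```python
import json

def infer(raw: str):
    """Guess a typed value from a --set string: bool, int, float,
    JSON list, then plain string. Illustrative sketch only."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw.startswith("["):
        try:
            return json.loads(raw)  # e.g. '[16,32,64]' -> [16, 32, 64]
        except json.JSONDecodeError:
            pass
    return raw

# --set NUM_GPUS=4        -> 4 (int)
# --set LR=0.01           -> 0.01 (float)
# --set 'SIZES=[32,64]'   -> [32, 64] (list)
# --set PARTITION=debug   -> "debug" (string)
```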
## Override Domains for Replica Sweeps

When you provide a JSON list as the value, it sets the variable's domain (used for replica sweeps):

```shell
# Override domain for parameter sweep
sflow run --file workflow.yaml --set 'CONCURRENCY=[16,32,64,128]'

# Multiple domain overrides
sflow run --file workflow.yaml --set 'BATCH_SIZE=[32,64]' --set 'LR=[0.001,0.01]'
```

This is equivalent to modifying the YAML:

```yaml
variables:
  CONCURRENCY:
    value: 16
    domain: [16, 32, 64, 128]  # Set by --set 'CONCURRENCY=[16,32,64,128]'
```

When a list is provided:

- The `domain` field is set to the list
- The `value` is set to the first element of the list
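These two rules amount to a small transformation on the variable definition. A sketch, assuming a variable is a dict with `value` and optional `domain` keys as in the YAML form (`apply_list_override` is a hypothetical name, not sflow's API):

```python
def apply_list_override(var: dict, values: list) -> dict:
    """Apply a JSON-list --set override: the list becomes the domain,
    and the first element becomes the current value. Illustrative sketch."""
    var["domain"] = values
    var["value"] = values[0]
    return var

# --set 'CONCURRENCY=[16,32,64,128]' against CONCURRENCY: {value: 16}
# -> {"value": 16, "domain": [16, 32, 64, 128]}
```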
## Replicas + variables (a slightly deeper step)

Example with replicas and variables:

```yaml
version: "0.1"
variables:
  - name: GPU_COUNT
    type: integer
    value: 2
  - name: REPLICA_COUNT
    type: integer
    value: 3
workflow:
  name: wf
  tasks:
    - name: t1
      script:
        - echo "hello from replica ${SFLOW_REPLICA_INDEX}"
        - echo "I have ${GPU_COUNT} GPUs"
      replicas:
        count: "${{ variables.REPLICA_COUNT }}"
        policy: parallel
      resources:
        gpus:
          count: "${{ variables.GPU_COUNT }}"
```
## Chained (recursive) variable resolution

Variables can reference other computed variables. The resolver iterates over multiple passes until all resolvable variables are fully resolved.

```yaml
variables:
  AGG_TP_SIZE:
    type: integer
    value: 4
  AGG_DP_SIZE:
    type: integer
    value: 1
  AGG_PP_SIZE:
    type: integer
    value: 1
  GPUS_PER_NODE:
    type: integer
    value: 8
  # Computed from TP * DP * PP
  AGG_GPUS_PER_WORKER:
    type: integer
    value: ${{ variables.AGG_TP_SIZE * variables.AGG_DP_SIZE * variables.AGG_PP_SIZE }}
  # References AGG_GPUS_PER_WORKER (chained)
  AGG_NODES_PER_WORKER:
    type: integer
    value: ${{ [variables.AGG_GPUS_PER_WORKER // variables.GPUS_PER_NODE, 1] | max }}
```
In this example:

- `AGG_GPUS_PER_WORKER` is computed from `TP * DP * PP = 4`
- `AGG_NODES_PER_WORKER` references `AGG_GPUS_PER_WORKER` (chained) and computes `max(4 // 8, 1) = 1`

The resolver handles this by:

- Only including fully-resolved variables in the evaluation context for each pass
- Retrying unresolved variables on subsequent passes until all dependencies are satisfied
- Automatically casting variables with `type: integer` to integers after resolution (important for arithmetic in chained expressions)
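The multi-pass behavior can be sketched in a few lines of Python. This is a toy resolver for illustration only, not sflow's implementation: it handles plain arithmetic plus a single trailing `expr | filter`, which it rewrites as `filter(expr)`:

```python
import re

def resolve_all(variables: dict) -> dict:
    """Toy multi-pass resolver: each pass evaluates expressions whose
    referenced variables are already resolved; the rest retry next pass."""
    resolved, pending = {}, dict(variables)
    while pending:
        progressed = False
        for name, value in list(pending.items()):
            if not isinstance(value, str) or not value.strip().startswith("${{"):
                resolved[name] = value          # literal: resolved immediately
                del pending[name]
                progressed = True
                continue
            body = value.strip()[3:-2].replace("variables.", "")
            # Rewrite a trailing Jinja-style "expr | max" as "max(expr)".
            body = re.sub(r"^(.*)\|\s*(\w+)\s*$", r"\2(\1)", body)
            try:
                resolved[name] = eval(body, {"__builtins__": {}},
                                      {"max": max, "min": min, **resolved})
            except NameError:
                continue                        # dependency not ready; retry later
            del pending[name]
            progressed = True
        if not progressed:
            raise ValueError(f"unresolvable: {sorted(pending)}")
    return resolved
```

On the example above, `AGG_GPUS_PER_WORKER` resolves to `4` and then `AGG_NODES_PER_WORKER` to `max(4 // 8, 1) = 1` once its dependency is available.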
### Tips for computed variables

- Always declare `type: integer` on variables used in arithmetic expressions. Without it, values are treated as strings and arithmetic will fail.
- Computed variables are available as environment variables in task scripts (e.g. `${AGG_GPUS_PER_WORKER}`), eliminating the need for inline calculations in bash.
- When using `sflow compose --resolve`, computed variables are resolved to their literal values and removed from the output.
## The --resolve flag and replica variables

When you use `sflow compose --resolve` or `sflow batch --bulk-input --resolve`, all resolvable variables are inlined to literal values and removed from the `variables` section. However, variables referenced by `replicas.variables` (sweep variables) are never resolved, even with `--resolve`. This is intentional:

```yaml
variables:
  CONCURRENCY:
    value: 64
    domain: [64, 128, 256]
workflow:
  tasks:
    - name: benchmark
      replicas:
        variables:
          - CONCURRENCY  # sweep over domain values
        policy: sequential
      script:
        - benchmark --concurrency ${CONCURRENCY}
```
After `--resolve`, `CONCURRENCY` stays in the `variables` section because:

- Its value changes per replica (each replica gets a different domain value)
- Resolving it would collapse the sweep into a single value, losing the sweep entirely
Variables referenced by `replicas.count` expressions are also preserved. For example:

```yaml
variables:
  NUM_CTX_SERVERS:
    type: integer
    value: 2
workflow:
  tasks:
    - name: prefill_server
      replicas:
        count: ${{ variables.NUM_CTX_SERVERS }}
        policy: parallel
```

Here `NUM_CTX_SERVERS` is kept after `--resolve` because it controls how many replicas are created. Resolving it would make the config less flexible: you would no longer be able to override it with `--set NUM_CTX_SERVERS=4` at run time.

This ensures the composed config remains a valid, scalable workflow template even after resolution.

Similarly, variables that depend on runtime contexts (e.g. `${{ backends.slurm_cluster.nodes[0].ip_address }}`) cannot be resolved at compose time and are kept as expressions.