Overview
Workflows turn complex computational pipelines into simple YAML definitions. You define what to run, how tasks connect, and what resources they need. OSMO handles the rest: scheduling, orchestration, and execution across your compute infrastructure.
What is a Workflow?
Important
A workflow is a user-defined, directed acyclic graph (DAG) of tasks that is scheduled and executed by OSMO.
Key characteristics:
Workflows are defined in YAML and submitted via the CLI or Web UI.
Tasks execute in the order determined by their declared dependencies.
Workflows support serial, parallel, and combined execution patterns.
OSMO schedules workflows automatically.
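Because a workflow must be acyclic, a dependency loop between tasks makes it invalid. The following is a minimal sketch of such a check using Python's standard graphlib module. It is not OSMO's own validation, and both task graphs are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(dependencies: dict[str, set[str]]) -> bool:
    """Return True if the task graph contains no dependency cycle."""
    try:
        # Consuming static_order() raises CycleError when a cycle exists.
        list(TopologicalSorter(dependencies).static_order())
        return True
    except CycleError:
        return False


# Hypothetical graphs: each key maps a task to the tasks it depends on.
valid = {"train": {"preprocess"}, "evaluate": {"train"}}
cyclic = {"train": {"evaluate"}, "evaluate": {"train"}}

print(is_acyclic(valid))   # True
print(is_acyclic(cyclic))  # False
```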
Workflow Example
workflow:
  name: ml-pipeline
  tasks:
  - name: preprocess
    image: python:3.10
    command: ["python"]
    args: ["preprocess.py"]
    ...
  - name: train
    image: pytorch/pytorch
    command: ["python"]
    args: ["train.py"]
    ...
    inputs:
    - task: preprocess # (1)
  - name: evaluate
    image: python:3.10
    command: ["python"]
    args: ["evaluate.py"]
    ...
    inputs:
    - task: train # (2)
  - name: export-onnx
    image: python:3.10
    command: ["python"]
    args:
    - "export.py"
    - "--format=onnx"
    ...
    inputs:
    - task: train # (2)
(1) The task input specifies the upstream task dependency.
(2) Both evaluate and export-onnx depend only on train, so they run in parallel.
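The ordering implied by these dependencies can be sketched as a series of "waves", where every task in a wave has all of its upstream tasks finished. This is an illustration using Python's standard graphlib module, not a description of how OSMO's scheduler is implemented. The function name execution_waves is hypothetical.

```python
from graphlib import TopologicalSorter

# Dependencies from the ml-pipeline example: task -> upstream tasks.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "export-onnx": {"train"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # get_ready() returns every task whose dependencies are all done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
print(waves)
# [['preprocess'], ['train'], ['evaluate', 'export-onnx']]
```

The last wave contains both evaluate and export-onnx, which matches annotation (2).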
What is a Task?
Important
Tasks are the fundamental units of work in OSMO. A task is an independent environment that runs a list of commands within a Docker container.
Capabilities:
Access local files, upstream task outputs, or cloud storage
Develop interactively with VSCode, Jupyter, or SSH
Use managed secrets for secure credential access
Request specific hardware (GPU, CPU, RAM)
Configure automatic retries for failures
And much more!
Example train task from the above workflow
- name: train
  image: pytorch/pytorch:2.0-cuda11.8
  # Task dependencies
  inputs:
  - task: preprocess
  # Secrets
  credentials:
    wandb_cred:
      WANDB_API_KEY: wandb_api_key # (1)
  # Execution
  command: ["python"]
  args:
  - "train.py"
  - "--data=/workspace/data"
  - "--checkpoint=/workspace/ckpt/base.pth"
  - "--output={{output}}" # (2)
  # Task outputs
  outputs: # (3)
  - url: s3://my-bucket/models/
(1) Uses managed secrets for secure credential access.
(2) Writes to an output directory that OSMO recognizes for further processing.
(3) Uploads the output directory to S3 after the task completes.
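To make the {{output}} placeholder in annotation (2) concrete, here is a rough sketch of how a double-brace token in an argument list could be replaced with a directory path before the command runs. This is only an assumption about the general idea: the helper render_args and the path /tmp/osmo-output are hypothetical, and OSMO's actual substitution mechanism and output location are not described on this page.

```python
import re

# Matches tokens such as {{output}} or {{ output }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_args(args: list[str], values: dict[str, str]) -> list[str]:
    """Replace {{name}} tokens in each argument with values[name]."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value provided for placeholder {key!r}")
        return values[key]

    return [PLACEHOLDER.sub(substitute, arg) for arg in args]


args = ["train.py", "--data=/workspace/data", "--output={{output}}"]
# Hypothetical output directory, chosen only for this illustration.
rendered = render_args(args, {"output": "/tmp/osmo-output"})
print(rendered)
# ['train.py', '--data=/workspace/data', '--output=/tmp/osmo-output']
```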
What is a Group?
Important
A group is a collection of tasks that are executed together. It synchronizes the execution of multiple tasks, enabling them to communicate within the same network.
Caution
The groups and tasks fields are mutually exclusive in a workflow: define one or the other, not both.
How groups work:
Exactly one task in each group is designated as the group leader
All tasks in a group start together
Tasks can communicate over the network
Tasks may run on the same node or across different nodes
Supports both homogeneous (e.g., all x86_64) and heterogeneous (e.g., x86_64 + ARM64) architectures
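Two of the rules above lend themselves to a quick pre-submission check: a workflow may not define both tasks and groups, and each group needs exactly one lead task. The sketch below assumes the workflow has already been parsed from YAML into a Python dictionary. The function validate_workflow and its error messages are hypothetical and are not part of OSMO.

```python
def validate_workflow(workflow: dict) -> list[str]:
    """Return a list of rule violations; an empty list means no issues found."""
    errors = []

    # Rule: the tasks and groups fields are mutually exclusive.
    if "tasks" in workflow and "groups" in workflow:
        errors.append("'tasks' and 'groups' are mutually exclusive")

    # Rule: every group has exactly one task marked with lead: true.
    for group in workflow.get("groups", []):
        leads = [t["name"] for t in group.get("tasks", []) if t.get("lead")]
        if len(leads) != 1:
            errors.append(
                f"group {group['name']!r} must have exactly one lead task, "
                f"found {len(leads)}"
            )
    return errors


# Hypothetical parsed workflows used only to exercise the checks.
ok = {"groups": [{"name": "g", "tasks": [{"name": "b", "lead": True}, {"name": "c"}]}]}
no_lead = {"groups": [{"name": "g", "tasks": [{"name": "b"}, {"name": "c"}]}]}
both = {"tasks": [{"name": "a"}], "groups": ok["groups"]}

print(validate_workflow(ok))       # []
print(validate_workflow(no_lead))  # ["group 'g' must have exactly one lead task, found 0"]
print(validate_workflow(both))     # ["'tasks' and 'groups' are mutually exclusive"]
```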
Common patterns:
Distributed training: multiple workers with parameter servers
Multi-stage pipelines: tasks that need real-time coordination
Service architectures: long-running services with dependent workers
Groups Example
workflow:
  name: my_workflow
  groups:
  ################################################
  # Group 1 (runs first)
  ################################################
  - name: group_1
    tasks:
    - name: task_1
      lead: true # (1)
      ...
    - name: task_2
      ...
      outputs:
      - dataset:
          name: dataset_3 # (2)
    - name: task_3
      ...
  ################################################
  # Group 2 (runs after group 1)
  ################################################
  - name: group_2
    tasks:
    - name: task_4
      lead: true
      ...
      inputs:
      - dataset:
          name: dataset_3 # (3)
  ################################################
  # Group 3 (runs after group 1)
  ################################################
  - name: group_3
    tasks:
    - name: task_5
      lead: true
      ...
      inputs:
      - dataset:
          name: dataset_3
    - name: task_6 # (4)
      ...
(1) Every group must have one and only one lead task.
(2) task_2 outputs dataset_3, which is used as an input for other groups.
(3) group_2 runs after group_1 because of the dependency on dataset_3.
(4) Despite not having a direct dependency on dataset_3, task_6's peer task (task_5) depends on dataset_3. Therefore, group_3 must run after group_1.
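The reasoning in annotations (3) and (4) can be sketched as lifting task-level dataset dependencies to the group level: if any task in a group consumes a dataset produced in another group, the whole group waits for the producer. The code below is an illustration only. It simplifies dataset inputs and outputs to plain names, and the function group_dependencies is hypothetical rather than part of OSMO.

```python
def group_dependencies(groups: list[dict]) -> dict[str, set[str]]:
    """Map each group to the groups whose dataset outputs its tasks consume."""
    # Record which group produces each dataset.
    producers = {}
    for group in groups:
        for task in group["tasks"]:
            for dataset in task.get("outputs", []):
                producers[dataset] = group["name"]

    # A group depends on the producer of any dataset one of its tasks reads.
    deps = {group["name"]: set() for group in groups}
    for group in groups:
        for task in group["tasks"]:
            for dataset in task.get("inputs", []):
                producer = producers.get(dataset)
                if producer is not None and producer != group["name"]:
                    deps[group["name"]].add(producer)
    return deps


# Simplified form of the groups example above.
groups = [
    {"name": "group_1", "tasks": [
        {"name": "task_1"},
        {"name": "task_2", "outputs": ["dataset_3"]},
        {"name": "task_3"},
    ]},
    {"name": "group_2", "tasks": [{"name": "task_4", "inputs": ["dataset_3"]}]},
    {"name": "group_3", "tasks": [
        {"name": "task_5", "inputs": ["dataset_3"]},
        {"name": "task_6"},
    ]},
]

deps = group_dependencies(groups)
print(deps["group_2"])  # {'group_1'}
print(deps["group_3"])  # {'group_1'}
```

Although task_6 declares no inputs, it belongs to group_3, so it starts only once group_1 has finished.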
See also
See the full workflow specification.