Working with Data#
OSMO makes it easy to upload and download data for your workflows. This tutorial will cover:
- How data is used inside a workflow.
- How to work with storage URLs.
Prerequisites
Before you start, please make sure you have configured your data credentials. See Data for more details.
Hint
The examples below demonstrate reading and writing from remote storage. Please replace any URLs with your own storage URLs.
Inside a Workflow#
OSMO provides two directories for data management in every task:
/osmo/
├── input/ ← Read input data here
│ ├── 0/
│ └── 1/
└── output/ ← Write results here
└── (user outputs)
How it works:

1. Before the task starts → OSMO downloads the data specified in `inputs:` to `/osmo/input/`.
2. During task execution → Your code reads from `{{input:#}}/` and writes results to `{{output}}/`.
3. After the task completes → OSMO uploads `/osmo/output/` to the locations specified in `outputs:`.
Example:
```yaml
tasks:
  - name: process
    command: ["bash", "-c"]
    args:
      - |
        cat {{input:0}}/data.txt               # Reads the first input
        echo "Result" > {{output}}/result.txt  # Writes the output
    inputs:
      - url: s3://my-bucket/inputs/   # ← Downloads here
    outputs:
      - url: s3://my-bucket/outputs/  # ← Uploads here
```
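The stage-in → run → stage-out sequence above can be sketched in plain Python. This is purely illustrative; OSMO performs these steps itself, and the `run_task` helper, its signature, and the local-copy stand-in for upload/download are all made up for this sketch:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def run_task(command, inputs, output_dest):
    """Illustrative sketch of the stage-in -> run -> stage-out lifecycle."""
    work = Path(tempfile.mkdtemp())
    input_root = work / "input"
    output_dir = work / "output"
    output_dir.mkdir(parents=True)

    # 1. Before the task starts: stage each input under input/<index>/
    for i, src in enumerate(inputs):
        shutil.copytree(src, input_root / str(i))

    # 2. During execution: resolve the {{input:#}} and {{output}} placeholders
    #    to local paths, then run the user's command.
    resolved = []
    for arg in command:
        for i in range(len(inputs)):
            arg = arg.replace("{{input:%d}}" % i, str(input_root / str(i)))
        arg = arg.replace("{{output}}", str(output_dir))
        resolved.append(arg)
    subprocess.run(resolved, check=True)

    # 3. After the task completes: "upload" (here, just copy) output/ onward
    shutil.copytree(output_dir, output_dest, dirs_exist_ok=True)
```

The key point the sketch captures: your task code never talks to remote storage directly; it only ever sees local paths substituted for the placeholders.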
See also
The above explains the fundamentals of how a workflow can read/write data. For more details on how data flows between tasks in a workflow, see Serial Workflows.
Storage URLs#
URL Patterns#
| Storage Provider | URL Pattern |
|---|---|
| Amazon S3 | `s3://<bucket>/<path>` |
| Google Cloud Storage | `gs://<bucket>/<path>` |
| Azure Blob Storage | `azure://<container>/<path>` |
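Whichever provider you use, the URL goes in the same `url:` field. Below is a hypothetical sketch mixing providers in one task; the bucket and container names are placeholders, and the `gs://` and `azure://` schemes follow common conventions, so confirm the exact patterns for your deployment:

```yaml
tasks:
  - name: multi-cloud
    command: ["bash", "-c"]
    args: ["ls -la {{input:0}}/ {{input:1}}/"]
    inputs:
      - url: s3://my-bucket/raw/         # input 0: Amazon S3
      - url: gs://my-gcs-bucket/labels/  # input 1: Google Cloud Storage
    outputs:
      - url: azure://my-container/out/   # Azure Blob Storage
```

Inputs are indexed in the order they are listed, so the first URL is `{{input:0}}` and the second is `{{input:1}}`.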
Uploading Data#
Upload data directly to cloud storage (S3, GCS, Azure) using URLs:
```yaml
workflow:
  name: upload-to-s3
  tasks:
    - name: save-to-cloud
      image: ubuntu:24.04
      command: ["bash", "-c"]
      args:
        - |
          mkdir -p {{output}}/results
          echo "Model checkpoint" > {{output}}/results/model.pth
          echo "Upload complete"
      outputs:
        - url: s3://my-bucket/models/  # (1)
```

1. Files from `{{output}}` are uploaded to the S3 bucket after the task completes.
Downloading Data#
Download data directly from cloud storage using URLs:
```yaml
workflow:
  name: download-from-s3
  tasks:
    - name: load-from-cloud
      image: ubuntu:24.04
      command: ["bash", "-c"]
      args:
        - |
          echo "Loading data from S3..."
          ls -la {{input:0}}/  # (1)
          echo "Download complete"
      inputs:
        - url: s3://my-bucket/data/  # (2)
```

1. Access downloaded files at `{{input:0}}/`.
2. Files are downloaded from S3 before the task starts.
Filtering Data#
Filter which files to download or upload using regex patterns:
```yaml
workflow:
  name: filtered-io
  tasks:
    - name: selective-download
      image: ubuntu:24.04
      command: ["bash", "-c"]
      args: ["ls -la {{input:0}}/"]
      inputs:
        - dataset:
            name: large_dataset
            regex: .*\.txt$  # (1)
      outputs:
        - dataset:
            name: output_dataset
            regex: .*\.(json|yaml)$  # (2)
```

1. Only download `.txt` files from the input.
2. Only upload `.json` and `.yaml` files to the output.
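To see which files those patterns actually select, here is a small standalone sketch; the file names are made up, and OSMO applies the regexes itself, so this only illustrates the matching behavior:

```python
import re

# The two patterns from the example above.
input_pattern = re.compile(r".*\.txt$")
output_pattern = re.compile(r".*\.(json|yaml)$")

files = ["notes.txt", "data/train.txt", "model.pth", "metrics.json", "config.yaml"]

downloaded = [f for f in files if input_pattern.match(f)]
uploaded = [f for f in files if output_pattern.match(f)]

print(downloaded)  # ['notes.txt', 'data/train.txt']
print(uploaded)    # ['metrics.json', 'config.yaml']
```

Note that both patterns are anchored at the end with `$`, so `model.pth` matches neither filter, while `.*` lets the patterns match paths in subdirectories such as `data/train.txt`.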
Next Steps#
Now that you understand data management, you’re ready to build more complex workflows. Continue to Serial Workflows to learn about task dependencies.