Checkpointing
OSMO supports periodic checkpointing of data from your workflow to a data store.
The checkpoint field in the task spec can be used to specify the following parameters:
| Field | Description |
|---|---|
| path (required) | The local path within the task to checkpoint. |
| url (required) | The remote path in the data store to checkpoint to. |
| frequency (required) | How long to wait _between_ one checkpoint ending and the next one beginning. This should be in the format of a duration string such as 30m. |
| regex (optional) | Regex for files to checkpoint. |
Note: When the task completes or fails, the checkpointing process uploads the data to the data store one final time.
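Because frequency is measured from the end of one checkpoint to the start of the next, the time between consecutive checkpoint starts is the wait plus the upload time. The following Python sketch works through that arithmetic. The function name `checkpoint_start_times` is hypothetical, and the assumption that the first wait begins when the task starts is not stated in this page; it is an illustration of the timing rule, not OSMO's implementation.

```python
from datetime import timedelta


def checkpoint_start_times(frequency, upload_duration, task_runtime):
    """Return the offsets (from task start) at which periodic checkpoints begin.

    Assumptions (not confirmed by the OSMO documentation):
    - the first wait of `frequency` starts when the task starts;
    - every upload takes the same `upload_duration`.
    """
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")

    starts = []
    next_start = frequency
    while next_start < task_runtime:
        starts.append(next_start)
        # The wait is counted from the END of the upload, not from its start.
        next_start = next_start + upload_duration + frequency
    return starts


starts = checkpoint_start_times(
    frequency=timedelta(minutes=30),
    upload_duration=timedelta(minutes=2),
    task_runtime=timedelta(minutes=100),
)
print([int(s.total_seconds() // 60) for s in starts])  # [30, 62, 94]
```

With a 30 minute frequency and a 2 minute upload, checkpoints begin 32 minutes apart rather than 30. The final upload that runs when the task completes or fails is separate from this periodic schedule and is not included in the list.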
workflow:
  name: sample-group
  tasks:
  - name: task1
    checkpoint:
    - path: /local/path/to/checkpoint
      url: s3://my-bucket/my-folder
      frequency: 30m
      regex: .*\.json
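When writing the regex value, note that an unescaped `.` matches any character, so `.*.json` is looser than `.*\.json`. The sketch below compares the two patterns with Python's `re` module. How OSMO applies the pattern (which regex engine it uses, and whether it matches the file name or the full path) is not specified here, so the whole-name matching and the sample file names are assumptions made for illustration.

```python
import re

# Hypothetical file names in the checkpoint directory.
file_names = ["file_1.json", "metrics.json", "notes_json", "model.ckpt"]

unescaped = re.compile(r".*.json")   # "." before "json" matches ANY character
escaped = re.compile(r".*\.json")    # "\." matches only a literal dot


def select(pattern, names):
    # Assumption: the whole file name must match the pattern.
    return [name for name in names if pattern.fullmatch(name)]


loose = select(unescaped, file_names)
strict = select(escaped, file_names)

print(loose)   # ['file_1.json', 'metrics.json', 'notes_json']
print(strict)  # ['file_1.json', 'metrics.json']
```

The unescaped pattern also selects `notes_json`, which is not a `.json` file. Testing a pattern locally like this before submitting the workflow can catch such cases early.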
You can view the checkpointing process as it runs in the workflow logs.
$ osmo workflow logs <workflow_id>
...
2025/04/22 23:57:13 [task1][osmo] Checkpointing data from /local/path/to/checkpoint to s3://my-bucket/my-folder...
2025/04/22 23:57:14 [task1][osmo] 100%| 5.00M/5.00M [00:00<00:00, 6.73MB/s, file_name=/local/path/to/checkpoint/file_1.json]
2025/04/22 23:57:14 [task1][osmo] 2025-04-22T23:57:14+0000 client [INFO] data: Data has been uploaded
2025/04/22 23:57:15 [task1][osmo] Checkpointing data from /local/path/to/checkpoint to s3://my-bucket/my-folder finished
...