Scheduling

Overview

OSMO’s scheduling system maximizes cluster utilization while ensuring fair resource allocation across teams and projects. The scheduler operates on three key principles:

Priority-Based Queuing

Workflows are scheduled based on their priority level (HIGH, NORMAL, LOW), ensuring critical tasks get resources first.

Smart Preemption

Low-priority workflows can be interrupted to make room for higher-priority tasks, with automatic rescheduling.

GPU Borrowing

LOW priority workflows can borrow unused GPUs from other pools, which maximizes utilization and reduces wait times.

Priority

Workflows can be assigned one of three priority levels:

Priority	Preemptible	May Borrow GPUs	When To Use
`HIGH`	No	No	For time-critical workflows that need to skip the queue.
`NORMAL`	No	No	For most standard workflows.
`LOW`	Yes	Yes	For batch jobs that can tolerate being interrupted and restarted. These may start before queued `HIGH` and `NORMAL` priority workflows because they can borrow GPUs from other pools (see Borrowing).

The scheduler will always try to schedule higher priority workflows before lower priority workflows.

For workflows with the same priority level, workflows are scheduled in the order they are submitted.
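The ordering rule above can be sketched as a min-heap keyed on (priority rank, submission sequence). This is a simplified illustration, not OSMO's implementation: the `PendingQueue` class and the numeric ranks are hypothetical, and the sketch ignores quotas, preemption, and borrowing, so it only shows the order in which pending workflows are considered.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical numeric ranks: a lower rank is considered first.
PRIORITY_RANK = {"HIGH": 0, "NORMAL": 1, "LOW": 2}


@dataclass(order=True)
class QueuedWorkflow:
    rank: int  # priority level, compared first
    seq: int   # submission order, breaks ties within one priority level
    name: str = field(compare=False)


class PendingQueue:
    """Hypothetical pending queue: higher priority first, then FIFO."""

    def __init__(self) -> None:
        self._heap: list[QueuedWorkflow] = []
        self._counter = itertools.count()

    def submit(self, name: str, priority: str) -> None:
        item = QueuedWorkflow(PRIORITY_RANK[priority], next(self._counter), name)
        heapq.heappush(self._heap, item)

    def pop_next(self) -> str:
        return heapq.heappop(self._heap).name


queue = PendingQueue()
queue.submit("nightly-eval", "LOW")
queue.submit("train-a", "NORMAL")
queue.submit("hotfix", "HIGH")
queue.submit("train-b", "NORMAL")
order = [queue.pop_next() for _ in range(4)]
print(order)  # ['hotfix', 'train-a', 'train-b', 'nightly-eval']
```

The two `NORMAL` workflows come out in submission order because the sequence counter breaks the tie.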

See also

To learn how to specify priority in your workflow, see submit.

Quotas

Each pool has a quota of GPUs that can be occupied by NORMAL and HIGH priority workflows. Once the pool’s GPU quota is reached, workflows submitted with NORMAL or HIGH priority will be queued.

Even after the pool has reached its GPU quota, LOW priority workflows can still run by borrowing GPUs from other pools (see Borrowing).

Important

LOW priority workflows do not count towards the pool’s GPU quota.
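A minimal sketch of the quota rule, assuming a pool tracks a single counter of GPUs held by NORMAL and HIGH priority workflows. `Pool`, `counts_toward_quota`, `must_queue`, and `start` are hypothetical names used only for illustration; they are not part of OSMO's API or CLI.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    gpu_quota: int       # GPUs that NORMAL and HIGH workflows may occupy
    quota_used: int = 0  # GPUs held by running NORMAL/HIGH workflows


def counts_toward_quota(priority: str) -> bool:
    # LOW priority workflows do not consume the pool's GPU quota.
    return priority in ("HIGH", "NORMAL")


def must_queue(pool: Pool, priority: str, gpus: int) -> bool:
    """True if a NORMAL/HIGH request would exceed the quota and must wait."""
    if not counts_toward_quota(priority):
        return False  # LOW is gated by GPU availability (borrowing), not quota
    return pool.quota_used + gpus > pool.gpu_quota


def start(pool: Pool, priority: str, gpus: int) -> None:
    if counts_toward_quota(priority):
        pool.quota_used += gpus


pool = Pool(gpu_quota=16)
start(pool, "NORMAL", 8)
start(pool, "LOW", 32)                 # does not touch the quota
print(pool.quota_used)                 # 8
print(must_queue(pool, "HIGH", 8))     # False: 8 + 8 <= 16
print(must_queue(pool, "NORMAL", 16))  # True: 8 + 16 > 16
print(must_queue(pool, "LOW", 64))     # False: quota does not apply
```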

See also

To learn how to view your pool’s quota, see Pool CLI Reference and Resource CLI Reference.

Preemption

Preemption within a pool occurs when a higher-priority workflow (NORMAL or HIGH) evicts a LOW priority workflow to free the GPUs it needs to start running.

Preemption happens when all of the following conditions are met:

The pool has NOT reached its GPU quota (counting only NORMAL and HIGH priority workflows).
LOW priority workflows are consuming the pool’s GPUs.
A higher-priority workflow (NORMAL or HIGH) is submitted to the pool.

When these conditions hold, LOW priority workflows running in the pool are preempted to make room for the higher-priority workflow.
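The conditions above can be modeled as a small victim-selection function. This is a hypothetical sketch: `Pool`, `LowWorkflow`, and `select_preemptions` are illustrative names, and the eviction order (most recently started LOW workflow first) is an assumption, since this page does not say which LOW workflows OSMO chooses to preempt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LowWorkflow:
    name: str
    gpus: int


@dataclass
class Pool:
    gpu_quota: int
    quota_used: int = 0  # GPUs held by running NORMAL/HIGH workflows
    # LOW workflows running on this pool's own GPUs, oldest first
    low_running: List[LowWorkflow] = field(default_factory=list)


def select_preemptions(pool: Pool, gpus_needed: int) -> Optional[List[LowWorkflow]]:
    """Choose LOW workflows to evict for an incoming NORMAL/HIGH workflow.

    Returns None if the request would exceed the quota (it is queued
    instead), otherwise a possibly empty list of LOW workflows to preempt.
    """
    if pool.quota_used + gpus_needed > pool.gpu_quota:
        return None  # quota reached: no preemption, the workflow waits
    free = pool.gpu_quota - pool.quota_used - sum(w.gpus for w in pool.low_running)
    victims: List[LowWorkflow] = []
    # Assumed policy: evict the most recently started LOW workflows first.
    for wf in reversed(pool.low_running):
        if free >= gpus_needed:
            break
        victims.append(wf)
        free += wf.gpus
    return victims


pool = Pool(
    gpu_quota=16,
    quota_used=8,
    low_running=[LowWorkflow("sweep-1", 4), LowWorkflow("sweep-2", 4)],
)
print([w.name for w in select_preemptions(pool, 4)])  # ['sweep-2']
print([w.name for w in select_preemptions(pool, 8)])  # ['sweep-2', 'sweep-1']
print(select_preemptions(pool, 12))                   # None
```

Returning `None` separately from an empty list keeps "must wait in the queue" distinct from "fits without evicting anything".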

See also

Preemption outside of a pool may occur when borrowed resources are reclaimed by other pools. See Borrowing for more information.

Important

Key Characteristics:

A preempted workflow fails with the FAILED_PREEMPTED status.
By default, a preempted workflow is rescheduled automatically.
Because LOW priority workflows can be preempted, you can submit as many of them as you like to keep the cluster busy without blocking other workflows.

Borrowing

Multiple pools can share the same physical GPUs in the compute cluster. Administrators can configure the partitioning of the GPUs between the pools through quotas.

Borrowing lets LOW priority workflows run even after the GPUs in use have reached the pool’s GPU quota. OSMO automatically borrows unused GPUs from other pools that share the same physical GPUs.

Important

Only LOW priority workflows can exceed the pool quota by borrowing GPUs from other pools, and they risk being preempted when those pools reclaim the GPUs.

If the pool is under its quota limit, its LOW priority workflows will NOT be preempted by other pools.
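A rough model of borrowing and reclamation, assuming the pools' quotas exactly partition one shared set of physical GPUs and that any usage above a pool's quota comes from borrowed GPUs. `Pool`, `idle_gpus`, and `reclaim_plan` are hypothetical names, and OSMO's actual reclamation logic may differ. The sketch also assumes the requesting pool's NORMAL/HIGH workflow already fits within its own quota.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Pool:
    gpu_quota: int   # this pool's share of the shared physical GPUs
    in_use: int = 0  # GPUs occupied by this pool's workflows (all priorities)

    @property
    def borrowed(self) -> int:
        # NORMAL/HIGH usage is capped by the quota, so any usage above it
        # comes from LOW workflows running on borrowed GPUs.
        return max(self.in_use - self.gpu_quota, 0)


def idle_gpus(pools: Dict[str, Pool]) -> int:
    """Unoccupied GPUs in the shared hardware (assumes quotas partition it)."""
    total = sum(p.gpu_quota for p in pools.values())
    return total - sum(p.in_use for p in pools.values())


def reclaim_plan(pools: Dict[str, Pool], requester: str, gpus_needed: int) -> Dict[str, int]:
    """GPUs to take back from each borrowing pool so `requester` can start
    a NORMAL/HIGH workflow. Pools at or under quota are never touched."""
    shortfall = gpus_needed - idle_gpus(pools)
    plan: Dict[str, int] = {}
    for name, pool in pools.items():
        if shortfall <= 0:
            break
        if name == requester or pool.borrowed == 0:
            continue
        take = min(pool.borrowed, shortfall)
        plan[name] = take  # preempt LOW workflows holding this many borrowed GPUs
        shortfall -= take
    return plan


pools = {
    "team-a": Pool(gpu_quota=8, in_use=12),  # borrowing 4 GPUs
    "team-b": Pool(gpu_quota=8, in_use=2),
    "team-c": Pool(gpu_quota=8, in_use=8),   # exactly at quota
}
print(idle_gpus(pools))                  # 2
print(reclaim_plan(pools, "team-b", 6))  # {'team-a': 4}
```

Here `team-a` is the only pool borrowing, so it is the only one asked to give GPUs back; `team-c` sits exactly at its quota and its LOW workflows are left alone.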

See also

For workloads that require specific network locality (e.g., NVLink multi-node training), see Topology-Aware Scheduling.