Overview

Resources

A resource is a machine used to run a workflow. Resources are grouped into pools and platforms so that you can share them with other users and specify the type of hardware your workflow runs on.

The diagram below illustrates the organizational hierarchy in an OSMO cluster.

[Figure: pool organization (pool_organization.svg)]

The following sections explain each layer (pools and platforms) in detail.

Pools

A pool is a group of resources shared among users. Each pool contains platforms that distinguish between different types of hardware. Pools are access-controlled so that different teams can share resources.

How do pools manage workflow priority and preemption?

Depending on the backend scheduler, a pool can have a quota that limits the number of HIGH or NORMAL priority workflows (see Scheduling).

LOW priority workflows can go beyond the pool quota by borrowing unused GPUs available in the cluster. However, LOW priority workflows may be subject to preemption (see Borrowing).
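The quota and borrowing rules above can be pictured with a small toy model. This is only an illustrative sketch, not OSMO's scheduler: the function name `schedule_decision`, its parameters, and the assumption that quota is counted in GPUs are all hypothetical, and the real behavior depends on the backend scheduler.

```python
def schedule_decision(priority: str, gpus_requested: int, *,
                      quota_gpus: int, quota_gpus_in_use: int,
                      idle_cluster_gpus: int) -> str:
    """Toy model of the pool quota rules (hypothetical, not the OSMO API)."""
    if priority in ("HIGH", "NORMAL"):
        # Assumption: HIGH and NORMAL workflows are limited by the pool quota.
        if quota_gpus_in_use + gpus_requested <= quota_gpus:
            return "run within quota"
        return "queue until quota frees up"
    if priority == "LOW":
        # Assumption: LOW workflows may borrow idle GPUs beyond the quota,
        # but anything they run on can be preempted.
        if gpus_requested <= idle_cluster_gpus:
            return "run as preemptible"
        return "queue until GPUs are idle"
    raise ValueError(f"unknown priority: {priority!r}")


# A pool whose quota is fully used: NORMAL waits, LOW can still borrow.
normal = schedule_decision("NORMAL", 2, quota_gpus=8,
                           quota_gpus_in_use=8, idle_cluster_gpus=4)
low = schedule_decision("LOW", 2, quota_gpus=8,
                        quota_gpus_in_use=8, idle_cluster_gpus=4)
```

In this model the only difference between priorities is which limit applies: the pool quota for HIGH and NORMAL, and the idle capacity of the cluster for LOW.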

Pool Statuses

| Status | Description |
| --- | --- |
| ONLINE | The pool is ready to run workflows. |
| OFFLINE | Workflows can be submitted to the pool, but will be queued until the pool is online. |
| MAINTENANCE | The pool is undergoing maintenance. You won’t be able to submit workflows to the pool. |

Note

Please contact your administrator for more information on pools under maintenance.
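As a rough illustration of how the three pool statuses affect a submission, the sketch below maps each status to the outcome described above. The `PoolStatus` enum and `submission_outcome` function are hypothetical names used only for this example; they are not part of the OSMO client.

```python
from enum import Enum


class PoolStatus(Enum):
    ONLINE = "ONLINE"
    OFFLINE = "OFFLINE"
    MAINTENANCE = "MAINTENANCE"


def submission_outcome(status: PoolStatus) -> str:
    """Return what happens to a workflow submitted to a pool in this status."""
    if status is PoolStatus.ONLINE:
        return "accepted: pool is ready to run workflows"
    if status is PoolStatus.OFFLINE:
        return "accepted: queued until the pool is online"
    # MAINTENANCE: submissions are not accepted at all.
    return "rejected: pool is under maintenance"


outcomes = {status.name: submission_outcome(status) for status in PoolStatus}
```

The key distinction is between OFFLINE, where a submission is still accepted and waits in the queue, and MAINTENANCE, where the submission itself is refused.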

Resource Types

| Type | Description |
| --- | --- |
| SHARED | The resource is shared with another pool. |
| RESERVED | The resource is only available to the pool. |

To view the pools, you can use Pool List.

To view the available resources in a pool, you can use Resource List.

Platforms

A platform is a group of resources in a pool and denotes a specific type of hardware.

The administrator assigns each resource to a platform. To view more information about a resource and its access configurations, use Resource Info.

Platform Access Configurations

| Configuration | Type | Description |
| --- | --- | --- |
| Privileged Mode Allowed | boolean | Whether the platform allows privileged containers. If enabled, you can set privileged to true in the workflow spec (see Task). |
| Host Network Allowed | boolean | Whether the platform allows host networking. If enabled, you can set hostNetwork to true in the workflow spec (see Task). |
| Default Mounts | list[string] | Default volume mounts from the node to the task container for the platform. |
| Allowed Mounts | list[string] | Volume mounts that are allowed for the platform. These are not mounted by default. You may add these to volumeMounts in the workflow spec (see Task). |
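The access configurations above amount to a set of checks on what a task may request. The following sketch shows one way to reason about them before submitting. It is not how OSMO validates workflows: `PlatformConfig` and `check_task` are hypothetical, the task is represented as a plain dictionary, and mounts are compared as exact strings, which may differ from the real mount syntax. Only the field names `privileged`, `hostNetwork`, and `volumeMounts` come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformConfig:
    """Hypothetical container for a platform's access configurations."""
    privileged_allowed: bool = False
    host_network_allowed: bool = False
    default_mounts: list[str] = field(default_factory=list)
    allowed_mounts: list[str] = field(default_factory=list)


def check_task(task: dict, platform: PlatformConfig) -> list[str]:
    """Return a list of problems; an empty list means the task fits the platform."""
    problems = []
    if task.get("privileged", False) and not platform.privileged_allowed:
        problems.append("privileged containers are not allowed on this platform")
    if task.get("hostNetwork", False) and not platform.host_network_allowed:
        problems.append("host networking is not allowed on this platform")
    # Default mounts are applied by the platform, so only requested mounts
    # are checked here (assumption: exact string match against the allow list).
    for mount in task.get("volumeMounts", []):
        if mount not in platform.allowed_mounts:
            problems.append(f"volume mount {mount!r} is not in the allowed mounts")
    return problems


platform = PlatformConfig(host_network_allowed=True, allowed_mounts=["/data"])
task = {"privileged": True, "hostNetwork": True, "volumeMounts": ["/data", "/scratch"]}
problems = check_task(task, platform)  # flags `privileged` and the "/scratch" mount
```

Collecting every problem rather than stopping at the first makes it easier to fix a task spec in one pass.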

Important

When submitting a workflow, you must specify the platform to target in the workflow resource spec. Learn more at Resources.