Resources
A resource spec defines the number and types of resources required to run a task. The following fields describe a resource spec:
| Field | Description |
|---|---|
| `cpu` | Number of CPU cores to request. |
| `memory` | Amount of memory (RAM) to use. |
| `storage` | Amount of disk space to use. |
| `gpu` | Number of GPUs to request. |
| `platform` | Platform to target. If no platform is specified, the pool's default platform is used, provided the admins have configured one. Learn more at Pool List. |
| `nodesExcluded` | Nodes to exclude from the resource spec. |
Note
The default resource spec can be configured but requires service-level configuration. If you have administrative access, you can enable this directly. Otherwise, contact someone with pool administration privileges.
Multiple resource specs can be defined in the same workflow and assigned individually to tasks.
To define a resource spec in the workflow, use the `resources` field under `workflow`. To
assign a resource spec to a task, use the `resource` field under `tasks`:
workflow:
  name: my_workflow
  resources:
    default: # (1)
      cpu: 1
      memory: 16Gi
      storage: 1Gi
      platform: ovx-a40
    x86_gpu: # (2)
      cpu: 4
      gpu: 1
      memory: 16Gi
      storage: 1Gi
      platform: dgx-a100
  tasks:
  - name: task1
    resource: default # (3)
    ...
  - name: task2
    resource: x86_gpu # (4)
    ...
  - name: task3 # (5)
    ...
1. Defines the `default` resource spec, which targets an A40 node.
2. Defines the `x86_gpu` resource spec, which uses a single GPU and targets an A100 node.
3. Assigns the `default` resource to `task1`.
4. Assigns the `x86_gpu` resource to `task2`.
5. Since `task3` does not define a resource, the `default` resource spec is used.
If the resource field is left blank, the default resources are used.
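As a minimal sketch of this fallback, the workflow below defines only a `default` resource spec and omits the `resource` field on every task, so both tasks would run with the `default` spec. The workflow name, task names, and resource values are hypothetical and chosen only for illustration.

```yaml
workflow:
  name: fallback_example     # hypothetical workflow name
  resources:
    default:
      cpu: 2
      memory: 8Gi
      storage: 1Gi
  tasks:
  - name: preprocess         # hypothetical task; no resource field, so default applies
    # remaining task fields omitted
  - name: train              # hypothetical task; also falls back to default
    # remaining task fields omitted
```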
Consult the Resource List CLI to see which resources are available before building a resource spec.
If some nodes in the pool have poor performance or an unreliable network, you can exclude them using the
`nodesExcluded` field in the resource spec:
resources:
  default:
    cpu: 1
    memory: 16Gi
    storage: 1Gi
    nodesExcluded:
    - worker1
    - worker2
Warning
Excluding too many nodes can leave tasks stuck in PENDING indefinitely!
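The fields described on this page can be combined in a single named spec. The following sketch assumes that `nodesExcluded` may be used alongside `gpu` and `platform`. The spec name `gpu_filtered`, the task name, and the node names are hypothetical, and the platform value is reused from the earlier example.

```yaml
workflow:
  name: my_workflow
  resources:
    gpu_filtered:            # hypothetical spec name
      cpu: 4
      gpu: 1
      memory: 16Gi
      storage: 1Gi
      platform: dgx-a100     # platform name taken from the earlier example
      nodesExcluded:         # hypothetical node names
      - worker1
      - worker2
  tasks:
  - name: train              # hypothetical task name
    resource: gpu_filtered
    # remaining task fields omitted
```

Keeping the exclusion list short leaves more candidate nodes for scheduling, in line with the warning above.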