Group Templates#
Group templates define arbitrary Kubernetes resources that OSMO creates alongside each workflow task group. Unlike pod templates, which configure the pod spec for individual tasks, group templates deploy namespace-scoped resources—such as ComputeDomains or ConfigMaps—that the group’s pods depend on. Resources are scoped to the namespace in which the workflow runs.
Why Use Group Templates?#
Group templates extend OSMO’s scheduling capabilities by allowing you to provision supporting Kubernetes resources alongside workflow task groups:
- ✓ Provision Shared Group Resources
Create ConfigMaps, Secrets, or other namespace-scoped resources that all tasks in a group share.
- ✓ Support Custom CRDs
Deploy any Kubernetes custom resource your backend requires without modifying OSMO’s core scheduling logic.
- ✓ Maintain Consistent Cleanup
OSMO records which resource types were created and cleans them up when the group finishes, regardless of pool config changes.
How It Works#
Resource Creation Flow#
1. Define Templates 📋: Create named Kubernetes manifests
2. Reference in Pools 🔗: Attach templates to pool configurations
3. Merge Templates 🔄: Combine template specifications
4. Create Resources ✅: Resources are created before the group's pods
Template Structure#
Group templates are full Kubernetes manifests. They must include apiVersion, kind, and metadata.name. The metadata.namespace field must be omitted—OSMO assigns the namespace at runtime.
{
"template_name": {
"apiVersion": "resource.nvidia.com/v1beta1",
"kind": "ComputeDomain",
"metadata": {
"name": "compute-domain-{{WF_GROUP_UUID}}"
},
"spec": {
"numNodes": 0,
"channel": {
"resourceClaimTemplate": {
"name": "compute-domain-{{WF_GROUP_UUID}}-rct"
}
}
}
}
}
Note
Resource Name Uniqueness
Because multiple workflows may run in the same pool and namespace at the same time, resource names must be unique per group. Include {{WF_GROUP_UUID}} in metadata.name to ensure each group gets its own resource instance, as shown in the example above.
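The effect of this substitution can be illustrated with a short Python sketch. The regex-based `render` helper below is illustrative only, not OSMO's actual substitution engine:

```python
import re
import uuid

def render(template: str, variables: dict) -> str:
    """Replace {{VAR}} placeholders with their runtime values."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)

# Each group receives its own UUID, so rendered names never collide
# even when several workflows share a pool and namespace.
group_a = render("compute-domain-{{WF_GROUP_UUID}}",
                 {"WF_GROUP_UUID": str(uuid.uuid4())})
group_b = render("compute-domain-{{WF_GROUP_UUID}}",
                 {"WF_GROUP_UUID": str(uuid.uuid4())})
assert group_a != group_b
```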
Key Features#
- Variable Substitution: The same variables available in pod templates are resolved at runtime, except task-specific variables (such as the task name and task UUID), which are not available at the group level.
- Label Injection: OSMO automatically adds its standard labels (osmo.workflow_id, osmo.submitted_by, etc.) to each resource's metadata.labels for tracking and cleanup.
- Template Merging: Multiple templates that define the same resource (same apiVersion, kind, and metadata.name) are merged, with later templates overriding earlier ones.
- Creation Order: Group template resources are created before the group's pods, ensuring dependencies are satisfied.
Note
For detailed configuration fields and all available variables, see /api/configs/group_template in the API reference.
Required Backend Permissions#
OSMO’s backend worker creates and cleans up group template resources using its Kubernetes ServiceAccount. Before using a group template that creates a given resource kind, the backend operator must be granted permission for that kind in the workflow namespace.
The backend-operator Helm chart exposes a services.backendWorker.extraRBACRules values
field for this purpose (see Deploy Backend Operator). For each resource kind referenced in your
group templates, add a corresponding entry:
services:
backendWorker:
extraRBACRules:
# Vanilla Kubernetes resources
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["list", "create", "delete", "patch"]
# Example CRD — adjust apiGroups and resources for your CRD
- apiGroups: ["resource.nvidia.com"]
resources: ["computedomains"]
verbs: ["list", "create", "delete", "patch"]
Note
The verbs list must include at minimum create, delete, list, and patch.
OSMO requires list and patch for label injection and cleanup tracking in addition to
create and delete.
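As a sanity check before deploying, you can verify that each rule you add covers the minimum verb set. The small `missing_verbs` helper below is hypothetical, not part of OSMO or the Helm chart:

```python
REQUIRED_VERBS = {"list", "create", "delete", "patch"}

def missing_verbs(rule: dict) -> set:
    """Return the required verbs absent from an RBAC rule dict."""
    return REQUIRED_VERBS - set(rule.get("verbs", []))

complete = {"apiGroups": ["resource.nvidia.com"],
            "resources": ["computedomains"],
            "verbs": ["list", "create", "delete", "patch"]}
assert not missing_verbs(complete)

incomplete = {"apiGroups": [""], "resources": ["configmaps"],
              "verbs": ["create", "delete"]}
# missing_verbs(incomplete) reports {"list", "patch"}
```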
Warning
For CRDs, apiGroups must match the CRD’s group exactly. Run
kubectl get crd <crd-name> -o jsonpath='{.spec.group}' to find the correct value.
Practical Guide#
Configuring Group Templates to Enable NVLink#
This example shows how to configure a ComputeDomain group template alongside a matching pod template to enable NVLink communication across nodes for multi-node workloads.
Step 1: Create the Group Template
Create a JSON file defining the ComputeDomain resource. The resourceClaimTemplate.name is used in the pod template in the next step:
$ cat << EOF > group_templates.json
{
"compute-domain": {
"apiVersion": "resource.nvidia.com/v1beta1",
"kind": "ComputeDomain",
"metadata": {
"name": "compute-domain-{{WF_GROUP_UUID}}"
},
"spec": {
"numNodes": 0,
"channel": {
"resourceClaimTemplate": {
"name": "compute-domain-{{WF_GROUP_UUID}}-rct"
}
}
}
}
}
EOF
$ osmo config update GROUP_TEMPLATE --file group_templates.json
Step 2: Create the Pod Template
Create a pod template that claims the compute domain channel so that each task pod is connected to the NVLink fabric:
$ cat << EOF > pod_templates.json
{
"use-compute-domain": {
"spec": {
"containers": [
{
"name": "{{USER_CONTAINER_NAME}}",
"resources": {
"claims": [
{
"name": "compute-domain-channel"
}
]
}
}
],
"resourceClaims": [
{
"name": "compute-domain-channel",
"resourceClaimTemplateName": "compute-domain-{{WF_GROUP_UUID}}-rct"
}
]
}
}
}
EOF
$ osmo config update POD_TEMPLATE --file pod_templates.json
Step 3: Reference Both Templates in the Pool
Add both templates to your pool configuration:
{
"my-pool": {
"backend": "default",
"common_pod_template": ["default_amd64", "default_user", "use-compute-domain"],
"common_group_templates": [
"compute-domain"
]
}
}
Step 4: Verify the Configuration
# List all group templates
$ osmo config get GROUP_TEMPLATE
# Show a specific group template
$ osmo config get GROUP_TEMPLATE compute-domain