Pod Templates#

Pod templates define how workflow tasks execute as Kubernetes pods. After configuring pools and resource validation, create pod templates to specify scheduling constraints, security policies, and resource allocations that apply across your pools.

Why Use Pod Templates?#

Pod templates provide standardized configurations that simplify cluster management:

✓ Target Specific Hardware: Use node selectors and tolerations to route workflows to the right GPU types, CPU architectures, or instance types.
✓ Enforce Security Policies: Apply consistent security contexts, capabilities, and access controls across all workflow tasks.
✓ Optimize Resource Allocation: Set appropriate resource requests and limits with conditional logic based on workflow requirements.
✓ Simplify User Experience: Users select pools without needing to understand complex Kubernetes scheduling—templates handle all the details.

How It Works#

Template Application Flow#

1. Define Templates 📋: Create reusable specs (node selectors, tolerations, resources).
2. Reference in Pools 🔗: Attach templates to pools. A pool can reference multiple templates.
3. Merge Templates 🔄: Combine the specifications. Later templates override earlier ones.
4. Create K8s Pods ✅: Build Kubernetes pods from the merged spec and apply them to workflow tasks.

Template Structure#

Pod templates use the standard Kubernetes PodSpec format with OSMO enhancements:

{
  "template_name": {
    "spec": {
      "nodeSelector": {
        "node-label": "value"
      },
      "tolerations": [
        {
          "key": "taint-key",
          "effect": "NoSchedule"
        }
      ],
      "containers": [
        {
          "name": "{{USER_CONTAINER_NAME}}",
          "resources": {
            "limits": {
              "cpu": "{{USER_CPU}}",
              "memory": "{{USER_MEMORY}}"
            }
          }
        }
      ]
    }
  }
}

Key Features#

Variable Substitution: Variables such as {{USER_CPU}} and {{WF_ID}} are resolved at runtime
Template Merging: Combine multiple templates; later ones override earlier ones
Conditional Logic: Use Jinja2 expressions for dynamic values. For example, to pass user CPU requests of 2 or less through unchanged and cap larger requests at 2, use {% if USER_CPU > 2 %}2{% else %}{{USER_CPU}}{% endif %}
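The conditional expression above can be easy to misread, so here is a minimal Python sketch of the value it produces. It assumes USER_CPU compares as a number; the function name `cpu_request` is hypothetical and only mirrors the if/else branches, it is not part of OSMO.

```python
def cpu_request(user_cpu: float, cap: float = 2) -> float:
    """Mirror of {% if USER_CPU > 2 %}2{% else %}{{USER_CPU}}{% endif %}.

    Values above the cap are replaced by the cap; everything else passes through.
    """
    return cap if user_cpu > cap else user_cpu


# Small, equal, and large requests.
for requested in (0.5, 2, 8):
    print(requested, "->", cpu_request(requested))
```

In other words, the expression behaves like min(USER_CPU, 2), which keeps the value at or below the user-specified amount.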

Warning

Merge Behavior

Fields are overridden by your templates
Lists are merged by name field (same name = recursive merge, different name = append)
Templates are applied in order (later overrides earlier)
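The merge rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not OSMO's actual implementation: the `merge` function, the `log-sidecar` container, and the choice to append list items that have no name field are all assumptions made for the example.

```python
from copy import deepcopy
from functools import reduce


def merge(base, override):
    """Merge `override` into `base`; the later value wins on conflicts."""
    if isinstance(base, dict) and isinstance(override, dict):
        result = deepcopy(base)
        for key, value in override.items():
            result[key] = merge(result[key], value) if key in result else deepcopy(value)
        return result
    if isinstance(base, list) and isinstance(override, list):
        result = deepcopy(base)
        for item in override:
            name = item.get("name") if isinstance(item, dict) else None
            # Find an existing entry with the same name, if any.
            match = next(
                (i for i, existing in enumerate(result)
                 if name is not None
                 and isinstance(existing, dict)
                 and existing.get("name") == name),
                None,
            )
            if match is None:
                result.append(deepcopy(item))  # different name: append
            else:
                result[match] = merge(result[match], item)  # same name: recursive merge
        return result
    return deepcopy(override)  # scalars and mismatched types: later overrides earlier


arch = {"spec": {"nodeSelector": {"kubernetes.io/arch": "amd64"}}}
ctrl = {"spec": {"containers": [
    {"name": "osmo-ctrl", "resources": {"requests": {"cpu": "1", "memory": "1Gi"}}}
]}}
later = {"spec": {"containers": [
    {"name": "osmo-ctrl", "resources": {"requests": {"memory": "2Gi"}}},
    {"name": "log-sidecar"},  # hypothetical extra container
]}}

# Templates are applied in list order, so `later` has the final say.
merged = reduce(merge, [arch, ctrl, later])
print(merged)
```

After the merge, the osmo-ctrl entry keeps its cpu request from the earlier template, takes the memory request from the later one, and the second container is appended to the list.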

Note

For detailed configuration fields and all available variables, see /api/configs/pod_template in the API reference.

Practical Guide#

Standard Pod Templates#

Create templates that target specific hardware and handle Kubernetes scheduling constraints.

Step 1: Understanding Template Variables

Step 2: Template Configuration File

Create a configuration file with base templates for architecture, control container, and user container:

default_amd64: Targets a specific architecture (amd64) with a node selector
default_ctrl: Configures the osmo-ctrl control container. Limits use the user-specified resources; the CPU request is the user-specified value, capped at 2
default_user: Configures the user container. Requests and limits both use the user-specified resources

$ cat << EOF > pod_templates.json
{
  "default_amd64": {
    "spec": {
      "nodeSelector": {"kubernetes.io/arch": "amd64"}
    }
  },
  "default_ctrl": {
    "spec": {
      "containers": [{
        "name": "osmo-ctrl",
        "resources": {
          "limits": {
            "cpu": "{{USER_CPU}}",
            "memory": "{{USER_MEMORY}}",
            "ephemeral-storage": "{{USER_STORAGE}}"
          },
          "requests": {
            "cpu": "{% if USER_CPU > 2 %}2{% else %}{{USER_CPU}}{% endif %}",
            "memory": "1Gi",
            "ephemeral-storage": "4Gi"
          }
        }
      }]
    }
  },
  "default_user": {
    "spec": {
      "containers": [{
        "name": "{{USER_CONTAINER_NAME}}",
        "resources": {
          "limits": {
            "cpu": "{{USER_CPU}}",
            "memory": "{{USER_MEMORY}}",
            "nvidia.com/gpu": "{{USER_GPU}}",
            "ephemeral-storage": "{{USER_STORAGE}}"
          },
          "requests": {
            "cpu": "{{USER_CPU}}",
            "memory": "{{USER_MEMORY}}",
            "nvidia.com/gpu": "{{USER_GPU}}",
            "ephemeral-storage": "{{USER_STORAGE}}"
          }
        }
      }]
    }
  }
}
EOF

$ osmo config update POD_TEMPLATE --file pod_templates.json

Step 3: Reference Templates in Pools

Add templates to your pool’s common_pod_template field:

{
  "my-pool": {
    "backend": "default",
    "common_pod_template": [
      "default_amd64",
      "default_ctrl",
      "default_user"
    ]
  }
}

Additional Examples#

Troubleshooting#

Template Not Found

Verify that the template name in the pool configuration matches the defined template name exactly
Check that the template exists: osmo config get POD_TEMPLATE <template_name>
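A name mismatch can also be caught offline before the configuration is applied. The sketch below assumes the pool and template configurations have the shapes shown in this guide (a mapping of pool names to objects with a common_pod_template list, and a mapping of template names to specs). The `missing_templates` helper and the `default_arm64` name are hypothetical.

```python
def missing_templates(pools: dict, templates: dict) -> dict:
    """Return {pool_name: [template names not defined in `templates`]}."""
    missing = {}
    for pool_name, pool in pools.items():
        absent = [
            name for name in pool.get("common_pod_template", [])
            if name not in templates  # exact, case-sensitive comparison
        ]
        if absent:
            missing[pool_name] = absent
    return missing


templates = {"default_amd64": {}, "default_ctrl": {}, "default_user": {}}
pools = {
    "my-pool": {
        "backend": "default",
        "common_pod_template": ["default_amd64", "default_ctrl", "default_user"],
    },
    "typo-pool": {
        "backend": "default",
        # "Default_User" differs only in case; "default_arm64" is not defined.
        "common_pod_template": ["default_arm64", "Default_User"],
    },
}

print(missing_templates(pools, templates))
```

In practice the two dictionaries would be loaded from the JSON files with json.load rather than written inline.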

Variable Substitution Errors

Ensure all variables used are valid OSMO variables
Check for typos in variable names (case-sensitive)
Review logs for specific variable resolution errors
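Typos in variable names can be spotted by scanning the template file for {{...}} placeholders. The sketch below is an assumption-laden illustration: `KNOWN_VARIABLES` lists only the variables that appear in this guide (the full list is in the API reference), the `unknown_variables` helper is hypothetical, and names used inside {% ... %} blocks are not checked.

```python
import re

# Matches {{NAME}} placeholders, allowing optional whitespace inside the braces.
VARIABLE_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

# Only the variables mentioned in this guide; extend from the API reference.
KNOWN_VARIABLES = {
    "USER_CPU", "USER_MEMORY", "USER_GPU", "USER_STORAGE",
    "USER_CONTAINER_NAME", "WF_ID",
}


def unknown_variables(template_text: str, known=KNOWN_VARIABLES) -> list:
    """Return placeholder names in `template_text` that are not in `known`."""
    found = set(VARIABLE_PATTERN.findall(template_text))
    return sorted(found - set(known))


sample = '{"limits": {"cpu": "{{USER_CPU}}", "memory": "{{USER_MEMROY}}", "gpu": "{{user_gpu}}"}}'
print(unknown_variables(sample))
```

Because the comparison is case-sensitive, a lowercase `user_gpu` is reported alongside the misspelled `USER_MEMROY`.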

Resource Constraints

Verify that resource requests fit within available node capacity
Check nodeSelector labels exist on cluster nodes
Ensure tolerations match node taints

Debugging Tips

Start with simple templates and add complexity gradually
Validate JSON syntax before applying
Test with different workflow configurations
Review OSMO service logs for detailed errors
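For the JSON syntax check, Python's standard json module is enough. This is a minimal sketch; the `check_json` helper is hypothetical and only reports where a strict JSON parser stops.

```python
import json


def check_json(text: str):
    """Return None if `text` is valid JSON, else a short error location string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


valid = '{"default_amd64": {"spec": {"nodeSelector": {"kubernetes.io/arch": "amd64"}}}}'
# The trailing comma on the second line is not allowed in strict JSON.
invalid = '{\n  "default_amd64": {"spec": {},}\n}'

print(check_json(valid))
print(check_json(invalid))
```

To check a file, pass its contents, for example check_json(open("pod_templates.json").read()).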

Tip

Best Practices

Use descriptive template names (e.g., gpu_h100_training, cpu_inference)
Create modular templates that can be reused across different pools (for example, separate templates for architecture, security, and resources)
Use conditional logic to optimize resource requests
Add labels and annotations for monitoring
Test templates thoroughly before production use

Warning

Do not override the image, command, or args fields in containers. OSMO manages these internally.
Template changes apply only to new workflows, not to workflow tasks that are already running.
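A simple pre-apply check can flag templates that set the managed fields. The sketch below assumes the template layout used in this guide (spec.containers as a list of objects). The `managed_field_overrides` helper and the `bad_template` entry are hypothetical.

```python
MANAGED_FIELDS = ("image", "command", "args")


def managed_field_overrides(templates: dict) -> list:
    """Return (template, container, field) tuples for fields OSMO manages."""
    problems = []
    for template_name, template in templates.items():
        for container in template.get("spec", {}).get("containers", []):
            for field in MANAGED_FIELDS:
                if field in container:
                    problems.append((template_name, container.get("name"), field))
    return problems


templates = {
    "default_amd64": {"spec": {"nodeSelector": {"kubernetes.io/arch": "amd64"}}},
    "bad_template": {  # hypothetical template that sets managed fields
        "spec": {"containers": [
            {"name": "{{USER_CONTAINER_NAME}}", "image": "busybox", "args": ["-c", "true"]}
        ]}
    },
}

for problem in managed_field_overrides(templates):
    print(problem)
```

An empty result means none of the checked containers set image, command, or args.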