Pod Templates#
Pod templates define how workflow tasks execute as Kubernetes pods. After configuring pools and resource validation, create pod templates to specify scheduling constraints, security policies, and resource allocations that apply across your pools.
Why Use Pod Templates?#
Pod templates provide standardized configurations that simplify cluster management:
- ✓ Target Specific Hardware
Use node selectors and tolerations to route workflows to the right GPU types, CPU architectures, or instance types.
- ✓ Enforce Security Policies
Apply consistent security contexts, capabilities, and access controls across all workflow tasks.
- ✓ Optimize Resource Allocation
Set appropriate resource requests and limits with conditional logic based on workflow requirements.
- ✓ Simplify User Experience
Users select pools without needing to understand complex Kubernetes scheduling—templates handle all the details.
How It Works#
Template Application Flow#
1. Define Templates 📋
Create reusable specs
2. Reference in Pools 🔗
Attach to pools
3. Merge Templates 🔄
Combine specifications
4. Create K8s Pods ✅
Build Kubernetes pods
Template Structure#
Pod templates use the standard Kubernetes PodSpec format with OSMO enhancements:
```yaml
template_name:
  spec:
    nodeSelector:
      node-label: value
    tolerations:
    - key: taint-key
      effect: NoSchedule
    containers:
    - name: '{{USER_CONTAINER_NAME}}'
      resources:
        limits:
          cpu: '{{USER_CPU}}'
          memory: '{{USER_MEMORY}}'
```
Key Features#
- Variable Substitution: `{{USER_CPU}}`, `{{WF_ID}}`, etc. are resolved at runtime
- Template Merging: Combine multiple templates; later ones override earlier ones
- Conditional Logic: Use Jinja2 expressions for dynamic values. For example, to cap the CPU request at 2 when a user asks for more, use `{% if USER_CPU > 2 %}2{% else %}{{USER_CPU}}{% endif %}`
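As a rough illustration of how `{{...}}` placeholders resolve, here is a minimal Python sketch (OSMO's actual renderer uses Jinja2; this simplified substitution is for illustration only):

```python
import re

def substitute(template: str, variables: dict) -> str:
    """Replace {{NAME}} placeholders with values from `variables`."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables[m.group(1)]),
        template,
    )

spec_line = "cpu: '{{USER_CPU}}'"
print(substitute(spec_line, {"USER_CPU": 4}))  # → cpu: '4'
```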
Warning
Merge Behavior
- Fields are overridden by your templates
- Lists are merged by the `name` field (same name = recursive merge, different name = append)
- Templates are applied in order (later overrides earlier)
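The list-merge rule above can be sketched in Python (an illustrative approximation, not OSMO's actual merge code — a shallow merge stands in for the real recursive merge):

```python
def merge_lists(base: list, overlay: list) -> list:
    """Merge lists of dicts by their 'name' field:
    same name = merge entries (shallow here, recursive in practice),
    different name = append."""
    merged = {item["name"]: dict(item) for item in base}
    for item in overlay:
        if item["name"] in merged:
            merged[item["name"]].update(item)  # later template overrides earlier
        else:
            merged[item["name"]] = dict(item)
    return list(merged.values())

base = [{"name": "osmo-ctrl", "cpu": "1"}]
overlay = [{"name": "osmo-ctrl", "cpu": "2"}, {"name": "sidecar"}]
print(merge_lists(base, overlay))
# → [{'name': 'osmo-ctrl', 'cpu': '2'}, {'name': 'sidecar'}]
```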
Note
For detailed configuration fields and all available variables, see /api/configs/pod_template in the API reference.
Base Pod Specification Details
OSMO creates a base pod spec with three containers (osmo-init, osmo-ctrl, user container). Your templates are merged on top of it.
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    osmo.workflow_id: <workflow name>
    osmo.submitted_by: <user name>
spec:
  containers:
  - name: {{USER_CONTAINER_NAME}}   # Your code runs here
    command: ["/osmo/bin/osmo_exec"]
  - name: osmo-ctrl                 # Manages data transfer
  initContainers:
  - name: osmo-init                 # Sets up environment
```
Practical Guide#
📄 Edit in your Helm values file
Everything in this section is added to your Helm values file under `services.configs`.
Apply changes with helm upgrade.
Standard Pod Templates#
Create templates that target specific hardware and handle Kubernetes scheduling constraints.
Step 1: Understanding Template Variables
Special Variables
- Resource Variables:
  - `{{USER_CPU}}` - CPU count
  - `{{USER_GPU}}` - GPU count
  - `{{USER_MEMORY}}` - Memory (e.g., "8Gi")
  - `{{USER_STORAGE}}` - Storage (e.g., "100Gi")
  - `{{USER_CONTAINER_NAME}}` - Name of user container
- Workflow Variables:
  - `{{WF_ID}}` - Workflow name/ID
  - `{{WF_UUID}}` - Unique workflow ID
  - `{{WF_TASK_NAME}}` - Task name
  - `{{WF_SUBMITTED_BY}}` - Username
  - `{{WF_POOL}}` - Pool name
  - `{{WF_PLATFORM}}` - Platform name
- Conditional Logic:
  Use Jinja2: `{% if USER_CPU > 2 %}2{% else %}{{USER_CPU}}{% endif %}`
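The conditional expression above simply caps the rendered value at 2. In plain Python terms (an illustrative equivalent, not OSMO code):

```python
def capped_cpu(user_cpu: int, cap: int = 2) -> str:
    """Equivalent of {% if USER_CPU > 2 %}2{% else %}{{USER_CPU}}{% endif %}."""
    return str(cap) if user_cpu > cap else str(user_cpu)

print(capped_cpu(8))  # → 2
print(capped_cpu(1))  # → 1
```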
Step 2: Define Pod Templates in Helm Values
Add base templates for architecture, control container, and user container under services.configs.podTemplates:
```yaml
services:
  configs:
    enabled: true
    podTemplates:
      # Target specific architecture
      default_amd64:
        spec:
          nodeSelector:
            kubernetes.io/arch: amd64
      # Control container
      default_ctrl:
        spec:
          containers:
          - name: osmo-ctrl
            resources:
              # Use user-specified resources as limits
              limits:
                cpu: '{{USER_CPU}}'
                memory: '{{USER_MEMORY}}'
                ephemeral-storage: '{{USER_STORAGE}}'
              # Cap ctrl container at 2 CPUs if user requests more
              requests:
                cpu: '{% if USER_CPU > 2 %}2{% else %}{{USER_CPU}}{% endif %}'
                memory: 1Gi
                ephemeral-storage: 4Gi
      # User container
      default_user:
        spec:
          containers:
          - name: '{{USER_CONTAINER_NAME}}'
            resources:
              limits:
                cpu: '{{USER_CPU}}'
                memory: '{{USER_MEMORY}}'
                nvidia.com/gpu: '{{USER_GPU}}'
                ephemeral-storage: '{{USER_STORAGE}}'
              requests:
                cpu: '{{USER_CPU}}'
                memory: '{{USER_MEMORY}}'
                nvidia.com/gpu: '{{USER_GPU}}'
                ephemeral-storage: '{{USER_STORAGE}}'
```
Step 3: Reference Templates in Pools
Add templates to your pool’s common_pod_template field:
```yaml
services:
  configs:
    pools:
      my-pool:
        backend: default
        common_pod_template:
        - default_amd64
        - default_ctrl
        - default_user
```
Step 4: Apply
```shell
helm upgrade osmo deployments/charts/service -f my-values.yaml
```
Additional Examples#
GPU-Specific Templates - Target Specific GPU Types
Create templates for different GPU hardware (H100, L40, T4):
```yaml
services:
  configs:
    podTemplates:
      training_h100:
        spec:
          nodeSelector:
            nvidia.com/gpu.product: NVIDIA-H100
          tolerations:
          - key: training-dedicated
            value: h100
            effect: NoSchedule
      simulation_l40:
        spec:
          nodeSelector:
            nvidia.com/gpu.product: NVIDIA-L40
          tolerations:
          - key: simulation-dedicated
            value: l40
            effect: NoSchedule
```
CPU Instance Types - Target Specific Instance Classes
Target CPU-optimized instances:
```yaml
services:
  configs:
    podTemplates:
      cpu_compute:
        spec:
          nodeSelector:
            node.kubernetes.io/instance-type: c5.4xlarge
```
Security Templates - Apply Security Contexts
Enforce security policies:
```yaml
services:
  configs:
    podTemplates:
      secure_workload:
        spec:
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            fsGroup: 1000
          containers:
          - name: '{{USER_CONTAINER_NAME}}'
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              capabilities:
                drop: [ALL]
```
Node Exclusion - Exclude Specific Nodes
Use node affinity to exclude specific nodes from user requests. This can help avoid GPU fragmentation within the cluster by packing user requests onto the same node before the scheduler spreads tasks across other nodes.
```yaml
services:
  configs:
    podTemplates:
      node_exclusion:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: NotIn
                    values: '{{USER_EXCLUDED_NODES}}'
```
Shared Memory - Add /dev/shm Volume
Add shared memory for workflows requiring IPC (for example, TensorRT or PyTorch).
```yaml
services:
  configs:
    podTemplates:
      shared_memory:
        spec:
          containers:
          - name: '{{USER_CONTAINER_NAME}}'
            volumeMounts:
            - name: shm
              mountPath: /dev/shm
          volumes:
          - name: shm
            emptyDir:
              medium: Memory
              sizeLimit: 1Gi
```
Troubleshooting#
- Template Not Found
  - Verify the template name matches exactly in the pool configuration
  - Check the template exists: `osmo config get POD_TEMPLATE <template_name>`
- Variable Substitution Errors
  - Ensure all variables used are valid OSMO variables
  - Check for typos in variable names (they are case-sensitive)
  - Review logs for specific variable resolution errors
- Resource Constraints
  - Verify resource requests match available node capacity
  - Check that nodeSelector labels exist on cluster nodes
  - Ensure tolerations match node taints
- Debugging Tips
  - Start with simple templates and add complexity gradually
  - Validate YAML syntax before applying
  - Test with different workflow configurations
  - Review OSMO service logs for detailed errors
Tip
Best Practices
- Use descriptive template names (e.g., `gpu_h100_training`, `cpu_inference`)
- Create modular templates for reuse across different pools (for example: architecture, security, resources)
- Use conditional logic to optimize resource requests
- Add labels and annotations for monitoring
- Test templates thoroughly before production use
Warning
- Do not override the `image`, `command`, or `args` fields in containers; OSMO manages these internally.
- Template changes apply only to new workflows, NOT to running workflow tasks.