Resource Validation#

After configuring pools, add resource validation rules to prevent workflows from requesting more resources than are available on your nodes. Validation acts as a pre-flight check that rejects invalid requests before they reach the scheduler.

Why Use Resource Validation?#

Resource validation provides guardrails that protect your cluster and improve user experience:

✓ Prevent Scheduling Failures: Reject workflows that request more CPU, memory, or GPU than any node can provide, so their pods don't get stuck in the Pending state.
✓ Catch Configuration Errors: Detect invalid resource specifications (negative values, zero allocations, incorrect units) at submission time.
✓ Provide Clear Feedback: Give users immediate, actionable error messages explaining what’s wrong and how to fix it.
✓ Optimize Resource Utilization: Enforce safety margins and best practices for resource allocation across your cluster.

How It Works#

Validation Flow#

1. Submit Workflow 📝: The user requests CPU, memory, GPU, and storage resources.
2. Validate Rules ⚖️: The request is checked against node capacity using the rules set by the admin.
3. Proceed or Reject ✓✗: A valid request is submitted for scheduling; an invalid one is rejected with an error message.
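The three steps above can be sketched in Python as a plain capacity comparison. This is a minimal illustration, not OSMO's implementation: the functions `validate` and `submit_workflow` are hypothetical, and real validation uses the admin-defined rules described in the next section rather than a hard-coded "requested <= capacity" check.

```python
# Hypothetical sketch of the submit -> validate -> proceed/reject flow.
# Function names are illustrative only; they are not OSMO APIs.


def validate(request: dict, capacity: dict) -> list:
    """Step 2: compare each requested resource with the node's capacity."""
    errors = []
    for resource, amount in request.items():
        limit = capacity.get(resource, 0)  # a resource the node lacks counts as 0
        if amount > limit:
            errors.append(f"{resource} {amount} exceeds node capacity {limit}")
    return errors


def submit_workflow(request: dict, capacity: dict) -> str:
    """Step 1 entry point; step 3 either hands off to scheduling or rejects."""
    errors = validate(request, capacity)
    if errors:
        return "rejected: " + "; ".join(errors)
    return "submitted for scheduling"
```

For example, `submit_workflow({"cpu": 8, "gpu": 2}, {"cpu": 32, "gpu": 1})` returns `"rejected: gpu 2 exceeds node capacity 1"` without the request ever reaching a scheduler.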

Rule Structure#

Each validation rule has four components:

{
  "operator": "LE",
  "left_operand": "{{USER_CPU}}",
  "right_operand": "{{K8_CPU}}",
  "assert_message": "CPU {{USER_CPU}} exceeds node capacity {{K8_CPU}}"
}

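operator: Comparison type (EQ, LT, LE, GT, GE); the rule passes when "left_operand operator right_operand" evaluates to true
left_operand: User-requested value (e.g., {{USER_CPU}})
right_operand: Limit or node capacity (e.g., {{K8_CPU}}), or a literal value such as 0
assert_message: Error shown when the comparison is false, with its variables replaced by their values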
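To make the rule semantics concrete, here is a minimal Python sketch of how one rule could be evaluated: substitute the `{{NAME}}` placeholders, compare the two operands with the named operator, and return the rendered `assert_message` only when the comparison fails. The helper names are hypothetical, and the sketch assumes operands resolve to plain numbers (no unit suffixes); it does not reproduce OSMO's actual evaluator.

```python
import operator
import re
from typing import Optional

# The five comparison types listed above, mapped to Python operator functions.
OPERATORS = {
    "EQ": operator.eq,
    "LT": operator.lt,
    "LE": operator.le,
    "GT": operator.gt,
    "GE": operator.ge,
}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def substitute(template: str, variables: dict) -> str:
    """Replace each {{NAME}} with its value; a missing name raises KeyError."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def evaluate_rule(rule: dict, variables: dict) -> Optional[str]:
    """Return None if `left <operator> right` holds, else the rendered message."""
    left = float(substitute(rule["left_operand"], variables))
    right = float(substitute(rule["right_operand"], variables))
    if OPERATORS[rule["operator"]](left, right):
        return None
    return substitute(rule["assert_message"], variables)
```

Applied to the CPU rule shown above with `{"USER_CPU": 64, "K8_CPU": 32}`, `evaluate_rule` returns `"CPU 64 exceeds node capacity 32"`; with `USER_CPU` set to 8 it returns `None`.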

Note

For detailed configuration fields and all available variables, see /api/configs/resource_validation in the API reference documentation.

Practical Guide#

Standard Validation Rules#

Create validation templates for common resources: CPU, GPU, memory, and storage.

Step 1: Create Validation Configuration

Define validation rules using variables for user requests ({{USER_*}}) and node capacity ({{K8_*}}).
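The naming convention can be illustrated with a small Python sketch that builds a variable map from a user request and a node's capacity, then lists any placeholders in a template that have no value. The USER_/K8_ prefixing of upper-cased resource names is inferred from the examples on this page; the function names are hypothetical, and how OSMO treats a missing value is not covered here.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def build_variables(user_request: dict, node_capacity: dict) -> dict:
    """Prefix user values with USER_ and node values with K8_ (cpu -> USER_CPU, K8_CPU)."""
    variables = {f"USER_{name.upper()}": value for name, value in user_request.items()}
    variables.update({f"K8_{name.upper()}": value for name, value in node_capacity.items()})
    return variables


def unresolved(template: str, variables: dict) -> list:
    """List the {{NAME}} placeholders in `template` that have no value."""
    return [name for name in PLACEHOLDER.findall(template) if name not in variables]
```

For a request of `{"cpu": 4, "memory": 16}` on a node with `{"cpu": 32, "memory": 128}`, the map contains `USER_CPU`, `USER_MEMORY`, `K8_CPU`, and `K8_MEMORY`, while `unresolved("{{USER_GPU}}", ...)` reports `["USER_GPU"]`.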

Step 2: Apply Standard Validation Rules

Create a file with the recommended validation templates, then upload it with osmo config update:

$ cat << EOF > validation_config.json
{
  "default_cpu": [
    {
      "operator": "LE",
      "left_operand": "{{USER_CPU}}",
      "right_operand": "{{K8_CPU}}",
      "assert_message": "CPU {{USER_CPU}} exceeds node capacity {{K8_CPU}}"
    },
    {
      "operator": "GT",
      "left_operand": "{{USER_CPU}}",
      "right_operand": "0",
      "assert_message": "CPU {{USER_CPU}} must be greater than 0"
    }
  ],
  "default_gpu": [
    {
      "operator": "LE",
      "left_operand": "{{USER_GPU}}",
      "right_operand": "{{K8_GPU}}",
      "assert_message": "GPU {{USER_GPU}} exceeds node capacity {{K8_GPU}}"
    },
    {
      "operator": "GE",
      "left_operand": "{{USER_GPU}}",
      "right_operand": "0",
      "assert_message": "GPU {{USER_GPU}} cannot be negative"
    }
  ],
  "default_memory": [
    {
      "operator": "LT",
      "left_operand": "{{USER_MEMORY}}",
      "right_operand": "{{K8_MEMORY}}",
      "assert_message": "Memory {{USER_MEMORY}} exceeds node capacity {{K8_MEMORY}}"
    },
    {
      "operator": "GT",
      "left_operand": "{{USER_MEMORY}}",
      "right_operand": "0",
      "assert_message": "Memory {{USER_MEMORY}} must be greater than 0"
    }
  ],
  "default_storage": [
    {
      "operator": "LT",
      "left_operand": "{{USER_STORAGE}}",
      "right_operand": "{{K8_STORAGE}}",
      "assert_message": "Storage {{USER_STORAGE}} exceeds node capacity {{K8_STORAGE}}"
    },
    {
      "operator": "GT",
      "left_operand": "{{USER_STORAGE}}",
      "right_operand": "0",
      "assert_message": "Storage {{USER_STORAGE}} must be greater than 0"
    }
  ]
}
EOF

$ osmo config update RESOURCE_VALIDATION --file validation_config.json
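Because a typo in an operator or variable name only becomes visible once rules start rejecting workflows, it can help to lint validation_config.json locally before running the update command above. The sketch below is hypothetical and checks only what this page documents: the four required keys, the five operators, and the USER_/K8_ variable prefixes. The API reference may list additional variables, so treat `KNOWN_PREFIXES` as an assumption to adjust.

```python
import json
import re

# Hypothetical local checks, based only on the rule structure on this page.
ALLOWED_OPERATORS = {"EQ", "LT", "LE", "GT", "GE"}
REQUIRED_KEYS = {"operator", "left_operand", "right_operand", "assert_message"}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
KNOWN_PREFIXES = ("USER_", "K8_")  # assumption: check the API reference for the full list


def lint_config(config: dict) -> list:
    """Return human-readable problems found in a RESOURCE_VALIDATION config."""
    problems = []
    for template, rules in config.items():
        if not isinstance(rules, list):
            problems.append(f"{template}: expected a list of rules")
            continue
        for i, rule in enumerate(rules):
            where = f"{template}[{i}]"
            missing = REQUIRED_KEYS - set(rule)
            if missing:
                problems.append(f"{where}: missing {sorted(missing)}")
                continue
            if rule["operator"] not in ALLOWED_OPERATORS:
                problems.append(f"{where}: unknown operator {rule['operator']!r}")
            for key in ("left_operand", "right_operand"):
                for var in PLACEHOLDER.findall(rule[key]):
                    if not var.startswith(KNOWN_PREFIXES):
                        problems.append(f"{where}: unexpected variable {var} in {key}")
    return problems


def lint_file(path: str) -> list:
    """Load a JSON config file from disk and lint it."""
    with open(path) as f:
        return lint_config(json.load(f))
```

Running `lint_file("validation_config.json")` on the file above should return an empty list; a non-empty result points at the template and rule index to fix.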

Step 3: Reference in Pool Configuration

Add the validation template names to your pool’s common_resource_validations field:

{
  "name": "my-pool",
  "backend": "default",
  "common_resource_validations": [
    "default_cpu",
    "default_memory",
    "default_storage",
    "default_gpu"
  ]
}
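One way to picture what the pool reference does is a lookup that expands each template name into its rules. The sketch below assumes the rules of all listed templates are simply combined, and it raises an error for a name that is not defined in the RESOURCE_VALIDATION config; both behaviors are assumptions for illustration, and `resolve_pool_rules` is a hypothetical helper, not an OSMO function.

```python
def resolve_pool_rules(pool: dict, validation_config: dict) -> list:
    """Expand a pool's template names into the flat list of rules they contain."""
    names = pool.get("common_resource_validations", [])
    undefined = [name for name in names if name not in validation_config]
    if undefined:
        raise ValueError(f"pool {pool.get('name')!r} references undefined templates: {undefined}")
    return [rule for name in names for rule in validation_config[name]]
```

Checking a pool definition this way catches a misspelled template name (for example, "default_cpus") before it silently leaves a resource unvalidated.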

Additional Examples#

Troubleshooting#

Validation Always Fails

Check that Kubernetes nodes are properly labeled and available
Verify that the node capacity variables ({{K8_*}}) are populated by inspecting each node's reported resources with kubectl describe nodes

Inconsistent Results

Ensure all nodes report resources consistently
Check that no nodes are in an unschedulable state (for example, cordoned)
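Several of the checks above can be scripted against the output of `kubectl get nodes -o json`, which lists each node's `status.allocatable` resources and sets `spec.unschedulable` when a node is cordoned. This is a sketch under two assumptions: that the {{K8_*}} values come from these node fields, and that GPUs are exposed as `nvidia.com/gpu` by the NVIDIA device plugin. A missing GPU entry is expected on CPU-only nodes.

```python
import json

# Resources to look for; "nvidia.com/gpu" assumes the NVIDIA device plugin.
DEFAULT_RESOURCES = ("cpu", "memory", "nvidia.com/gpu")


def node_report(nodes_json: dict, resources=DEFAULT_RESOURCES) -> dict:
    """Map node name -> problems found in parsed `kubectl get nodes -o json` output."""
    report = {}
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        problems = []
        # `kubectl cordon` sets spec.unschedulable to true.
        if node.get("spec", {}).get("unschedulable"):
            problems.append("unschedulable (cordoned)")
        allocatable = node.get("status", {}).get("allocatable", {})
        for resource in resources:
            if resource not in allocatable:
                problems.append(f"no allocatable {resource}")
        if problems:
            report[name] = problems
    return report


def node_report_from_file(path: str) -> dict:
    """Report on output saved with `kubectl get nodes -o json > nodes.json`."""
    with open(path) as f:
        return node_report(json.load(f))
```

Nodes that are absent from the report advertise every listed resource and accept new pods; an empty report with validation still failing points back at the rules themselves.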

Unit Conversion Errors

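Use consistent units in requests and validation rules (for example, don't mix Gi and GB: 1 Gi is 1,073,741,824 bytes, while 1 GB is 1,000,000,000 bytes)
Review the substituted values in error messages to confirm which units were compared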
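The size of a Gi/GB mix-up is easy to underestimate, so here is a small Python sketch that normalizes quantities to bytes. It handles decimal suffixes (k, M, G, T and their KB/MB/GB/TB spellings) and binary suffixes (Ki, Mi, Gi, Ti); `to_bytes` is a hypothetical helper for checking values by hand, not how OSMO parses quantities.

```python
# Byte multipliers for decimal and binary (power-of-two) suffixes.
MULTIPLIERS = {
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '100Gi' or '105GB' to bytes."""
    # Check two-character suffixes before one-character ones.
    for suffix in sorted(MULTIPLIERS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * MULTIPLIERS[suffix])
    return int(float(quantity))  # no suffix: plain bytes
```

For example, a 105GB request (105,000,000,000 bytes) actually fits on a node with 100Gi (107,374,182,400 bytes), even though a naive comparison of the numbers 105 and 100 would reject it.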

Debugging Tips

Start with simple rules and add complexity gradually
Test validation with different resource values
Examine OSMO service logs for detailed rule evaluation

Tip

Best Practices

Don’t allow 100% resource utilization: leave margins for system overhead and unexpected spikes
Use LT (less than) instead of LE (less than or equal) for memory and storage so that a request can never claim a node's entire capacity
Write clear error messages that include variable values to help users fix issues quickly
Test rules with edge cases (minimum values, maximum values, invalid inputs)
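The last two practices can be combined into a small table-driven test. The sketch below checks the two default_memory rules (USER_MEMORY LT K8_MEMORY, USER_MEMORY GT 0) against invalid, minimum, and maximum requests on a hypothetical node with 64 Gi of memory, and shows that LT rejects a request of exactly full capacity while LE would accept it. The numbers and helper names are illustrative assumptions.

```python
import operator

# Comparison types from the rule structure, mapped to Python operators.
OPERATORS = {
    "EQ": operator.eq, "LT": operator.lt, "LE": operator.le,
    "GT": operator.gt, "GE": operator.ge,
}


def passes(op_name: str, left: float, right: float) -> bool:
    """True when `left <op_name> right` holds, i.e. the rule would accept."""
    return OPERATORS[op_name](left, right)


# Hypothetical node with 64 Gi of memory.
CAPACITY_GI = 64


def memory_rules_pass(requested_gi: float) -> bool:
    """Apply both default_memory rules: LT capacity and GT 0."""
    return passes("LT", requested_gi, CAPACITY_GI) and passes("GT", requested_gi, 0)


# (requested Gi, expected outcome) pairs covering invalid, minimum, and maximum values.
EDGE_CASES = [
    (-1, False),               # invalid: negative request
    (0, False),                # minimum: zero allocation
    (1, True),                 # smallest whole-Gi request
    (CAPACITY_GI - 1, True),   # just under capacity
    (CAPACITY_GI, False),      # exactly full capacity: LT rejects, LE would accept
    (CAPACITY_GI + 1, False),  # over capacity
]

failures = [(req, want) for req, want in EDGE_CASES if memory_rules_pass(req) != want]
```

An empty `failures` list means the rules behave as intended at every boundary; any entry names the request value whose outcome differs from the expectation.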