Scheduler Configuration

After configuring pools, you can enable advanced scheduling features using the KAI scheduler. This configuration controls how workflows compete for resources, enabling co-scheduling, preemption, and fair sharing across teams.

Why Use KAI Scheduler?

The KAI scheduler provides enterprise-grade resource management capabilities:

✓ Co-Scheduling: Schedule multiple tasks together for distributed training, hardware-in-the-loop simulations, and parallel synthetic data generation.
✓ Priority & Preemption: High-priority workflows can preempt low-priority ones, ensuring critical work proceeds even when clusters are fully utilized.
✓ Fair Resource Sharing: Guarantee minimum resources per pool while allowing teams to burst above their baseline when capacity is available.
✓ Maximize Utilization: Reclaim idle resources and redistribute them across pools based on configurable weights, minimizing waste.

How It Works

GPU Allocation Model

Setting	Meaning	Effect
Guarantee 🔒	Minimum resources	Reserved, cannot be preempted
Weight ⚖️	Fair share ratio	Proportional allocation above guarantee
Maximum 🚧	Upper limit	Cap total pool usage (-1 = unlimited)

Key Concepts

Guarantee: Minimum GPUs/resources reserved for a pool (non-preemptible workflows)
Weight: Proportional share when pools exceed their guarantee (e.g., 1:3 ratio)
Maximum: Hard cap on total resources a pool can use (-1 means unlimited)
Preemptible Workflows: Use LOW priority; can be stopped to free resources
Non-Preemptible Workflows: Use HIGH/NORMAL priority; protected from preemption
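
The concepts above can be modeled in a few lines of Python. This is only an illustrative sketch: `PoolQuota`, `burst_headroom`, and `is_preemptible` are hypothetical names invented for this example and are not part of the scheduler's API. The values that come from this page are the -1 sentinel for an unlimited maximum and the mapping of LOW priority to preemptible workflows.

```python
from dataclasses import dataclass
from typing import Optional

UNLIMITED = -1  # sentinel used on this page for "no maximum"


@dataclass(frozen=True)
class PoolQuota:
    """Hypothetical container for the three per-pool settings."""
    guarantee: int              # GPUs reserved for the pool
    weight: int                 # share of capacity above the guarantee
    maximum: int = UNLIMITED    # hard cap, or -1 for unlimited

    def burst_headroom(self) -> Optional[int]:
        """GPUs the pool may use above its guarantee; None means no cap."""
        if self.maximum == UNLIMITED:
            return None
        return self.maximum - self.guarantee


def is_preemptible(priority: str) -> bool:
    """LOW is preemptible; HIGH and NORMAL are protected."""
    level = priority.upper()
    if level not in ("HIGH", "NORMAL", "LOW"):
        raise ValueError(f"unknown priority: {priority!r}")
    return level == "LOW"
```

For example, a pool with a guarantee of 30 and a maximum of 70 has 40 GPUs of burst headroom, while a pool whose maximum is -1 has no cap at all.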

Note

For detailed configuration fields, see Resource Constraint in the API reference.

Warning

To enable preemption, ALL pools sharing the same nodes must configure guarantee, weight, and maximum. Partial configuration disables preemption.
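
A quick way to catch partial configuration is to scan every pool that shares the same nodes and report which of the three settings is missing. The sketch below assumes pool settings are available as plain dictionaries. That layout and the function names are assumptions made for illustration, not the scheduler's actual configuration format.

```python
REQUIRED_FIELDS = ("guarantee", "weight", "maximum")


def missing_quota_fields(pools):
    """Return {pool_name: [missing fields]} for incompletely configured pools.

    `pools` is a hypothetical mapping of pool name to a settings dict and
    should contain only pools that share the same nodes.
    """
    problems = {}
    for name, settings in pools.items():
        missing = [field for field in REQUIRED_FIELDS
                   if settings.get(field) is None]
        if missing:
            problems[name] = missing
    return problems


def preemption_possible(pools):
    """True only when every pool sets guarantee, weight, and maximum."""
    return not missing_quota_fields(pools)


pools = {
    "training": {"guarantee": 30, "weight": 1, "maximum": 70},
    "simulation": {"guarantee": 50, "weight": 3},  # maximum not set
}
print(missing_quota_fields(pools))  # {'simulation': ['maximum']}
```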

Practical Guide

GPU Allocation

Example Cluster: Assume a cluster with 100 GPUs total divided into two pools: Training (A) and Simulation (B).

Pool	Guarantee	Weight	Maximum
Training (A)	30 GPUs	1	70 GPUs
Simulation (B)	50 GPUs	3	Unlimited (-1)

Basic Allocation Behavior:

Pool A gets 30 GPUs guaranteed (non-preemptible workflows)
Pool B gets 50 GPUs guaranteed (non-preemptible workflows)
Pool A can burst up to 70 GPUs total (including preemptible)
Pool B can use unlimited GPUs (including preemptible)

Warning

When both pools exceed their guarantees, the GPUs above those guarantees are split by weight (1:3), so Pool B receives three times as many of the extra GPUs as Pool A.

Weight Ratio Example:

When 20 GPUs become available and both pools want more:

Pool A gets 5 GPUs (1 part)
Pool B gets 15 GPUs (3 parts)
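
The arithmetic behind this split can be sketched as a weighted division that also respects each pool's maximum. This is a simplified model for building intuition, not the scheduler's actual algorithm: `split_surplus` is a hypothetical helper, fractional GPUs are allowed, weights are assumed to be positive, and a pool's headroom is taken to be its maximum minus what it already uses (40 for Pool A if it is sitting at its 30-GPU guarantee).

```python
def split_surplus(surplus, pools):
    """Split `surplus` GPUs among pools by weight, honoring per-pool headroom.

    `pools` maps a pool name to (weight, headroom). Headroom is the number of
    GPUs the pool may still take before reaching its maximum, or None when the
    maximum is -1 (unlimited).
    """
    shares = {name: 0.0 for name in pools}
    active = dict(pools)
    remaining = float(surplus)
    while active and remaining > 0:
        total_weight = sum(weight for weight, _ in active.values())
        # Pools whose weighted share would reach their cap are filled to it.
        capped = [
            name
            for name, (weight, headroom) in active.items()
            if headroom is not None
            and remaining * weight / total_weight >= headroom
        ]
        if not capped:
            # Nobody hits a cap: divide what is left strictly by weight.
            for name, (weight, _) in active.items():
                shares[name] = remaining * weight / total_weight
            break
        # Capped pools take their headroom; the rest is re-split next round.
        for name in capped:
            headroom = active.pop(name)[1]
            shares[name] = float(headroom)
            remaining -= headroom
    return shares


print(split_surplus(20, {"A": (1, 40), "B": (3, None)}))
# {'A': 5.0, 'B': 15.0}
```

If Pool A had only 2 GPUs of headroom left, the same call would give it 2 GPUs and pass the remaining 18 to Pool B.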

Preemption Scenarios

Troubleshooting

Preemption Not Working

Verify ALL pools have guarantee, weight, and maximum configured
Check pools share the same compute nodes
Ensure workflows use correct priority levels (HIGH/NORMAL/LOW)

Unfair Resource Distribution

Review weight ratios across pools
Verify guarantee values don’t exceed cluster capacity
Check if pools are hitting their maximum limits

Workflows Stuck in Pending

Confirm total guarantees don’t exceed cluster capacity
Check if pool has reached its maximum limit
Verify preemptible workflows are marked with LOW priority
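
The first two checks are simple arithmetic and can be scripted. The following sketch assumes you can obtain each pool's guarantee, maximum, and current GPU usage from your own monitoring. The dictionary layout, the `in_use` field, and the function name `pending_causes` are hypothetical.

```python
UNLIMITED = -1  # sentinel used on this page for "no maximum"


def pending_causes(cluster_gpus, pools, pool_name):
    """List likely quota-related reasons a workflow in `pool_name` is pending."""
    causes = []

    # Check 1: the guarantees of all pools together must fit in the cluster.
    total_guaranteed = sum(p["guarantee"] for p in pools.values())
    if total_guaranteed > cluster_gpus:
        causes.append(
            f"guarantees total {total_guaranteed} GPUs "
            f"but the cluster has {cluster_gpus}"
        )

    # Check 2: the pool may already be using its full maximum.
    pool = pools[pool_name]
    if pool["maximum"] != UNLIMITED and pool["in_use"] >= pool["maximum"]:
        causes.append(
            f"pool {pool_name!r} is at its maximum of {pool['maximum']} GPUs"
        )
    return causes
```

An empty list means neither check explains the pending state, so the next step is to review workflow priority levels.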

Tip

Best Practices

Set guarantees to cover baseline workload for each team
Use weights to reflect each team's relative share of spare capacity (higher weight = larger share of GPUs above the guarantee)
Set reasonable maximums to prevent one team from monopolizing resources
Mark exploratory/dev work as LOW priority (preemptible)
Reserve HIGH/NORMAL priority for production workloads
Monitor pool utilization and adjust settings quarterly