Cloud Provider

Create a managed Kubernetes cluster in the cloud to be used as a backend for job execution. This guide provides links to setup instructions for various cloud providers.

Prerequisites

You will need access to a cloud provider account (AWS, Azure, or GCP) with permissions to create and manage Kubernetes clusters.

Important

  • Kubernetes Version: v1.30.0 or later

  • Networking: Cluster nodes must have outbound internet access to your container registries and network access to the OSMO service
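As a quick sanity check on the version requirement above, the following Python sketch compares a cluster's reported version string against v1.30.0. The helper names (`parse_k8s_version`, `meets_minimum`) are illustrative and not part of OSMO. The sketch assumes version strings of the form `v<major>.<minor>.<patch>`, optionally followed by a provider-specific suffix.

```python
import re

# Minimum Kubernetes version stated in this guide.
MIN_VERSION = (1, 30, 0)


def parse_k8s_version(git_version: str) -> tuple:
    """Parse 'v1.30.2' or 'v1.31.0-eks-a1b2c3' into (major, minor, patch).

    Any provider-specific suffix after the patch number is ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def meets_minimum(git_version: str, minimum: tuple = MIN_VERSION) -> bool:
    """Return True if the given version is at least the minimum."""
    # Tuples compare element by element, so (1, 31, 0) >= (1, 30, 0).
    return parse_k8s_version(git_version) >= minimum


if __name__ == "__main__":
    for version in ("v1.29.9", "v1.30.0", "v1.31.2-eks-7f9249a"):
        status = "ok" if meets_minimum(version) else "too old"
        print(f"{version}: {status}")
```

One way to obtain the version string is the `serverVersion.gitVersion` field in the output of `kubectl version -o json`.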

Setup Guide

Cloud Provider / Documentation / Recommended Instance Types

Amazon Web Services (EKS)

  • Backend-Operator: m5.xlarge (4 vCPUs, 16 GiB RAM, auto-scaling: 1-3 nodes)

  • Compute (CPU): m5.2xlarge

  • Compute (GPU): p3.2xlarge (V100), p4d.24xlarge (A100), p5.48xlarge (H100); see the AWS documentation on GPU instances

Microsoft Azure (AKS)

  • Backend-Operator: Standard_D4s_v3 (4 vCPUs, 16 GiB RAM, auto-scaling: 1-3 nodes)

  • Compute (CPU): Standard_D8s_v3

  • Compute (GPU): Standard_NC6s_v3 (V100), Standard_ND96asr_v4 (A100), Standard_ND96isr_H100_v5 (H100); see the Azure documentation on GPU VMs

Google Cloud Platform (GKE)

  • Backend-Operator: n1-standard-4 (4 vCPUs, 15 GB RAM, auto-scaling: 1-3 nodes)

  • Compute (CPU): n1-standard-8

  • Compute (GPU): n1-standard-4 with an attached T4, a2-highgpu-1g (A100), a3-highgpu-8g (H100); see the Google Cloud documentation on GPUs on Compute Engine

Note

Configure auto-scaling for compute nodes based on your expected workload patterns.
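One rough way to derive auto-scaling bounds for a GPU node pool from an expected workload is sketched below. The function `node_pool_bounds` and its parameters are hypothetical and not part of OSMO or any cloud SDK. The sketch assumes each job runs on a single node and that jobs pack evenly onto nodes, so real scheduling constraints may call for extra headroom.

```python
import math


def node_pool_bounds(peak_jobs: int, gpus_per_job: int, gpus_per_node: int,
                     baseline_jobs: int = 0) -> tuple:
    """Estimate (min_nodes, max_nodes) for a GPU node pool.

    peak_jobs:     highest number of jobs expected to run at once
    baseline_jobs: number of jobs that are running almost all the time
    gpus_per_job:  GPUs requested by each job (assumed uniform)
    gpus_per_node: GPUs on the chosen instance type, e.g. 8 on an 8-GPU node
    """
    if gpus_per_job <= 0 or gpus_per_node <= 0:
        raise ValueError("GPU counts must be positive")
    if gpus_per_job > gpus_per_node:
        raise ValueError("a job does not fit on a single node of this type")
    if baseline_jobs > peak_jobs:
        raise ValueError("baseline_jobs cannot exceed peak_jobs")

    # How many jobs fit side by side on one node.
    jobs_per_node = gpus_per_node // gpus_per_job

    min_nodes = math.ceil(baseline_jobs / jobs_per_node)
    max_nodes = math.ceil(peak_jobs / jobs_per_node)
    return min_nodes, max_nodes


if __name__ == "__main__":
    # 20 concurrent 2-GPU jobs at peak, 3 always running, 8 GPUs per node.
    print(node_pool_bounds(peak_jobs=20, gpus_per_job=2, gpus_per_node=8,
                           baseline_jobs=3))
```

A minimum of zero nodes lets the pool scale down completely when idle, at the cost of node start-up latency for the first job. A non-zero minimum trades idle cost for faster job starts.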