Run Local Inference with Ollama#

This tutorial covers two ways to use Ollama with OpenShell:

  1. Option A: Ollama sandbox (recommended) — a self-contained sandbox with Ollama, Claude Code, and Codex pre-installed. One command to start.

  2. Option B: Host-level Ollama — run Ollama on the gateway host and route sandbox inference to it. Useful when you want a single Ollama instance shared across multiple sandboxes.

After completing this tutorial, you will know how to:

  • Launch the Ollama community sandbox for a batteries-included experience.

  • Use ollama launch to start coding agents inside a sandbox.

  • Expose a host-level Ollama server to sandboxes through inference.local.

Prerequisites#

  • A working OpenShell installation. Complete the Quickstart before proceeding.

Option B: Host-Level Ollama#

Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through inference.local.

Note

This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name.

Step 1: Install and Start Ollama#

Install Ollama on the gateway host:

$ curl -fsSL https://ollama.com/install.sh | sh

Start Ollama on all interfaces so it is reachable from sandboxes:

$ OLLAMA_HOST=0.0.0.0:11434 ollama serve

Tip

If you see Error: listen tcp 0.0.0.0:11434: bind: address already in use, Ollama is already running as a system service. Stop it first:

$ systemctl stop ollama
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
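
If you would rather keep the system service enabled, you can bind it to all interfaces persistently with a systemd drop-in instead. This is a sketch based on Ollama's documented OLLAMA_HOST environment variable; verify the service name on your distribution:

```
# Opens an editor for an override file such as
# /etc/systemd/system/ollama.service.d/override.conf
$ sudo systemctl edit ollama

# Add this drop-in content:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Then apply it:
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama
```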

Step 2: Pull a Model#

In a second terminal, download and load a model (ollama run pulls it automatically if it is not already present):

$ ollama run qwen3.5:0.8b

Type /bye to exit the interactive session. The model stays loaded.

Step 3: Create a Provider#

Create an OpenAI-compatible provider pointing at the host Ollama:

$ openshell provider create \
    --name ollama \
    --type openai \
    --credential OPENAI_API_KEY=empty \
    --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1

OpenShell injects host.openshell.internal so sandboxes and the gateway can reach the host machine. You can also use the host’s LAN IP.
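
Before creating the provider, you can sanity-check the endpoint from the gateway host with curl -s http://localhost:11434/v1/models, which is part of Ollama's OpenAI-compatible API. The snippet below is a sketch that extracts model IDs from an abbreviated sample payload (the sample is an assumption for illustration, not live output):

```shell
# Abbreviated OpenAI-style model list, as returned by GET /v1/models
sample='{"object":"list","data":[{"id":"qwen3.5:0.8b","object":"model"}]}'

# Print the available model IDs, one per line
echo "$sample" | python3 -c 'import sys, json
for m in json.load(sys.stdin)["data"]:
    print(m["id"])'
```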

Step 4: Set Inference Routing#

$ openshell inference set --provider ollama --model qwen3.5:0.8b

Confirm:

$ openshell inference get

Step 5: Verify from a Sandbox#

$ openshell sandbox create -- \
    curl https://inference.local/v1/chat/completions \
    --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'

The response should be JSON from the model.
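
A successful call returns an OpenAI-style chat completion. To extract just the assistant text, you can pipe the body through python3; a minimal sketch against an abbreviated sample response (not real model output):

```shell
# Abbreviated chat-completions response body, for illustration only
resp='{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}'

# Print the assistant message content
echo "$resp" | python3 -c 'import sys, json
print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```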

Troubleshooting#

Common issues and fixes:

  • Ollama not reachable from sandbox — Ollama must be bound to 0.0.0.0, not 127.0.0.1. This applies to host-level Ollama only; the community sandbox handles this automatically.

  • OPENAI_BASE_URL wrong — Use http://host.openshell.internal:11434/v1, not localhost or 127.0.0.1.

  • Model not found — Run ollama ps to confirm the model is loaded. Run ollama pull <model> if needed.

  • HTTPS vs HTTP — Code inside sandboxes must call https://inference.local, not http://.

  • AMD GPU driver issues — Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures.
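
The OPENAI_BASE_URL pitfall above is easy to catch in provisioning scripts with a small guard. This is a sketch; check_base_url is a hypothetical helper for illustration, not part of OpenShell:

```shell
# Reject base URLs that resolve inside the sandbox rather than to the host
check_base_url() {
  case "$1" in
    *localhost*|*127.0.0.1*)
      echo "bad: $1 will not reach the host from a sandbox" >&2
      return 1 ;;
    *)
      echo "ok: $1"
      return 0 ;;
  esac
}

check_base_url "http://host.openshell.internal:11434/v1"
```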

Useful commands:

$ openshell status
$ openshell inference get
$ openshell provider get ollama

Next Steps#