Inference Profiles#

NemoClaw configures inference through the OpenShell gateway. The agent inside the sandbox talks to inference.local, and OpenShell routes that traffic to the provider you selected during onboarding.

Routed Provider Model#

NemoClaw keeps provider credentials on the host. The sandbox does not receive your raw OpenAI, Anthropic, Gemini, or NVIDIA API key.

At onboard time, NemoClaw configures:

  • an OpenShell provider

  • an OpenShell inference route

  • the baked OpenClaw model reference inside the sandbox

That means the sandbox knows which model family to use, while OpenShell owns the actual provider credential and upstream endpoint.

Supported Providers#

The following non-experimental provider paths are available through nemoclaw onboard.

Provider

Endpoint Type

Notes

NVIDIA Endpoints

OpenAI-compatible

Hosted models on integrate.api.nvidia.com

OpenAI

Native OpenAI-compatible

Uses OpenAI model IDs

Other OpenAI-compatible endpoint

Custom OpenAI-compatible

For compatible proxies and gateways

Anthropic

Native Anthropic

Uses anthropic-messages

Other Anthropic-compatible endpoint

Custom Anthropic-compatible

For Claude proxies and compatible gateways

Google Gemini

OpenAI-compatible

Uses Google’s OpenAI-compatible endpoint

Validation During Onboarding#

NemoClaw validates the selected provider and model before it creates the sandbox.

  • OpenAI-compatible providers: NemoClaw tries /responses first, then /chat/completions.

  • Anthropic-compatible providers: NemoClaw tries /v1/messages.

  • NVIDIA Endpoints manual model entry: NemoClaw also validates the model name against https://integrate.api.nvidia.com/v1/models.

  • Compatible endpoint flows: NemoClaw validates by sending a real inference request, because many proxies do not expose a reliable /models endpoint.

If validation fails, the wizard does not continue to sandbox creation.

Local Ollama#

Local Ollama is available in the standard onboarding flow when Ollama is installed or running on the host. It uses the same routed inference.local pattern, but the upstream runtime runs locally instead of in the cloud.

Ollama gets additional onboarding help:

  • if no models are installed, NemoClaw offers starter models

  • it pulls the selected model

  • it warms the model

  • it validates the model before continuing

Experimental Local Providers#

The following local providers require NEMOCLAW_EXPERIMENTAL=1:

  • Local NVIDIA NIM (requires a NIM-capable GPU)

  • Local vLLM (must already be running on localhost:8000)

Runtime Switching#

For runtime switching guidance, refer to Switch Inference Models.