Inference Profiles#

NemoClaw configures inference through the OpenShell gateway. The agent inside the sandbox talks to inference.local, and OpenShell routes that traffic to the provider you selected during onboarding.

Routed Provider Model#

NemoClaw keeps provider credentials on the host. The sandbox does not receive your raw OpenAI, Anthropic, Gemini, or NVIDIA API key.

At onboard time, NemoClaw configures:

  • an OpenShell provider

  • an OpenShell inference route

  • the baked OpenClaw model reference inside the sandbox

That means the sandbox knows which model family to use, while OpenShell owns the actual provider credential and upstream endpoint.

Supported Providers#

The following non-experimental provider paths are available through nemoclaw onboard.

Provider

Endpoint Type

Notes

NVIDIA Endpoints

OpenAI-compatible

Hosted models on integrate.api.nvidia.com

OpenAI

Native OpenAI-compatible

Uses OpenAI model IDs

Other OpenAI-compatible endpoint

Custom OpenAI-compatible

For compatible proxies and gateways

Anthropic

Native Anthropic

Uses anthropic-messages

Other Anthropic-compatible endpoint

Custom Anthropic-compatible

For Claude proxies and compatible gateways

Google Gemini

OpenAI-compatible

Uses Google’s OpenAI-compatible endpoint

Local Ollama

OpenAI-compatible

Local Ollama runtime routed through inference.local

Validation During Onboarding#

NemoClaw validates the selected provider and model before it creates the sandbox.

  • OpenAI-compatible providers: NemoClaw tries /responses first, then /chat/completions.

  • Anthropic-compatible providers: NemoClaw tries /v1/messages.

  • NVIDIA Endpoints manual model entry: NemoClaw also validates the model name against https://integrate.api.nvidia.com/v1/models.

  • Compatible endpoint flows: NemoClaw validates by sending a real inference request, because many proxies do not expose a reliable /models endpoint.

If validation fails, the wizard does not continue to sandbox creation.

Local Providers#

Some local providers use the same routed inference.local pattern, but the upstream runtime is local to the host.

  • Local Ollama

  • Local NVIDIA NIM

  • Local vLLM

Only Local NVIDIA NIM and Local vLLM are behind the NEMOCLAW_EXPERIMENTAL=1 gate.

Ollama gets additional onboarding help:

  • if no models are installed, NemoClaw offers starter models

  • it pulls the selected model

  • it warms the model

  • it validates the model before continuing

Runtime Switching#

For runtime switching guidance, refer to Switch Inference Models.