Basic Guide: Wrap LLM Calls#
Use this guide when a framework, SDK, or provider adapter owns model invocation and you need NeMo Flow to observe and control those provider calls.
What You Build#
You will place a managed NeMo Flow LLM execution wrapper at the provider boundary. The wrapper emits LLM lifecycle events, runs LLM middleware, attaches the call to the active scope, records the model_name, and returns the provider response to the framework.
Before You Start#
You need:
A framework request or run scope. If the framework does not create one yet, start with Basic Guide: Adding Scopes.
A stable model-provider boundary, such as a provider adapter or client dispatch method.
A JSON-compatible request projection inside LLMRequest.
A JSON-compatible response projection for subscribers and exporters.
Integration Pattern#
Follow this sequence to keep framework work attached to the expected runtime context.
Enter or inherit the active framework scope.
Convert the framework provider payload into LLMRequest.
Route the real provider callback through the managed LLM execute helper.
Pass a stable provider name and model_name.
Keep provider clients, streams, callbacks, and retry state outside emitted JSON payloads.
Use a request or response codec when provider payloads need normalization before middleware or events see them. See Advanced Guide: Provider Codecs for those cases.
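When no codec is needed, a plain projection function before constructing LLMRequest is often enough. The sketch below is illustrative only and assumes a dict-shaped, chat-style provider payload; the field names are assumptions, not a provider contract.

from typing import Any

def project_payload(payload: dict[str, Any]) -> dict[str, Any]:
    # Illustrative projection: keep only JSON-compatible fields and drop
    # provider clients, streams, callbacks, and retry state so they never
    # enter emitted JSON payloads. Field names are assumptions about one
    # provider's payload shape; adapt them to the provider you wrap.
    return {
        "messages": payload.get("messages", []),
        "temperature": payload.get("temperature"),
    }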
Concrete LLM Example#
The examples below wrap one provider call and attach it to the active parent scope, in Python, TypeScript, and Rust.
Python:

from typing import TypedDict

import nemo_flow
from nemo_flow import LLMRequest


class LlmResponse(TypedDict):
    text: str
    request: object


async def framework_llm(provider_name: str, payload: object) -> LlmResponse:
    # Attach the call to the active framework scope.
    parent = nemo_flow.scope.get_handle()
    # Project the provider payload into a JSON-compatible request.
    request = LLMRequest({}, payload)

    async def invoke(req: LLMRequest) -> LlmResponse:
        # Stand-in for the real provider callback.
        return {"text": "hi", "request": req.content}

    return await nemo_flow.llm.execute(
        provider_name,
        request,
        invoke,
        handle=parent,
        model_name="demo-model",
    )
TypeScript:

import { getHandle, LlmRequest, llmCallExecute, type ScopeHandle } from 'nemo-flow-node';

type LlmResponse = { text: string; request: unknown };

export async function frameworkLlm(providerName: string, payload: unknown): Promise<LlmResponse> {
  // Attach the call to the active framework scope.
  const parent: ScopeHandle = getHandle();
  // Project the provider payload into a JSON-compatible request.
  const request = new LlmRequest({}, payload);
  const response = await llmCallExecute(
    providerName,
    request,
    // Stand-in for the real provider callback.
    async (req: LlmRequest): Promise<LlmResponse> => ({ text: 'hi', request: req.content }),
    parent,
    // Optional arguments left unset in this example.
    null,
    null,
    null,
    'demo-model',
  );
  return response as LlmResponse;
}
Rust:

use nemo_flow::api::llm::{llm_call_execute, LlmCallExecuteParams, LlmRequest};
use nemo_flow::api::scope::get_handle;
use serde_json::json;
use std::sync::Arc;

async fn run_provider_call() -> anyhow::Result<serde_json::Value> {
    // Attach the call to the active framework scope.
    let parent = get_handle()?;
    // JSON-compatible request content only; no provider clients or streams.
    let request = LlmRequest {
        headers: Default::default(),
        content: json!({"messages": [{"role": "user", "content": "hello"}]}),
    };
    let response = llm_call_execute(
        LlmCallExecuteParams::builder()
            .name("demo-provider")
            .request(request)
            // Stand-in for the real provider callback.
            .func(Arc::new(|req| Box::pin(async move {
                Ok(json!({"text": "hi", "request": req.content}))
            })))
            .parent(parent)
            .model_name("demo-model")
            .build(),
    )
    .await?;
    Ok(response)
}
Streaming Providers#
Use the LLM stream execute helper when the framework exposes a stream boundary that NeMo Flow can own. Stream wrappers preserve the same scope and middleware model while letting subscribers observe the completed response after chunks are collected.
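As a sketch only: the helper name and signature below are placeholders (execute_stream is assumed here, not confirmed by this guide); consult the NeMo Flow API reference for the real stream execute helper.

import nemo_flow
from nemo_flow import LLMRequest

async def framework_llm_stream(provider_name: str, payload: object):
    parent = nemo_flow.scope.get_handle()
    request = LLMRequest({}, payload)

    async def invoke_stream(req: LLMRequest):
        # Stand-in for the real provider stream.
        yield {"text": "hi"}

    # execute_stream is a placeholder name for the LLM stream execute
    # helper; the wrapper collects the chunks so subscribers can observe
    # the completed response.
    async for chunk in nemo_flow.llm.execute_stream(
        provider_name,
        request,
        invoke_stream,
        handle=parent,
        model_name="demo-model",
    ):
        yield chunk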
If the framework owns the stream internally, emit explicit start and end lifecycle events around the provider stream and use mark events for retry, queue, and partial-output milestones.
Validate the LLM Wrapper#
Run one provider path and check the following (a minimal smoke check follows this list):
The application receives the same provider response as before.
Subscribers see one LLM start event and one matching LLM end event.
The event includes the expected provider name and model_name.
LLM middleware runs exactly once.
Provider-owned clients, streams, and callbacks stay outside emitted JSON payloads.
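A minimal smoke check, reusing framework_llm from the Python example above; subscriber and middleware assertions depend on your subscriber setup and are left out here.

import asyncio

async def smoke_check() -> None:
    response = await framework_llm("demo-provider", {"messages": []})
    # The application-facing response must be unchanged by the wrapper;
    # "hi" matches the stub callback in the example above.
    assert response["text"] == "hi"

asyncio.run(smoke_check())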
Common Issues#
Check these symptoms first when the workflow does not behave as expected.
The LLM appears outside the request trace: Pass the active scope handle or run the provider call inside the framework request scope.
The model name is missing: Pass model_name from the provider payload, model client, or framework run configuration.
Request middleware receives provider objects: Convert provider payloads into LLMRequest with JSON-compatible content before calling NeMo Flow.
Stream output is incomplete: Use the stream execute helper when NeMo Flow owns the stream boundary, or emit explicit lifecycle events when it does not.
Next Steps#
Use these links to continue from this workflow into the next related task.
Add tool integration with Basic Guide: Wrap Tool Calls.
Normalize provider payloads with Advanced Guide: Provider Codecs.
Use Advanced Guide: Handle Non-Serializable Data for provider clients, streams, and callback objects.