Middleware#
This page explains middleware: the runtime behavior that runs around managed tool and LLM calls.
What Middleware Is#
Middleware is the runtime behavior that runs around tool and LLM execution. NeMo Flow uses middleware to control, transform, or observe work at specific lifecycle points.
Middleware is organized by what each hook means in the call lifecycle, rather than as one undifferentiated hook system.
Registration Levels#
Middleware and subscribers can be registered at different levels depending on their lifetime and visibility.
Global Registrations#
Global registrations stay active for the whole process until they are removed. Use them for defaults that should apply broadly.
Scope-Local Registrations#
Scope-local registrations are owned by one active scope and disappear automatically when that scope closes.
Use them when behavior should stay local to one request, workflow, or nested unit of work.
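The difference between the global and scope-local lifetimes described above can be sketched with a small registry. This is a minimal illustration, not the NeMo Flow API: `Registry`, `register_global`, `register_local`, and `scope` are hypothetical names chosen for this example.

```python
from contextlib import contextmanager


# Hypothetical registry; none of these names come from the NeMo Flow API.
class Registry:
    def __init__(self):
        self.global_items = []  # live until explicitly removed
        self.scope_stack = []   # one list of registrations per open scope

    def register_global(self, name):
        self.global_items.append(name)

    def register_local(self, name):
        # Owned by the innermost active scope (assumes a scope is open).
        self.scope_stack[-1].append(name)

    @contextmanager
    def scope(self):
        self.scope_stack.append([])
        try:
            yield self
        finally:
            # Closing the scope drops everything it owned.
            self.scope_stack.pop()

    def active(self):
        local = [name for frame in self.scope_stack for name in frame]
        return self.global_items + local


registry = Registry()
registry.register_global("default-timing")

with registry.scope():
    registry.register_local("per-request-policy")
    inside = registry.active()

outside = registry.active()
```

Inside the `with` block both registrations are active. After it closes, only the global one remains, without any explicit cleanup call.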
Plugin-Installed Registrations#
Plugins can install middleware during initialization. This is the reusable, configuration-driven path for shipping middleware bundles without hand-registering everything in application code.
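As a rough sketch of the plugin path, the example below bundles two registrations behind one configuration object. The `Runtime` class, the `initialize` hook, the registration kind strings, and the `redact_fields` config key are all assumptions made for illustration and are not taken from NeMo Flow.

```python
# Hypothetical runtime and plugin interfaces; not the NeMo Flow API.
class Runtime:
    def __init__(self):
        self.registrations = []  # (kind, callable) pairs

    def register(self, kind, fn):
        self.registrations.append((kind, fn))


class RedactionPlugin:
    """Bundles two sanitize guardrails behind one configuration object."""

    def __init__(self, config):
        self.fields = set(config.get("redact_fields", []))

    def initialize(self, runtime):
        # Called once at startup; installs the whole bundle.
        runtime.register("sanitize_request", self.redact)
        runtime.register("sanitize_response", self.redact)

    def redact(self, payload):
        return {k: ("***" if k in self.fields else v) for k, v in payload.items()}


runtime = Runtime()
RedactionPlugin({"redact_fields": ["api_key"]}).initialize(runtime)
kinds = [kind for kind, _ in runtime.registrations]
sample = runtime.registrations[0][1]({"api_key": "sk-123", "prompt": "hi"})
```

Application code only constructs the plugin from configuration. The plugin decides which middleware to install.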
Middleware Families#
NeMo Flow has two major middleware families:
Intercepts change the real execution path
Guardrails block work or rewrite emitted observability payloads
Intercepts#
Intercepts are middleware that change the real request or execution path.
Request Intercepts#
Request intercepts rewrite the real request before execution continues.
Use them when the next stage of execution should receive changed input, such as:
Header injection
Request normalization
Argument enrichment
Provider-specific request rewriting
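A request intercept can be pictured as a function from one request to the next. The sketch below assumes a dictionary-shaped request and a simple "apply in order" loop. Both are illustrative assumptions, and the function names are hypothetical.

```python
# Hypothetical request shape and intercept signature, for illustration only.
def inject_header(request):
    headers = {**request.get("headers", {}), "x-trace-id": "abc123"}
    return {**request, "headers": headers}


def normalize_model(request):
    return {**request, "model": request["model"].strip().lower()}


def apply_request_intercepts(request, intercepts):
    # Each intercept receives the output of the previous one.
    for intercept in intercepts:
        request = intercept(request)
    return request


original = {"model": "  My-Model ", "prompt": "hello"}
rewritten = apply_request_intercepts(original, [inject_header, normalize_model])
```

The rewritten request, not the original, is what the next stage of execution would receive.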
Execution Intercepts#
Execution intercepts wrap or replace the real callback.
Use them when behavior belongs around the invocation boundary itself, such as:
Retries
Timing
Routing
Wrapper logic
Framework integration
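To illustrate behavior that belongs around the invocation boundary, here is a minimal retry wrapper. It assumes an execution intercept receives the request plus a `next_call` callable that runs the rest of the chain or the real callback. That signature is a hypothetical one chosen for this sketch.

```python
# Hypothetical signature: (request, next_call) -> result.
def retry_intercept(max_attempts):
    # Assumes max_attempts >= 1.
    def intercept(request, next_call):
        last_error = None
        for _ in range(max_attempts):
            try:
                return next_call(request)
            except RuntimeError as error:
                last_error = error
        raise last_error

    return intercept


attempts = []


def flaky_callback(request):
    # Fails twice, then succeeds, to exercise the retry loop.
    attempts.append(request)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return f"ok: {request}"


result = retry_intercept(max_attempts=5)("ping", flaky_callback)
```

Unlike a request intercept, the wrapper controls whether and how often the callback runs, which is why retries, timing, and routing fit here.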
Stream Execution Intercepts#
LLM streaming has a stream execution path for wrappers that need to run around chunk delivery and finalization rather than only around a single response object.
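One way to picture a wrapper that runs around chunk delivery and finalization is a generator that wraps another generator. The names below are hypothetical, and the uppercase transformation is only a placeholder for real per-chunk work.

```python
def fake_llm_stream(prompt):
    # Stand-in for a real streaming callback.
    for word in prompt.split():
        yield word


def counting_stream_intercept(stream, stats):
    # Runs around chunk delivery and finalization.
    try:
        for chunk in stream:
            stats["chunks"] += 1
            yield chunk.upper()
    finally:
        # Runs when the stream is exhausted or closed early.
        stats["finalized"] = True


stats = {"chunks": 0, "finalized": False}
chunks = list(counting_stream_intercept(fake_llm_stream("a b c"), stats))
```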
Guardrails#
Guardrails are middleware that block execution or sanitize observability payloads.
Conditional Execution#
Conditional-execution guardrails run before the real callback. They decide whether execution may proceed.
Use them when the runtime should block work based on policy, budget, or context.
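A budget check is a simple instance of this pattern. The sketch below assumes a guardrail returns `True` to allow and `False` to block, and that a block surfaces as an exception. `ExecutionBlocked`, `budget_guardrail`, and `run_guarded` are hypothetical names.

```python
class ExecutionBlocked(Exception):
    """Hypothetical error raised when a guardrail rejects the call."""


def budget_guardrail(budget):
    def check(request):
        # Return True to allow, False to block.
        return request["estimated_tokens"] <= budget["remaining"]

    return check


def run_guarded(request, guardrails, callback):
    for allow in guardrails:
        if not allow(request):
            raise ExecutionBlocked("rejected before the callback ran")
    return callback(request)


calls = []


def callback(request):
    calls.append(request)
    return "done"


guard = budget_guardrail({"remaining": 100})
allowed = run_guarded({"estimated_tokens": 40}, [guard], callback)

try:
    run_guarded({"estimated_tokens": 500}, [guard], callback)
    blocked = False
except ExecutionBlocked:
    blocked = True
```

The rejected request never reaches the callback, so `calls` records only the allowed one.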
Sanitize Request#
Sanitize-request guardrails rewrite the payload recorded on emitted start events.
Use them when the event stream should hide or reduce sensitive request data.
Sanitize Response#
Sanitize-response guardrails rewrite the payload recorded on emitted end events.
Use them when the event stream should hide or reduce sensitive response data.
Sanitize guardrails are observability-oriented. They do not rewrite the real arguments passed to the callback or the real value returned to the caller.
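The split between the emitted payload and the real values can be shown in a few lines. In this sketch the `events` list stands in for whatever subscribers would receive, and all function names are hypothetical.

```python
events = []  # stand-in for what subscribers would receive


def redact_api_key(payload):
    return {k: ("***" if k == "api_key" else v) for k, v in payload.items()}


def mask_response(payload):
    return {**payload, "text": "<redacted>"}


def managed_call(request, callback):
    events.append(("start", redact_api_key(request)))  # emitted copy only
    response = callback(request)                       # real request is untouched
    events.append(("end", mask_response(response)))    # emitted copy only
    return response                                    # real value is returned


seen_by_callback = []


def callback(request):
    seen_by_callback.append(request)
    return {"text": "secret answer"}


result = managed_call({"api_key": "sk-123", "prompt": "hi"}, callback)
```

The callback still sees the real key and the caller still gets the real answer. Only the recorded events are rewritten.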
Managed Execution Order#
For managed execution, NeMo Flow applies middleware in this order:
sequenceDiagram
autonumber
actor Caller as Application / Framework
participant Runtime as NeMo Flow Runtime
participant Cond as Conditional Guardrails
participant Req as Request Intercepts
participant Exec as Execution Intercepts
participant Callback as Real Callback
participant San as Sanitize Guardrails
participant Subs as Subscribers
Caller->>Runtime: managed tool or LLM call
Runtime->>Cond: decide whether work may proceed
alt blocked
Cond-->>Caller: reject execution
else allowed
Runtime->>Req: rewrite the real request
Runtime->>San: sanitize emitted start payload
Runtime->>Subs: emit start event
Runtime->>Exec: wrap execution
Exec->>Callback: invoke callback
Callback-->>Exec: return real result
Exec-->>Runtime: continue
Runtime->>San: sanitize emitted end payload
Runtime->>Subs: emit end event
Runtime-->>Caller: return real result
end
1. Conditional-execution guardrails
2. Request intercepts
3. Sanitize-request guardrails for emitted start events
4. Execution intercepts
5. The real callback
6. Sanitize-response guardrails for emitted end events
For streaming LLM flows, stream execution intercepts sit inside the execution path between items 4 and 6. Sanitize-request guardrails still apply at item 3 to the emitted start payload, and execution intercepts still wrap the call boundary at item 4. Stream execution intercepts then run on emitted streaming start, chunk, and end activity before sanitize-response guardrails rewrite the emitted response-side payloads at item 6.
This ordering is what makes the semantic split between intercepts and guardrails important:
If you need to change the real execution path, use an intercept
If you need to change only the emitted payload, use a sanitize guardrail
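The six-step ordering above can be traced with a toy pipeline. Everything here is a hypothetical stand-in (one function per middleware kind, a list-based trace, and an `emit` callback for subscribers), meant only to make the order and the real-versus-emitted split visible.

```python
trace = []


def conditional(request):
    trace.append("1 conditional-execution guardrail")
    return True


def request_intercept(request):
    trace.append("2 request intercept")
    return {**request, "enriched": True}


def sanitize_request(request):
    trace.append("3 sanitize-request guardrail")
    return {"prompt": "<hidden>"}


def execution_intercept(request, next_call):
    trace.append("4 execution intercept")
    return next_call(request)


def real_callback(request):
    trace.append("5 real callback")
    return {"text": "answer", "saw_enriched": request["enriched"]}


def sanitize_response(response):
    trace.append("6 sanitize-response guardrail")
    return {"text": "<hidden>"}


def managed_call(request, emit):
    if not conditional(request):
        raise PermissionError("blocked")
    request = request_intercept(request)              # changes the real request
    emit("start", sanitize_request(request))          # changes only the event
    response = execution_intercept(request, real_callback)
    emit("end", sanitize_response(response))          # changes only the event
    return response                                   # real result goes back


emitted = []
result = managed_call({"prompt": "hi"}, lambda kind, payload: emitted.append((kind, payload)))
```

The callback sees the enriched request produced by the intercept, while the emitted events carry only the sanitized payloads.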
Detailed Execution Flow#
The simplified sequence above is the right mental model for most readers. The diagram below expands the same flow to show where guardrail rejections, event subscribers, execution-intercept chaining, and streaming collection/finalization fit into the runtime path.
flowchart TB
Request([Request])
subgraph Execution
direction TB
ConditionalExecutionGuardrails{{Conditional-Execution Guardrail}}
RequestIntercepts[/Request Intercepts/]
RaiseException[Raise Exception]
subgraph Invocation
direction TB
HasExecutionIntercept{{Has Valid Execution Intercept}}
ExecutionIntercepts[/Execution Intercept/]
DefaultCallable[Default Callable]
end
subgraph Streaming
direction TB
Finalizer[Finalizer]
Collector[Collector]
end
subgraph Observability
direction TB
SanitizeRequestGuardrails[/Sanitize Request Guardrail/]
SanitizeResponseGuardrails[/Sanitize Response Guardrail/]
EventSubscribers[["Event Subscribers"]]
end
end
Response([Response])
Request --> ConditionalExecutionGuardrails
RequestIntercepts -->|Transformed Request| SanitizeRequestGuardrails & Invocation
ConditionalExecutionGuardrails -->|"(rejected)"| EventSubscribers
ConditionalExecutionGuardrails -->|"(rejected)"| RaiseException
ConditionalExecutionGuardrails -->|"(passed)"| RequestIntercepts
HasExecutionIntercept -->|No| DefaultCallable
HasExecutionIntercept -->|Yes| ExecutionIntercepts
ExecutionIntercepts -.->|chain=yes| HasExecutionIntercept
ExecutionIntercepts -.->|chain=no| DefaultCallable
Invocation -->|Response| SanitizeResponseGuardrails
Invocation -->|Response| Response
Invocation -.->|stream chunks| Collector
Collector -..->|stream chunks| Response
Invocation -.->|"(stream ends)"| Finalizer
Finalizer -.->|Aggregated Response| SanitizeResponseGuardrails
Finalizer o--o|shared state| Collector
SanitizeRequestGuardrails -->|Sanitized Request| EventSubscribers
SanitizeResponseGuardrails -->|Sanitized Response| EventSubscribers
class Execution,Invocation,Streaming,Observability,Request,Response grey-lightest;
class EventSubscribers teal-lightest;
class RequestIntercepts,HasExecutionIntercept,ExecutionIntercepts yellow-lightest;
class ConditionalExecutionGuardrails,SanitizeRequestGuardrails,SanitizeResponseGuardrails green-lightest;
class RaiseException red-lightest;
class DefaultCallable,Collector,Finalizer magenta-lightest;
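The Collector and Finalizer nodes in the diagram share state so that chunks can be forwarded as they arrive and aggregated once the stream ends. A minimal sketch of that relationship follows. `StreamState`, `collect`, and `finalize` are hypothetical names, and string concatenation is an assumed aggregation rule.

```python
class StreamState:
    """Hypothetical shared state between the collector and the finalizer."""

    def __init__(self):
        self.parts = []


def collect(stream, state):
    # Collector: forwards each chunk to the caller and records it.
    for chunk in stream:
        state.parts.append(chunk)
        yield chunk


def finalize(state):
    # Finalizer: builds the aggregated response once the stream ends.
    return "".join(state.parts)


state = StreamState()
delivered = list(collect(iter(["Hel", "lo", "!"]), state))
aggregated = finalize(state)
```

In the diagram, the aggregated value is what flows on to the sanitize-response guardrail, while the individual chunks go straight to the response.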
Choosing the Right Surface#
Use these comparisons to pick the middleware surface that matches the behavior you need.
Use a conditional-execution guardrail when the work should be allowed or rejected.
Use a request intercept when the real request must change before the call.
Use an execution intercept when behavior belongs around the invocation boundary.
Use a sanitize guardrail when only subscribers and exporters should see rewritten data.
Use a stream execution intercept when you need streaming-specific behavior across the lifecycle of a long-lived or chunked response, such as per-chunk transformation, incremental authorization, per-event logging or metrics, backpressure handling, or cancellation and cleanup. An execution intercept, by contrast, only surrounds a single call boundary.
Practical Guidance#
Use these practices when applying middleware in application or integration code.
Keep process-wide defaults global.
Keep request-local policy scope-local.
Use plugins when the middleware bundle should be reusable and configuration-driven.
Treat execution intercepts as the preferred wrapper point for framework integrations.