01 / 12

NV-sflow

Declarative Workflow Descriptor

Describe once. Run anywhere.

02 / 12
The Problem

Same workflow. Different infra. Rewrite everything.

Take NVIDIA Dynamo — start etcd & NATS, launch a frontend, spin up workers, and the service is up. That logical flow never changes.

But making it run on Slurm, Docker Compose, or Kubernetes requires platform-specific scripts, networking, and resource management — repeated for every new platform.

03 / 12
The Solution

Separate what to deploy from where

Describe Once

Portable YAML — tasks, deps, resources, launch methods

Swappable Backends

Slurm now. Docker & K8S planned.

Pluggable Plugins

Probes, artifacts, replicas — no platform coupling.

[Diagram: one sflow descriptor targeting Slurm, Docker, and K8S backends]
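The swap illustrated above can be sketched as a thin launcher interface. This is a hypothetical Python sketch (class and field names like `SlurmBackend`, `task["cmd"]`, and the image name are illustrative, not sflow's actual internals):

```python
from typing import Protocol

class Backend(Protocol):
    """Anything that can launch a task from the portable descriptor."""
    def launch(self, task: dict) -> str: ...

class SlurmBackend:
    # Sketch: render an srun command from the task spec.
    def launch(self, task: dict) -> str:
        return f"srun --gres=gpu:{task.get('gpus', 0)} {task['cmd']}"

class DockerBackend:
    # Planned backend: same descriptor, different launcher.
    def launch(self, task: dict) -> str:
        return f"docker run --gpus {task.get('gpus', 0)} img {task['cmd']}"

def deploy(tasks: list[dict], backend: Backend) -> list[str]:
    """The workflow description never changes; only the backend is swapped."""
    return [backend.launch(t) for t in tasks]
```

The point is the shape of the separation: tasks stay declarative, and each backend owns only the platform-specific rendering.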
04 / 12
DAG Orchestration

Workflow DAG

load_image
install_dependency
gpu_monitor
nats_server
etcd_server
frontend_server_0
frontend_server_1
frontend_server_2
nginx_server
prefill_server_0
prefill_server_1
prefill_server_2
prefill_server_3
decode_server_0
benchmark_infmax_16
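A DAG like the one above can be scheduled in parallel "waves" with Kahn's algorithm: each wave contains every task whose dependencies are already satisfied. A minimal sketch, using a few of the tasks above with illustrative dependency edges (the real edges live in the workflow YAML):

```python
def schedule_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves: every task in a wave has all of its
    dependencies satisfied by earlier waves, so a wave can launch in
    parallel (Kahn's algorithm)."""
    indeg = {t: len(d) for t, d in deps.items()}
    children: dict[str, list[str]] = {t: [] for t in deps}
    for task, parents in deps.items():
        for p in parents:
            children[p].append(task)
    wave = sorted(t for t, n in indeg.items() if n == 0)
    order = []
    while wave:
        order.append(wave)
        ready = []
        for t in wave:
            for c in children[t]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
        wave = sorted(ready)
    return order

# Illustrative dependency edges for a subset of the DAG above.
deps = {
    "load_image": set(),
    "nats_server": {"load_image"},
    "etcd_server": {"load_image"},
    "frontend_server_0": {"nats_server", "etcd_server"},
    "prefill_server_0": {"frontend_server_0"},
    "decode_server_0": {"frontend_server_0"},
    "benchmark_infmax_16": {"prefill_server_0", "decode_server_0"},
}
```

Here `schedule_waves(deps)` launches `load_image` first, then NATS and etcd together, and the benchmark only after every server is up.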
05 / 12
Resource Planning

Topology-aware GPU Allocation

Allocation map (finalized node/GPU assignment):
- backend 'slurm_cluster':
  ├─ node slurm_cluster-node0
  │    GPU 0: prefill_server_0
  │    GPU 1: prefill_server_1
  │    GPU 2: prefill_server_2
  │    GPU 3: prefill_server_3
  │    Tasks: load_image, gpu_monitor, nats_server, etcd_server, frontend_server_0, ...
  ├─ node slurm_cluster-node1
  │    GPU 0: decode_server_0
  │    GPU 1: decode_server_0
  │    GPU 2: decode_server_0
  │    GPU 3: decode_server_0
  │    Tasks: load_image, gpu_monitor, frontend_server_1, decode_server_0
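The placement above can be reproduced by a simple first-fit rule: give each server its tensor-parallel (TP) group of GPUs and never split a group across nodes. A simplified sketch of that idea (greedy first-fit; sflow's real planner is presumably more sophisticated):

```python
def allocate(servers: dict[str, int], gpus_per_node: int = 4):
    """Greedy first-fit GPU placement: each server gets `tp` GPUs on a
    single node; if the current node cannot fit the group, move to the
    next node. Returns {server: [(node, gpu), ...]}."""
    placement: dict[str, list[tuple[int, int]]] = {}
    node, free = 0, gpus_per_node
    for name, tp in servers.items():
        if tp > gpus_per_node:
            raise ValueError(f"{name}: TP={tp} exceeds node size")
        if tp > free:                      # group doesn't fit -> next node
            node, free = node + 1, gpus_per_node
        start = gpus_per_node - free
        placement[name] = [(node, start + i) for i in range(tp)]
        free -= tp
    return placement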
06 / 12

Core Features

Probes

Readiness & failure gates — TCP, HTTP, log watch

Replicas

Parallel / sequential, Cartesian sweeps

{{}}

Expressions

Jinja2 ${{}} — variables, backends

📦

Artifacts

Named URIs: fs://, file://, http://

Live TUI

Rich terminal — task status, logs

📋

Batch Mode

sbatch, CSV-driven bulk sweeps
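A readiness gate like the TCP probe can be sketched in a few lines: poll until a connection succeeds or a deadline passes. This is an illustration of the idea, not sflow's actual probe implementation:

```python
import socket
import time

def tcp_probe(host: str, port: int, timeout: float = 30.0,
              interval: float = 0.2) -> bool:
    """Readiness gate: poll until a TCP connect succeeds or the overall
    timeout expires. Downstream tasks launch only after this returns True."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

HTTP and log-watch probes follow the same pattern, only the success check differs.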

07 / 12
Modular Composition

Split. Reuse. Swap.

Input file composition (5 files → merged workflow):
├─ inference_x_v2/slurm_config.yaml
│    variables: [SLURM_ACCOUNT, SLURM_PARTITION, SLURM_TIMELIMIT, ...]
│    backends: [slurm_cluster]
├─ inference_x_v2/common_workflow.yaml
│    variables: [SERVED_MODEL_NAME, MODEL_NAME, ... (+7)]
│    artifacts: [LOCAL_MODEL_PATH]
│    operators: [dynamo, nginx]
│    workflow.tasks: [load_image, install_dependency, gpu_monitor, ...]
├─ vllm/prefill.yaml
│    variables: [NUM_CTX_SERVERS, CTX_TP_SIZE, ... (+4)]
│    workflow.tasks: [prefill_server]
├─ vllm/decode.yaml
│    variables: [NUM_GEN_SERVERS, GEN_TP_SIZE, ... (+4)]
│    workflow.tasks: [decode_server]
└─ benchmark_infmax.yaml
     operators: [dynamo_benchmark]
     variables: [ISL, OSL, CONCURRENCY, ... (+1)]
     workflow.tasks: [benchmark_infmax]
Benefit         Description
Reuse           Shared components written once, used by all variants
Swap            Change benchmark by swapping benchmark_aiperf for benchmark_infmax
Mix frameworks  sglang + aiperf, vllm + infmax, trtllm + aiperf, etc.
Smaller diffs   Changes to one component only touch one file
Bulk testing    CSV combinations, generate/submit all at once
Computed vars   GPUS_PER_WORKER chains across modules
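Merging split files comes down to a recursive merge of their parsed trees. A minimal sketch using plain dicts (the merge rules here are assumptions for illustration; sflow compose's actual rules may differ):

```python
from functools import reduce

def deep_merge(base: dict, overlay: dict) -> dict:
    """Merge overlay into base: nested dicts merge recursively, lists
    concatenate, and overlay scalars win."""
    out = dict(base)
    for key, value in overlay.items():
        if isinstance(out.get(key), dict) and isinstance(value, dict):
            out[key] = deep_merge(out[key], value)
        elif isinstance(out.get(key), list) and isinstance(value, list):
            out[key] = out[key] + value
        else:
            out[key] = value
    return out

# Two toy "files" already parsed from YAML (contents illustrative).
common = {"variables": {"MODEL_NAME": "deepseek-r1"},
          "workflow": {"tasks": ["load_image"]}}
prefill = {"variables": {"CTX_TP_SIZE": 8},
           "workflow": {"tasks": ["prefill_server"]}}
merged = reduce(deep_merge, [common, prefill])
```

After the merge, `merged["workflow"]["tasks"]` contains both files' tasks, which is why swapping one file swaps one component of the workflow.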
08 / 12
Real-world Debugging

Structured Error Analysis

Workflow: b200-fp8-low-latency-tep8-1p-1d
Model: DeepSeek R1 FP8 | 2 nodes × 8 GPUs | ISL=8192, OSL=1024

Allocation Map
├─ slurm-node-01 (node 0)
│    GPU 0-7: prefill_server_0 (TP=8)
│    Also: load_image, nats, etcd, frontend, benchmark_*
└─ slurm-node-02 (node 1)
     GPU 0-7: decode_server_0 (TP=8)
     Also: load_image, gpu_monitor

Timeline
01:57:08 — load_image + install_aiperf submitted
01:59:10 — load_image COMPLETED on both nodes
01:59:43 — nats_server READY
01:59:45 — etcd_server READY
02:00:31 — frontend_server_0 READY (10.52.32.8)
02:05:14 — prefill + decode READY → benchmark_4 starts
02:05:20 — HTTP 500 — all benchmark requests fail
02:05:41 — Workflow finished (8m 33s)

Error from frontend logs:
Invalid TCP address 'dynamo_prefill.generate-58b49ce145f56609'
Invalid TCP address 'dynamo_backend.generate-58b49ce145f5660b'

Diagnosis — sflow orchestration vs application error

Layer      Status  Detail
sflow      ✓ OK    DAG executed, all tasks launched, probes passed
GPU alloc  ✓ OK    8 GPUs/node, no overlap, TP=8 per server
Infra      ✓ OK    etcd, NATS, frontend all READY
Routing    ✗ FAIL  Frontend gets service names, not host:port
Benchmark  ✗ FAIL  0/800 requests succeed (all HTTP 500)
Root Cause
Frontend receives discovery service names (e.g. dynamo_prefill.generate-58b49...) instead of host:port addresses. The NATS/etcd service registry returns internal identifiers that the TCP router can't parse.

Fix: verify that DYN_REQUEST_PLANE and the frontend networking config match SGLang disagg routing.
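The failure mode is easy to see in miniature: a router that expects host:port rejects anything without a numeric port. An illustrative parser (not the actual Dynamo or sflow code) that reproduces the log message above:

```python
def parse_tcp_address(addr: str) -> tuple[str, int]:
    """Split 'host:port' into its parts; reject bare service names,
    which have no numeric port suffix."""
    host, sep, port = addr.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"Invalid TCP address {addr!r}")
    return host, int(port)
```

A discovery identifier such as `dynamo_prefill.generate-58b49ce145f56609` has no `:port` suffix, so it fails exactly this kind of check, even though the services behind it are healthy.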
09 / 12

CLI at a Glance

Command          Purpose                  Key Flags
sflow run        Execute a workflow       --dry-run --tui --set
sflow batch      Generate sbatch scripts  --submit --bulk-input
sflow compose    Merge multiple YAMLs     --resolve --validate
sflow visualize  Render DAG image         --format png/svg/mermaid
sflow sample     List / copy examples     --list -o
sflow skill      Export AI agent skills   --list -o
10 / 12
AI-Native

Built-in Agent Skills

sflow ships with AI agent skills that teach coding assistants (Cursor, Copilot) how to write sflow YAML and debug errors — no training required.

$ sflow skill --list
Available AI agent skills:
  - writing-sflow-yaml      Schema, examples, validation
  - sflow-error-analysis    Error catalog, diagnostics
  + AGENTS.md (agent workflow guidelines)

$ sflow skill -o .cursor/skills
Skills will be copied to: .cursor/skills/
Note: directory exists — files will be merged.
✓ Skills copied to: .cursor/skills/

What the agent learns from AGENTS.md

Capability       Detail
Write YAML       Schema-aware config authoring with validation
Debug errors     Categorize errors, read logs, suggest fixes
GPU planning     TP/DP/PP sizing, node allocation math
Modular compose  Split configs, missable tasks, CSV sweeps
Step-by-step     Hardcode first → validate → parameterize
AGENTS.md workflow:
1. Gather cluster info (account, partition, GPUs)
2. Write minimal plain-text config
3. sflow run --dry-run to validate
4. Run and debug with --tui
5. Extract variables, parameterize
6. Modularize for multi-framework support
11 / 12
Vision

Heterogeneous Computing

As a top-level orchestration abstraction, sflow is uniquely positioned to unify heterogeneous compute — allocating the right accelerator from the right resource pool for each stage of a disaggregated pipeline.

[Diagram: sflow routes prefill_server to a Vera Rubin GPU pool and decode_server to an LPU accelerator cluster]

Example: PD disaggregation — one sflow.yaml routes prefill to Vera Rubin GPU pools for maximum throughput, and decode to LPU accelerators for low-latency token generation. Each task lands on the optimal hardware automatically.
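That routing can be sketched as a capacity-tracked assignment of tasks to named pools. A toy Python sketch (pool names and sizes are illustrative, not a real sflow API):

```python
def route(requests: list[tuple[str, str, int]],
          pools: dict[str, int]) -> dict[str, str]:
    """Assign each (task, pool, accelerator_count) request to its declared
    pool, tracking remaining capacity per pool."""
    free = dict(pools)
    placed: dict[str, str] = {}
    for task, pool, count in requests:
        if free.get(pool, 0) < count:
            raise RuntimeError(f"pool {pool!r} exhausted for {task}")
        free[pool] -= count
        placed[task] = pool
    return placed

# Toy PD-disaggregation request: prefill on GPUs, decode on LPUs.
placed = route(
    [("prefill_server", "vera_rubin_gpu", 8), ("decode_server", "lpu", 16)],
    {"vera_rubin_gpu": 8, "lpu": 16},
)
```

The descriptor only names the pool each stage wants; the orchestrator does the bookkeeping, which is what makes per-stage heterogeneous placement declarative.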

12 / 12
What’s Next

Slurm today. Docker & Kubernetes tomorrow.

Slurm lacks a workflow orchestration layer — that’s where sflow starts. Docker and K8S backends are planned, leveraging native ecosystems (Helm charts, Argo Workflows).

Slurm ✓ · Docker (planned) · Kubernetes (planned)

$ uv pip install "sflow @ git+https://github.com/NVIDIA/nv-sflow.git@main"

github.com/NVIDIA/nv-sflow