Take NVIDIA Dynamo — start etcd & NATS, launch a frontend, spin up workers, and the service is up. That logical flow never changes.
But making it run on Slurm, Docker Compose, or Kubernetes requires platform-specific scripts, networking, and resource management — repeated for every new platform.
- Portable YAML — tasks, deps, resources, launch methods
- Backends: Slurm today; Docker & K8s planned
- Probes, artifacts, replicas — no platform coupling
- Readiness & failure gates — TCP, HTTP, log watch
- Parallel / sequential execution, Cartesian sweeps
- Jinja2 `${{ }}` templating — variables, backends
- Named URIs: `fs://`, `file://`, `http://`
- Rich terminal UI — task status, logs
- `sbatch` generation, CSV-driven bulk sweeps
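The features above can be sketched in a single workflow file. The following is an illustrative mock-up only — the task names, field names (`tasks`, `depends_on`, `readiness`, `artifacts`, etc.), and commands are assumptions for the sake of example, not the actual sflow schema:

```yaml
# Hypothetical sflow-style workflow — field names are illustrative, not the real schema.
variables:
  model: "Qwen/Qwen3-8B"
  port: 8000

tasks:
  etcd:
    launch: "etcd --listen-client-urls http://0.0.0.0:2379"
    readiness:
      tcp: { port: 2379 }              # gate: wait for TCP accept
  frontend:
    depends_on: [etcd]
    launch: "dynamo-frontend --port ${{ port }}"
    readiness:
      http: { path: /health, port: "${{ port }}" }
  worker:
    depends_on: [frontend]
    replicas: 2
    resources: { gpus: 8 }
    launch: "dynamo-worker --model ${{ model }}"
    readiness:
      log: "Worker ready"              # gate: watch logs for a marker
    artifacts:
      - fs://results/worker.log
```

The point is the shape, not the exact keys: the DAG, the readiness gates, and the `${{ }}` templating stay the same regardless of which backend executes them.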
| Benefit | Description |
|---|---|
| Reuse | Shared components written once, used by all variants |
| Swap | Change benchmark by swapping `benchmark_aiperf` for `benchmark_infmax` |
| Mix frameworks | `sglang` + `aiperf`, `vllm` + `infmax`, `trtllm` + `aiperf`, etc. |
| Smaller diffs | Changes to one component only touch one file |
| Bulk testing | CSV combinations, generate/submit all at once |
| Computed vars | `GPUS_PER_WORKER` chains across modules |
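As an example of the "Swap" row, each benchmark could live in its own component file that `sflow compose` merges into the base workflow. This is an illustrative sketch — the component filename, task names, and fields are assumptions, not the real compose syntax:

```yaml
# components/benchmark_aiperf.yaml — hypothetical component file.
# Swapping benchmarks means composing benchmark_infmax.yaml instead;
# the base workflow and server components are untouched.
tasks:
  benchmark:
    depends_on: [frontend]
    launch: "aiperf --url http://localhost:${{ port }} --concurrency 8"
```

Because only this file changes between variants, diffs stay small and the server, infra, and probe definitions are shared verbatim.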
Diagnosis — sflow orchestration vs application error
| Layer | Status | Detail |
|---|---|---|
| sflow | ✓ OK | DAG executed, all tasks launched, probes passed |
| GPU alloc | ✓ OK | 8 GPUs/node, no overlap, TP=8 per server |
| Infra | ✓ OK | etcd, NATS, frontend all READY |
| Routing | ✗ FAIL | Frontend gets service names, not host:port |
| Benchmark | ✗ FAIL | 0/800 requests succeed (all HTTP 500) |
| Command | Purpose | Key Flags |
|---|---|---|
| `sflow run` | Execute a workflow | `--dry-run`, `--tui`, `--set` |
| `sflow batch` | Generate sbatch scripts | `--submit`, `--bulk-input` |
| `sflow compose` | Merge multiple YAMLs | `--resolve`, `--validate` |
| `sflow visualize` | Render DAG image | `--format png/svg/mermaid` |
| `sflow sample` | List / copy examples | `--list`, `-o` |
| `sflow skill` | Export AI agent skills | `--list`, `-o` |
sflow ships with AI agent skills that teach coding assistants (Cursor, Copilot) how to write sflow YAML and debug errors — no training required.
What the agent learns from AGENTS.md
| Capability | Detail |
|---|---|
| Write YAML | Schema-aware config authoring with validation |
| Debug errors | Categorize errors, read logs, suggest fixes |
| GPU planning | TP/DP/PP sizing, node allocation math |
| Modular compose | Split configs, missable tasks, CSV sweeps |
| Step-by-step | Hardcode first → validate → parameterize |
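The "hardcode first → validate → parameterize" step in the table could look like the following before/after pair (illustrative field names, not the real schema):

```yaml
# Step 1: hardcode everything and validate that the workflow runs.
tasks:
  server:
    launch: "vllm serve meta-llama/Llama-3.1-8B --tensor-parallel-size 8"
---
# Final step: parameterize only after the hardcoded version works.
variables:
  model: "meta-llama/Llama-3.1-8B"
  tp: 8
tasks:
  server:
    launch: "vllm serve ${{ model }} --tensor-parallel-size ${{ tp }}"
```

Debugging a template failure and a launch failure at the same time is much harder than doing them in sequence, which is why the agent is taught this order.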
As a top-level orchestration abstraction, sflow is uniquely positioned to unify heterogeneous compute — allocating the right accelerator from the right resource pool for each stage of a disaggregated pipeline.
Example: prefill/decode (PD) disaggregation — one sflow.yaml routes prefill to Vera Rubin GPU pools for maximum throughput, and decode to LPU accelerators for low-latency token generation. Each task lands on the optimal hardware automatically.
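Such a pipeline might be expressed along these lines — a speculative sketch, since the pool and accelerator fields here are assumptions about a planned capability, not current syntax:

```yaml
# Hypothetical heterogeneous-pool workflow — all field names are illustrative.
tasks:
  prefill:
    resources: { pool: rubin-pool, gpus: 8 }        # throughput-optimized stage
    launch: "dynamo-worker --mode prefill"
  decode:
    depends_on: [prefill]
    resources: { pool: lpu-pool, accelerators: 4 }  # latency-optimized stage
    launch: "dynamo-worker --mode decode"
```

The orchestration layer, not the application, would own the mapping from pipeline stage to resource pool.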
Slurm lacks a workflow orchestration layer — that's where sflow starts. Docker and K8s backends are planned, leveraging their native ecosystems (Helm charts, Argo Workflows).
$ uv pip install "sflow @ git+https://github.com/NVIDIA/nv-sflow.git@main"