DynamoMocker#

The DynamoMocker workload (test_template_name is DynamoMocker) runs a GPU-free LLM inference simulation using dynamo.mocker and dynamo.frontend from the ai-dynamo package, then benchmarks the stack with aiperf or genai-perf.

It is a Standalone workload (no Slurm/Kubernetes/RunAI required).

Prerequisites#

CloudAI automatically installs ai-dynamo, aiperf, and genai-perf into a managed Python virtual environment on first run, so no manual pip install is needed.

The only binary you need to provide yourself is nats-server. On many clusters it is pre-installed by administrators and already on PATH. Check whether it is available:

which nats-server

If not found, download the binary from the official releases, extract it, and add it to PATH:

# replace <version> with the latest release tag, e.g. v2.10.24
curl -L https://github.com/nats-io/nats-server/releases/download/<version>/nats-server-<version>-linux-amd64.zip \
  -o /tmp/nats-server.zip
unzip /tmp/nats-server.zip -d /tmp/
mkdir -p ~/.local/bin && mv /tmp/nats-server-<version>-linux-amd64/nats-server ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"  # add to ~/.bashrc to persist

An HF_TOKEN environment variable is required to download gated models from HuggingFace Hub. Set it before running:

export HF_TOKEN=<your_token>
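The two prerequisites above can also be checked from Python before launching a run. The following is a minimal sketch, not part of CloudAI: the function name missing_prerequisites and its injectable env and which parameters are hypothetical and exist only to make the check easy to test.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    # nats-server must be resolvable on PATH (same check as `which nats-server`).
    if which("nats-server") is None:
        problems.append("nats-server not found on PATH")
    # HF_TOKEN is only needed for gated models, so treat it as a warning.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for gated models)")
    return problems
```

Calling missing_prerequisites() with no arguments inspects the real environment; passing a dict and a stub lookup function lets you exercise both branches without touching the machine.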

Topologies#

The workload supports two disaggregation modes, configured via cmd_args.worker.disaggregation_mode:

Combined (none): a single dynamo.mocker process handles both prefill and decode. Controlled by cmd_args.worker.num_workers.
Disaggregated (prefill_decode): separate prefill and decode mocker instances, mirroring the production ai_dynamo topology. Instance counts are set via cmd_args.worker.prefill_worker.num_nodes and cmd_args.worker.decode_worker.num_nodes.
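To make the relationship between the mode and the instance-count fields concrete, here is a small sketch that reads a worker configuration (as a plain dict mirroring cmd_args.worker) and reports how many mocker instances each role would get. The helper mocker_instance_counts is hypothetical, and the fallback values ("none" and 1) are assumptions for illustration rather than CloudAI's documented defaults.

```python
def mocker_instance_counts(worker: dict) -> dict:
    """Map each mocker role to its instance count for the chosen mode."""
    # Assumed fallback: treat a missing mode as the combined topology.
    mode = worker.get("disaggregation_mode", "none")
    if mode == "none":
        # Combined: one pool of mockers doing both prefill and decode.
        return {"combined": worker.get("num_workers", 1)}
    if mode == "prefill_decode":
        # Disaggregated: separate counts for prefill and decode mockers.
        return {
            "prefill": worker.get("prefill_worker", {}).get("num_nodes", 1),
            "decode": worker.get("decode_worker", {}).get("num_nodes", 1),
        }
    raise ValueError(f"unknown disaggregation_mode: {mode!r}")


combined = {"disaggregation_mode": "none", "num_workers": 2}
disaggregated = {
    "disaggregation_mode": "prefill_decode",
    "prefill_worker": {"num_nodes": 1},
    "decode_worker": {"num_nodes": 2},
}
```

Note that num_workers is ignored in disaggregated mode and the per-role num_nodes values are ignored in combined mode, which matches the description above.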

Benchmark Tools#

Select the benchmark tool with cmd_args.benchmark_tool:

"aiperf" (set in the provided TOML): uses the aiperf profiler
"genai_perf" (the DynamoMockerCmdArgs default when benchmark_tool is omitted): uses genai-perf profile

Parameters for the active tool are configured under [cmd_args.aiperf] or [cmd_args.genai_perf].
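The selection rule can be sketched as a lookup: read benchmark_tool, then take the section with the same name. The helper active_tool_args below is hypothetical and works on a plain dict standing in for cmd_args. The key example_option is a placeholder, not a real aiperf or genai-perf parameter.

```python
VALID_TOOLS = ("genai_perf", "aiperf")


def active_tool_args(cmd_args: dict) -> tuple[str, dict]:
    """Return the selected benchmark tool and its parameter section."""
    # 'genai_perf' mirrors the default in the DynamoMockerCmdArgs signature.
    tool = cmd_args.get("benchmark_tool", "genai_perf")
    if tool not in VALID_TOOLS:
        raise ValueError(f"benchmark_tool must be one of {VALID_TOOLS}, got {tool!r}")
    # Only the section matching the active tool is used; the other is ignored.
    return tool, cmd_args.get(tool, {})


example_cmd_args = {
    "benchmark_tool": "aiperf",
    "aiperf": {"example_option": 1},      # placeholder key for illustration
    "genai_perf": {"example_option": 2},  # ignored while aiperf is active
}
```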

Run Using Standalone#

Note

Set HF_TOKEN before running to allow model download from HuggingFace Hub.

uv run cloudai run \
  --system-config conf/experimental/dynamo_mocker/system/standalone_system.toml \
  --tests-dir conf/experimental/dynamo_mocker/test \
  --test-scenario conf/experimental/dynamo_mocker/test_scenario/dynamo_mocker.toml

CloudAI will:

Install ai-dynamo, aiperf, and genai-perf into a managed venv (first run only).
Write a wrapper script and launch dynamo_mocker.sh.
Start nats-server, dynamo.mocker (prefill and decode), and dynamo.frontend.
Run the benchmark and write results to the output directory.
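For scripted runs, the same invocation can be assembled in Python and handed to subprocess. This is a sketch only: the helper cloudai_run_command and its conf_root parameter are hypothetical, while the flags and file names are the ones used in the command shown in this section.

```python
from pathlib import PurePosixPath


def cloudai_run_command(conf_root: str = "conf/experimental/dynamo_mocker") -> list[str]:
    """Build the argv list for the standalone DynamoMocker run."""
    root = PurePosixPath(conf_root)
    return [
        "uv", "run", "cloudai", "run",
        "--system-config", str(root / "system" / "standalone_system.toml"),
        "--tests-dir", str(root / "test"),
        "--test-scenario", str(root / "test_scenario" / "dynamo_mocker.toml"),
    ]
```

Passing the result to subprocess.run(cloudai_run_command(), check=True) from the repository root would launch the run and raise if CloudAI exits with a non-zero status.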

Review Benchmark Results#

After the run completes, results are placed in results/<scenario_name>/<test_id>/:

benchmark_report.csv — full per-request and aggregate metrics (throughput, latency percentiles, TTFT, ITL)
stdout.txt / stderr.txt — orchestration log and process output
dynamo_prefill_0.log, dynamo_decode_0.log, dynamo_frontend.log — per-component logs
nats.log — NATS server log

Key summary metrics from benchmark_report.csv:

Metric,Value
Output Token Throughput (tokens/sec),667.58
Request Count,50.00
Request Throughput (requests/sec),18.04

Metric,avg,p50,p99
Request Latency (ms),507.49,475.50,893.26
Time to First Token (ms),77.82,71.99,137.26
Inter Token Latency (ms),12.03,11.55,16.42
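The summary tables above can be loaded with the standard csv module. The sketch below assumes the layout shown here, that is, one or more CSV tables separated by blank lines, each starting with a header row whose first column is Metric. The real benchmark_report.csv may contain additional sections, so treat parse_report as a hypothetical starting point.

```python
import csv
import io


def parse_report(text: str) -> dict[str, dict[str, float]]:
    """Parse blank-line-separated CSV tables into {metric: {column: value}}."""
    metrics: dict[str, dict[str, float]] = {}
    header = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            header = None  # a blank line ends the current table
            continue
        if header is None:
            header = row  # first non-blank row of a table is its header
            continue
        # Pair each value with its column name, skipping the Metric column.
        metrics[row[0]] = {col: float(val) for col, val in zip(header[1:], row[1:])}
    return metrics


SAMPLE = """\
Metric,Value
Output Token Throughput (tokens/sec),667.58
Request Count,50.00
Request Throughput (requests/sec),18.04

Metric,avg,p50,p99
Request Latency (ms),507.49,475.50,893.26
Time to First Token (ms),77.82,71.99,137.26
Inter Token Latency (ms),12.03,11.55,16.42
"""

report = parse_report(SAMPLE)
```

With the sample above, report["Time to First Token (ms)"]["p99"] holds the p99 TTFT and report["Request Count"]["Value"] holds the request count as a float.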

API Documentation#

Command Arguments#

class cloudai.workloads.dynamo_mocker.dynamo_mocker.DynamoMockerCmdArgs(
    *,
    model_path: str = 'Qwen/Qwen3-0.6B',
    nats_cmd: str = 'nats-server -js',
    engine: MockerEngineArgs = <factory>,
    worker: MockerWorkerConfig = <factory>,
    frontend: MockerFrontendArgs = <factory>,
    benchmark_tool: Literal['genai_perf', 'aiperf'] = 'genai_perf',
    genai_perf: MockerGenAIPerfArgs = <factory>,
    aiperf: MockerAIPerfArgs = <factory>,
    **extra_data: Any,
)

Bases: CmdArgs

Top-level command arguments for the Dynamo Mocker workload.

Test Definition#

class cloudai.workloads.dynamo_mocker.dynamo_mocker.DynamoMockerTestDefinition(
    *,
    name: str,
    description: str,
    test_template_name: str,
    cmd_args: DynamoMockerCmdArgs,
    extra_env_vars: dict[str, str | List[str]] = {},
    extra_cmd_args: dict[str, str] = {},
    extra_container_mounts: list[str] = [],
    git_repos: list[GitRepo] = [],
    nsys: NsysConfiguration | None = None,
    predictor: PredictorConfig | None = None,
    agent: str = 'grid_search',
    agent_steps: int = 1,
    agent_metrics: list[str] = ['default'],
    agent_reward_function: str = 'inverse',
    agent_config: dict[str, Any] | None = None,
    success_marker: str = 'success-marker.txt',
    failure_marker: str = 'failure-marker.txt',
)

Bases: TestDefinition

Test definition for the Dynamo Mocker workload.

Runs dynamo.mocker + dynamo.frontend as a lightweight GPU-free LLM inference simulator, then benchmarks it with genai-perf or aiperf via dynamo_mocker.sh. Select the benchmark tool with cmd_args.benchmark_tool (“genai_perf” or “aiperf”).

Supports two topologies mirroring ai_dynamo.sh:

Combined (worker.disaggregation_mode=none): single mocker handles prefill+decode
Disaggregated (worker.disaggregation_mode=prefill_decode): separate prefill and decode mocker instances with KV event publishing and simulated transfer

Requires: ai-dynamo plus either genai-perf or aiperf, pip-installed in the active environment.