DynamoMocker#

The DynamoMocker workload (test_template_name is DynamoMocker) runs a GPU-free LLM inference simulation using dynamo.mocker and dynamo.frontend from the ai-dynamo package, then benchmarks the stack with aiperf or genai-perf.

It is a Standalone workload (no Slurm/Kubernetes/RunAI required).

Prerequisites#

CloudAI automatically installs ai-dynamo, aiperf, and genai-perf into a managed Python virtual environment on first run — no manual pip install is needed.

The only binary you must provide yourself is nats-server. On many clusters administrators have already installed it and it is on PATH. Check whether it is available:

which nats-server

If not found, download the binary from the official releases, extract it, and add it to PATH:

# replace <version> with the latest release tag, e.g. v2.10.24
curl -L https://github.com/nats-io/nats-server/releases/download/<version>/nats-server-<version>-linux-amd64.zip \
  -o /tmp/nats-server.zip
unzip /tmp/nats-server.zip -d /tmp/
mkdir -p ~/.local/bin && mv /tmp/nats-server-<version>-linux-amd64/nats-server ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"  # add to ~/.bashrc to persist

An HF_TOKEN environment variable is required to download gated models from HuggingFace Hub. Set it before running:

export HF_TOKEN=<your_token>
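The two prerequisites above can also be checked from Python before launching a run. The following is a minimal sketch, not part of CloudAI: the function name preflight and its injectable env and which parameters are assumptions made here so the check is easy to test.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready.

    `preflight` is a hypothetical helper, not a CloudAI API.
    """
    problems = []
    # nats-server must be resolvable on PATH, same as `which nats-server`.
    if which("nats-server") is None:
        problems.append("nats-server not found on PATH")
    # HF_TOKEN is only needed for gated models, so report it as a warning.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for gated models)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

An empty result means both checks passed. A missing HF_TOKEN is reported as a warning rather than an error because it only matters for gated models.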

Topologies#

The workload supports two disaggregation modes, configured via cmd_args.worker.disaggregation_mode:

  • Combined (none): a single dynamo.mocker process handles both prefill and decode. Its worker count is set via cmd_args.worker.num_workers.

  • Disaggregated (prefill_decode): separate prefill and decode mocker instances, mirroring the production ai_dynamo topology. Instance counts are set via cmd_args.worker.prefill_worker.num_nodes and cmd_args.worker.decode_worker.num_nodes.

Benchmark Tools#

Select the benchmark tool with cmd_args.benchmark_tool:

  • "aiperf": uses the aiperf profiler. This is the default in the provided TOML; the DynamoMockerCmdArgs class itself defaults to "genai_perf".

  • "genai_perf": uses genai-perf profile.

Parameters for the active tool are configured under [cmd_args.aiperf] or [cmd_args.genai_perf].
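The selection rule can be sketched as a small lookup: read benchmark_tool, then pick the table with the same name. This illustrates the rule only and is not CloudAI's implementation. active_tool_args is a hypothetical helper and example_param is a placeholder key, not a real aiperf or genai-perf option. The fallback to "genai_perf" mirrors the default shown in the DynamoMockerCmdArgs signature in the API section.

```python
from typing import Any

VALID_TOOLS = ("genai_perf", "aiperf")


def active_tool_args(cmd_args: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return the selected tool and the parameter table that applies to it."""
    # Fall back to the class default when benchmark_tool is not set.
    tool = cmd_args.get("benchmark_tool", "genai_perf")
    if tool not in VALID_TOOLS:
        raise ValueError(f"benchmark_tool must be one of {VALID_TOOLS}, got {tool!r}")
    # Only the table named after the active tool is used.
    return tool, cmd_args.get(tool, {})


cmd_args = {
    "benchmark_tool": "aiperf",
    "aiperf": {"example_param": "a"},      # placeholder key, hypothetical
    "genai_perf": {"example_param": "g"},  # present but ignored for aiperf
}
tool, params = active_tool_args(cmd_args)
print(tool, params)
```

Both tables can live in the same file; only the one matching benchmark_tool is used.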

Run Using Standalone#

Note

Set HF_TOKEN before running to allow model download from HuggingFace Hub.

uv run cloudai run \
  --system-config conf/experimental/dynamo_mocker/system/standalone_system.toml \
  --tests-dir conf/experimental/dynamo_mocker/test \
  --test-scenario conf/experimental/dynamo_mocker/test_scenario/dynamo_mocker.toml

CloudAI will:

  1. Install ai-dynamo, aiperf, and genai-perf into a managed venv (first run only).

  2. Write a wrapper script and launch dynamo_mocker.sh.

  3. Start nats-server, the dynamo.mocker workers (separate prefill and decode instances in disaggregated mode), and dynamo.frontend.

  4. Run the benchmark and write results to the output directory.

Review Benchmark Results#

After the run completes, results are placed in results/<scenario_name>/<test_id>/:

  • benchmark_report.csv — full per-request and aggregate metrics (throughput, latency percentiles, TTFT, ITL)

  • stdout.txt / stderr.txt — orchestration log and process output

  • dynamo_prefill_0.log, dynamo_decode_0.log, dynamo_frontend.log — per-component logs

  • nats.log — NATS server log
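A quick way to confirm that a run produced everything listed above is to check the result directory for each file. This is a minimal sketch: missing_artifacts is a hypothetical helper, and the file list assumes a disaggregated run with one prefill and one decode instance. Other topologies may produce differently named or additional component logs.

```python
from pathlib import Path

# File names taken from the list above (one prefill, one decode instance).
EXPECTED_ARTIFACTS = (
    "benchmark_report.csv",
    "stdout.txt",
    "stderr.txt",
    "dynamo_prefill_0.log",
    "dynamo_decode_0.log",
    "dynamo_frontend.log",
    "nats.log",
)


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return the expected files that are absent from run_dir."""
    return [name for name in EXPECTED_ARTIFACTS if not (run_dir / name).is_file()]
```

For example, missing_artifacts(Path("results/<scenario_name>/<test_id>")) returns an empty list when all files are present, with the placeholders replaced by the actual scenario name and test ID.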

Key summary metrics from benchmark_report.csv:

Metric,Value
Output Token Throughput (tokens/sec),667.58
Request Count,50.00
Request Throughput (requests/sec),18.04

Metric,avg,p50,p99
Request Latency (ms),507.49,475.50,893.26
Time to First Token (ms),77.82,71.99,137.26
Inter Token Latency (ms),12.03,11.55,16.42
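The summary numbers can be cross-checked against each other. The sketch below parses the excerpt shown above and derives the average output length per request in two ways. It assumes the report consists of CSV blocks separated by a blank line, as in the excerpt; the real benchmark_report.csv may contain more rows or blocks. parse_report is a hypothetical helper.

```python
import csv
import io

# Excerpt copied from the summary above.
REPORT = """\
Metric,Value
Output Token Throughput (tokens/sec),667.58
Request Count,50.00
Request Throughput (requests/sec),18.04

Metric,avg,p50,p99
Request Latency (ms),507.49,475.50,893.26
Time to First Token (ms),77.82,71.99,137.26
Inter Token Latency (ms),12.03,11.55,16.42
"""


def parse_report(text: str) -> dict[str, dict[str, float]]:
    """Parse blank-line-separated CSV blocks into {metric: {column: value}}."""
    metrics = {}
    for block in text.strip().split("\n\n"):
        for row in csv.DictReader(io.StringIO(block)):
            name = row.pop("Metric")
            metrics[name] = {col: float(val) for col, val in row.items()}
    return metrics


m = parse_report(REPORT)

# Estimate 1: tokens/sec divided by requests/sec gives tokens per request.
tokens_per_request = (
    m["Output Token Throughput (tokens/sec)"]["Value"]
    / m["Request Throughput (requests/sec)"]["Value"]
)

# Estimate 2: time spent after the first token, divided by the gap between tokens.
decode_steps = (
    m["Request Latency (ms)"]["avg"] - m["Time to First Token (ms)"]["avg"]
) / m["Inter Token Latency (ms)"]["avg"]

print(f"{tokens_per_request:.1f} output tokens per request")
print(f"{decode_steps:.1f} inter-token intervals per request")
```

This prints about 37.0 tokens and 35.7 intervals per request. A response of N tokens has N - 1 inter-token intervals, so 35.7 intervals correspond to roughly 36.7 tokens, which agrees with the throughput-based estimate to within rounding and averaging effects.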

API Documentation#

Command Arguments#

class cloudai.workloads.dynamo_mocker.dynamo_mocker.DynamoMockerCmdArgs(
    *,
    model_path: str = 'Qwen/Qwen3-0.6B',
    nats_cmd: str = 'nats-server -js',
    engine: cloudai.workloads.dynamo_mocker.dynamo_mocker.MockerEngineArgs = <factory>,
    worker: cloudai.workloads.dynamo_mocker.dynamo_mocker.MockerWorkerConfig = <factory>,
    frontend: cloudai.workloads.dynamo_mocker.dynamo_mocker.MockerFrontendArgs = <factory>,
    benchmark_tool: typing.Literal['genai_perf', 'aiperf'] = 'genai_perf',
    genai_perf: cloudai.workloads.dynamo_mocker.dynamo_mocker.MockerGenAIPerfArgs = <factory>,
    aiperf: cloudai.workloads.dynamo_mocker.dynamo_mocker.MockerAIPerfArgs = <factory>,
    **extra_data: typing.Any,
)

Bases: CmdArgs

Top-level command arguments for the Dynamo Mocker workload.

Test Definition#

class cloudai.workloads.dynamo_mocker.dynamo_mocker.DynamoMockerTestDefinition(
    *,
    name: str,
    description: str,
    test_template_name: str,
    cmd_args: cloudai.workloads.dynamo_mocker.dynamo_mocker.DynamoMockerCmdArgs,
    dse_excluded_args: list[str] = <factory>,
    extra_env_vars: dict[str, str | typing.List[str]] = {},
    extra_cmd_args: dict[str, str] = {},
    extra_container_mounts: list[str] = [],
    git_repos: list[cloudai._core.installables.git_repo.GitRepo] = [],
    nsys: cloudai.models.workload.NsysConfiguration | None = None,
    predictor: cloudai.models.workload.PredictorConfig | None = None,
    agent: str = 'grid_search',
    agent_steps: int = 1,
    agent_metrics: list[str] = ['default'],
    agent_reward_function: str = 'inverse',
    agent_config: dict[str, typing.Any] | None = None,
    success_marker: str = 'success-marker.txt',
    failure_marker: str = 'failure-marker.txt',
)

Bases: TestDefinition

Test definition for the Dynamo Mocker workload.

Runs dynamo.mocker + dynamo.frontend as a lightweight GPU-free LLM inference simulator, then benchmarks it with genai-perf or aiperf via dynamo_mocker.sh. Select the benchmark tool with cmd_args.benchmark_tool (“genai_perf” or “aiperf”).

Supports two topologies mirroring ai_dynamo.sh:

  • Combined (worker.disaggregation_mode=none): a single mocker handles both prefill and decode

  • Disaggregated (worker.disaggregation_mode=prefill_decode): separate prefill and decode mocker instances with KV event publishing and simulated transfer

Requires: ai-dynamo, plus either genai-perf or aiperf, pip-installed in the active environment.

is_dse_excluded_arg(path: str) -> bool

Return whether a dot-separated cmd_args path should be ignored by DSE.