DynamoMocker
The DynamoMocker workload (test_template_name is DynamoMocker) runs a GPU-free LLM inference simulation
using dynamo.mocker and dynamo.frontend from the ai-dynamo
package, then benchmarks the stack with aiperf or
genai-perf.
It is a Standalone workload (no Slurm/Kubernetes/RunAI required).
Prerequisites
CloudAI automatically installs ai-dynamo, aiperf, and genai-perf into a managed Python virtual
environment on first run, so no manual pip install is needed.
The only tool you must provide yourself is nats-server. On many clusters it is pre-installed by administrators
and is already on PATH. Check whether it is available:
which nats-server
If not found, download the binary from the official releases,
extract it, and add it to PATH:
# replace <version> with the latest release tag, e.g. v2.10.24
curl -L https://github.com/nats-io/nats-server/releases/download/<version>/nats-server-<version>-linux-amd64.zip \
-o /tmp/nats-server.zip
unzip /tmp/nats-server.zip -d /tmp/
mkdir -p ~/.local/bin && mv /tmp/nats-server-<version>-linux-amd64/nats-server ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH" # add to ~/.bashrc to persist
An HF_TOKEN environment variable is required to download gated models from HuggingFace Hub. Set it before
running:
export HF_TOKEN=<your_token>
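The two checks above can also be scripted as a pre-flight step. The following is a minimal sketch, not part of CloudAI: the function name `check_prerequisites` is hypothetical, and it treats a missing HF_TOKEN as a problem even though the token is only needed for gated models.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of problems; an empty list means both checks passed.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment (hypothetical helper).
    """
    env = os.environ if env is None else env
    problems = []
    # nats-server must be resolvable on PATH, as `which nats-server` checks.
    if which("nats-server") is None:
        problems.append("nats-server not found on PATH")
    # HF_TOKEN is only needed for gated models; report it so the user can decide.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for gated models)")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current process environment; printing the returned list before launching a run gives an early warning.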
Topologies
The workload supports two disaggregation modes, configured via cmd_args.worker.disaggregation_mode:
- Combined (none): a single dynamo.mocker process handles both prefill and decode. Controlled by cmd_args.worker.num_workers.
- Disaggregated (prefill_decode): separate prefill and decode mocker instances, mirroring the production ai_dynamo topology. Instance counts are set via cmd_args.worker.prefill_worker.num_nodes and cmd_args.worker.decode_worker.num_nodes.
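To make the key layout concrete, here is a hedged sketch of how the two modes might look in a test TOML and how the instance counts are read from them. Only the key paths come from the text above; the numeric values are illustrative, and the helper `mocker_instances` is hypothetical. It uses `tomllib` from the standard library (Python 3.11+), falling back to the third-party `tomli` package.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Illustrative values only; key paths follow the description above.
COMBINED = """
[cmd_args.worker]
disaggregation_mode = "none"
num_workers = 2
"""

DISAGGREGATED = """
[cmd_args.worker]
disaggregation_mode = "prefill_decode"

[cmd_args.worker.prefill_worker]
num_nodes = 1

[cmd_args.worker.decode_worker]
num_nodes = 2
"""


def mocker_instances(toml_text):
    """Return the mocker instance counts implied by the worker config."""
    worker = tomllib.loads(toml_text)["cmd_args"]["worker"]
    mode = worker["disaggregation_mode"]
    if mode == "none":
        return {"combined": worker["num_workers"]}
    if mode == "prefill_decode":
        return {
            "prefill": worker["prefill_worker"]["num_nodes"],
            "decode": worker["decode_worker"]["num_nodes"],
        }
    raise ValueError(f"unknown disaggregation_mode: {mode!r}")
```

The sketch only reads the count that is relevant to the selected mode, which mirrors the description: `num_workers` applies in combined mode, and the two `num_nodes` values apply in disaggregated mode.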
Benchmark Tools
Select the benchmark tool with cmd_args.benchmark_tool:
- "aiperf" (default in the provided TOML): uses the aiperf profiler
- "genai_perf": uses genai-perf profile
Parameters for the active tool are configured under [cmd_args.aiperf] or [cmd_args.genai_perf].
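The selection rule can be sketched as a lookup: the value of benchmark_tool names the section whose parameters apply. This is an illustration, not CloudAI's implementation. The helper `active_tool_args` and the key `example_option` are hypothetical, and the fallback to "genai_perf" reflects the field default shown in the class signature in the API section.

```python
VALID_TOOLS = ("aiperf", "genai_perf")


def active_tool_args(cmd_args):
    """Return (tool, parameters) for the benchmark tool selected in cmd_args.

    `cmd_args` is a plain dict, e.g. the parsed [cmd_args] table of a test TOML.
    """
    # Falls back to the class default when the key is absent.
    tool = cmd_args.get("benchmark_tool", "genai_perf")
    if tool not in VALID_TOOLS:
        raise ValueError(f"unknown benchmark_tool: {tool!r}")
    # The section name matches the tool name: [cmd_args.aiperf] or [cmd_args.genai_perf].
    return tool, cmd_args.get(tool, {})


# `example_option` is a placeholder, not a real aiperf or genai-perf parameter.
example = {
    "benchmark_tool": "aiperf",
    "aiperf": {"example_option": 1},
    "genai_perf": {"example_option": 2},
}
```

With the example above, the `genai_perf` section is present but ignored because `aiperf` is the active tool.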
Run Using Standalone
Note
Set HF_TOKEN before running to allow model download from HuggingFace Hub.
uv run cloudai run \
--system-config conf/experimental/dynamo_mocker/system/standalone_system.toml \
--tests-dir conf/experimental/dynamo_mocker/test \
--test-scenario conf/experimental/dynamo_mocker/test_scenario/dynamo_mocker.toml
CloudAI will:
1. Install ai-dynamo, aiperf, and genai-perf into a managed venv (first run only).
2. Write a wrapper script and launch dynamo_mocker.sh.
3. Start nats-server, dynamo.mocker (prefill and decode), and dynamo.frontend.
4. Run the benchmark and write results to the output directory.
Review Benchmark Results
After the run completes, results are placed in results/<scenario_name>/<test_id>/:
- benchmark_report.csv: full per-request and aggregate metrics (throughput, latency percentiles, TTFT, ITL)
- stdout.txt / stderr.txt: orchestration log and process output
- dynamo_prefill_0.log, dynamo_decode_0.log, dynamo_frontend.log: per-component logs
- nats.log: NATS server log
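A quick way to confirm that a run produced its outputs is to check the result directory for the files listed above. The sketch below is a hypothetical helper, not part of CloudAI. It deliberately leaves out the prefill and decode logs, on the assumption that their names depend on the topology and instance index.

```python
from pathlib import Path

# Files expected regardless of topology (assumption based on the list above).
EXPECTED_FILES = (
    "benchmark_report.csv",
    "stdout.txt",
    "stderr.txt",
    "dynamo_frontend.log",
    "nats.log",
)


def missing_outputs(result_dir):
    """Return the expected file names that are absent from result_dir."""
    result_dir = Path(result_dir)
    return [name for name in EXPECTED_FILES if not (result_dir / name).is_file()]
```

Pass the path of one test's result directory; an empty list means every expected file is present.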
Key summary metrics from benchmark_report.csv:
Metric,Value
Output Token Throughput (tokens/sec),667.58
Request Count,50.00
Request Throughput (requests/sec),18.04
Metric,avg,p50,p99
Request Latency (ms),507.49,475.50,893.26
Time to First Token (ms),77.82,71.99,137.26
Inter Token Latency (ms),12.03,11.55,16.42
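The report shown above contains two sections with different columns, each introduced by a header row starting with `Metric`. A minimal sketch for loading such a file into a dictionary follows. The function `parse_report` is hypothetical and assumes the layout matches the sample, with optional blank lines between sections.

```python
import csv
import io

# The sample rows shown above, reproduced verbatim.
SAMPLE = """\
Metric,Value
Output Token Throughput (tokens/sec),667.58
Request Count,50.00
Request Throughput (requests/sec),18.04
Metric,avg,p50,p99
Request Latency (ms),507.49,475.50,893.26
Time to First Token (ms),77.82,71.99,137.26
Inter Token Latency (ms),12.03,11.55,16.42
"""


def parse_report(text):
    """Map each metric name to a {column: value} dict across all sections."""
    metrics = {}
    columns = None
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines between sections
        if row[0] == "Metric":
            columns = row[1:]  # a header row starts a new section
            continue
        if columns is None:
            continue  # ignore data that precedes the first header
        metrics[row[0]] = {col: float(val) for col, val in zip(columns, row[1:])}
    return metrics
```

For a real run, the same function can be applied to the text of benchmark_report.csv in the result directory, for example via `Path(...).read_text()`.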
API Documentation
Command Arguments
class cloudai.workloads.dynamo_mocker.dynamo_mocker.DynamoMockerCmdArgs(
    *,
    model_path: str = 'Qwen/Qwen3-0.6B',
    nats_cmd: str = 'nats-server -js',
    engine: MockerEngineArgs = <factory>,
    worker: MockerWorkerConfig = <factory>,
    frontend: MockerFrontendArgs = <factory>,
    benchmark_tool: Literal['genai_perf', 'aiperf'] = 'genai_perf',
    genai_perf: MockerGenAIPerfArgs = <factory>,
    aiperf: MockerAIPerfArgs = <factory>,
    **extra_data: Any,
)
Bases: CmdArgs
Top-level command arguments for the Dynamo Mocker workload.
Test Definition
class cloudai.workloads.dynamo_mocker.dynamo_mocker.DynamoMockerTestDefinition(
    *,
    name: str,
    description: str,
    test_template_name: str,
    cmd_args: DynamoMockerCmdArgs,
    extra_env_vars: dict[str, str | List[str]] = {},
    extra_cmd_args: dict[str, str] = {},
    extra_container_mounts: list[str] = [],
    git_repos: list[GitRepo] = [],
    nsys: NsysConfiguration | None = None,
    predictor: PredictorConfig | None = None,
    agent: str = 'grid_search',
    agent_steps: int = 1,
    agent_metrics: list[str] = ['default'],
    agent_reward_function: str = 'inverse',
    agent_config: dict[str, Any] | None = None,
    success_marker: str = 'success-marker.txt',
    failure_marker: str = 'failure-marker.txt',
)
Bases: TestDefinition
Test definition for the Dynamo Mocker workload.
Runs dynamo.mocker + dynamo.frontend as a lightweight GPU-free LLM inference simulator, then benchmarks it with genai-perf or aiperf via dynamo_mocker.sh. Select the benchmark tool with cmd_args.benchmark_tool (“genai_perf” or “aiperf”).
Supports two topologies mirroring ai_dynamo.sh:
- Combined (worker.disaggregation_mode=none): a single mocker handles prefill and decode
- Disaggregated (worker.disaggregation_mode=prefill_decode): separate prefill and decode mocker instances with KV event publishing and simulated transfer
Requires: ai-dynamo and genai-perf or aiperf pip-installed in the active environment.
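As an illustration of the fields that have no default in the signature above (name, description, test_template_name, cmd_args), the following sketch parses a minimal test TOML and reports any that are missing. The TOML content and the helper `missing_required` are assumptions for illustration; the files shipped under conf/experimental/dynamo_mocker/test are the authoritative examples.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Hypothetical minimal test definition; name and description are placeholders.
TEST_TOML = """
name = "dynamo_mocker_example"
description = "GPU-free mocker benchmark"
test_template_name = "DynamoMocker"

[cmd_args]
model_path = "Qwen/Qwen3-0.6B"
benchmark_tool = "aiperf"
"""

# Fields without a default in the DynamoMockerTestDefinition signature.
REQUIRED = ("name", "description", "test_template_name", "cmd_args")


def missing_required(toml_text):
    """Return the required top-level keys that the TOML text does not define."""
    data = tomllib.loads(toml_text)
    return [key for key in REQUIRED if key not in data]
```

This check covers only the presence of top-level keys; CloudAI performs its own validation of types and nested fields when it loads the test definition.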