DDLB
This workload (test_template_name is DDLB) lets you run DDLB (Distributed Deep Learning Benchmarks) within the CloudAI framework. For details on the benchmark itself, see the DDLB README at samnordmann/ddlb.
Usage Example
Test TOML example:
name = "my_ddlb_test"
description = "Example DDLB test"
test_template_name = "DDLB"
[cmd_args]
docker_image_url = "gitlab-master.nvidia.com/nsarkauskas/ddlb:latest"
primitive = "tp_columnwise"
dtype = "float16"
Test Scenario example:
name = "ddlb-test"
[[Tests]]
id = "ddlb.1"
num_nodes = 1
time_limit = "00:10:00"
test_name = "my_ddlb_test"
Test-in-Scenario example:
name = "ddlb-test"
[[Tests]]
id = "ddlb.1"
num_nodes = 1
time_limit = "00:10:00"
name = "my_ddlb_test"
description = "Example DDLB test"
test_template_name = "DDLB"
[Tests.cmd_args]
docker_image_url = "gitlab-master.nvidia.com/nsarkauskas/ddlb:latest"
primitive = "tp_columnwise"
m = 1024
n = 128
k = 1024
dtype = "float16"
num_iterations = 50
num_warmups = 5
impl = "pytorch;backend=nccl;order=AG_before"
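The impl value packs an implementation name and its options into one string. Judging only from the example value, the format appears to be a semicolon-separated list whose first item is the implementation name and whose remaining items are key=value options. That reading is an assumption, not a documented grammar; the DDLB README is the authority on the format. Under that assumption, a hypothetical helper parse_impl could split the string like this:

```python
def parse_impl(spec: str) -> tuple[str, dict[str, str]]:
    """Split an impl string into (name, options).

    Assumed format: "<name>;<key>=<value>;<key>=<value>...".
    """
    name, *parts = spec.split(";")
    options: dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        options[key.strip()] = value.strip()
    return name.strip(), options


name, options = parse_impl("pytorch;backend=nccl;order=AG_before")
print(name, options)
```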
API Documentation
Command Arguments
class cloudai.workloads.ddlb.ddlb.DDLBCmdArgs(
    *,
    docker_image_url: str,
    primitive: str,
    m: int | list[int] = 1024,
    n: int | list[int] = 128,
    k: int | list[int] = 1024,
    dtype: str,
    num_iterations: int = 50,
    num_warmups: int = 5,
    impl: str | list[str] = 'pytorch;backend=nccl;order=AG_before',
    **extra_data: Any,
)
Bases: CmdArgs
DDLB test command arguments.
- docker_image_url: str
- primitive: str
- m: int | list[int]
- n: int | list[int]
- k: int | list[int]
- dtype: str
- num_iterations: int
- num_warmups: int
- impl: str | list[str]
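The m, n, k and impl fields accept either a single value or a list. This page does not say how list values are combined, so the sketch below simply assumes a Cartesian product in order to estimate how many configurations a list-valued setup would imply. The helper expand_grid is hypothetical and does not reflect CloudAI's or DDLB's actual expansion logic.

```python
from itertools import product
from typing import Any


def expand_grid(cmd_args: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand list-valued fields into one dict per combination (assumed semantics)."""
    keys = list(cmd_args)
    # Wrap scalars so every field contributes a list of candidates.
    value_lists = [v if isinstance(v, list) else [v] for v in cmd_args.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


configs = expand_grid({"m": [1024, 2048], "n": 128, "k": [1024, 4096]})
print(len(configs))  # 2 values of m x 1 of n x 2 of k = 4
```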
Test Definition
class cloudai.workloads.ddlb.ddlb.DDLBTestDefinition(
    *,
    name: str,
    description: str,
    test_template_name: str,
    cmd_args: DDLBCmdArgs,
    extra_env_vars: dict[str, str | List[str]] = {},
    extra_cmd_args: dict[str, str] = {},
    extra_container_mounts: list[str] = [],
    git_repos: list[GitRepo] = [],
    nsys: NsysConfiguration | None = None,
    predictor: PredictorConfig | None = None,
    agent: str = 'grid_search',
    agent_steps: int = 1,
    agent_metrics: list[str] = ['default'],
    agent_reward_function: str = 'inverse',
)
Bases: TestDefinition
Test object for DDLB.
- cmd_args: DDLBCmdArgs
- property extra_args_str: str
- property docker_image: DockerImage
- property installables: list[Installable]