Testing Strategy#
This document describes the testing strategy for AutoDeploy, covering the multi-tiered approach used to ensure quality and reliability.
Testing Philosophy#
AutoDeploy uses a multi-tiered testing approach that balances fast feedback with comprehensive coverage:
┌──────────────────────────────────────────────────┐
│                    Dashboard                     │
│       (Broad model coverage + performance)       │
├──────────────────────────────────────────────────┤
│                Integration Tests                 │
│         (Accuracy tests, CI-registered)          │
├──────────────────────────────────────────────────┤
│                  E2E Mini Tests                  │
│           (Compile + prompt workflows)           │
├──────────────────────────────────────────────────┤
│                    Unit Tests                    │
│  (Component testing: patches, transforms, etc.)  │
└──────────────────────────────────────────────────┘
Unit Tests: Fast, isolated tests for individual components (patches, transforms, custom ops)
E2E Mini Tests: End-to-end workflows testing compile + prompt for unique model combinations
Integration Tests: Important accuracy tests registered individually in CI
Dashboard: Broad model coverage and performance testing across all supported models
Unit Tests#
Unit tests verify individual components like patches, transformations, custom operations, and utilities.
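As a sketch of the pattern, a unit test exercises one function in isolation, with no model download or GPU involved. The transform below is a made-up stand-in for illustration only, not an AutoDeploy component:

```python
# Hypothetical example: a tiny "transform" and its unit test. This illustrates
# the testing pattern only -- fuse_adds is NOT an AutoDeploy API.
def fuse_adds(ops):
    """Collapse each run of consecutive 'add' ops into one 'fused_add'."""
    fused, i = [], 0
    while i < len(ops):
        if ops[i] == "add":
            j = i
            while j < len(ops) and ops[j] == "add":
                j += 1
            fused.append("fused_add" if j - i > 1 else "add")
            i = j
        else:
            fused.append(ops[i])
            i += 1
    return fused

def test_fuse_adds():
    # Fast, isolated checks on a single component.
    assert fuse_adds(["add", "add", "mul"]) == ["fused_add", "mul"]
    assert fuse_adds(["mul", "add"]) == ["mul", "add"]
```

Any `test_*` function placed in a registered folder is discovered and run automatically by pytest in CI.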
Location#
All unit tests are located in tests/unittest/auto_deploy/:
tests/unittest/auto_deploy/
├── _utils_test/            # Shared test utilities
├── singlegpu/              # Single GPU tests
│   ├── compile/            # Compilation tests
│   ├── custom_ops/         # Custom operations tests
│   ├── models/             # Model-specific patch tests
│   ├── shim/               # Executor/engine tests
│   ├── smoke/              # E2E mini tests (see below)
│   ├── transformations/    # Graph transformation tests
│   └── utils/              # Utility function tests
└── multigpu/               # Multi-GPU tests
    ├── custom_ops/         # Multi-GPU custom ops
    ├── smoke/              # Multi-GPU E2E mini tests
    └── transformations/    # Multi-GPU transformation tests
CI Registration#
Once a folder is registered, its tests run automatically in CI; new test files and test functions inside an already-registered folder are picked up with no extra steps.
Folders are registered in tests/integration/test_lists/test-db/l0_*.yml files under the backend: autodeploy section:
backend: autodeploy
tests:
  - unittest/auto_deploy/singlegpu/compile
  - unittest/auto_deploy/singlegpu/custom_ops
  - unittest/auto_deploy/singlegpu/models
  - unittest/auto_deploy/singlegpu/shim
  - unittest/auto_deploy/singlegpu/smoke
  - unittest/auto_deploy/singlegpu/transformations
  - unittest/auto_deploy/singlegpu/utils
Adding a New Folder#
If you create a new folder (not just a new file in an existing folder), you must register it in the appropriate YAML files:
1. Edit tests/integration/test_lists/test-db/l0_a30.yml (and other GPU-specific files as needed).
2. Add the new folder path under the backend: autodeploy section.

Example:
- unittest/auto_deploy/singlegpu/my_new_folder
Parallel Execution#
Most unit tests run in parallel using pytest-xdist for faster execution. The exception is the smoke/ subfolders, which run sequentially (see E2E Mini Tests below).
E2E Mini Tests (Smoke Tests)#
E2E mini tests verify complete end-to-end workflows including model compilation and prompt execution for unique model combinations.
Location#
Single GPU: tests/unittest/auto_deploy/singlegpu/smoke/
Multi GPU: tests/unittest/auto_deploy/multigpu/smoke/
Purpose#
These tests ensure that the full AutoDeploy pipeline works correctly for various model architectures and configurations:
- test_ad_build_small_single.py - Tests multiple model configurations (Llama, Mixtral, Qwen, Phi-3, DeepSeek, Mistral, Nemotron)
- test_ad_trtllm_bench.py - Benchmarking functionality
- test_ad_trtllm_serve.py - Serving functionality
- test_ad_speculative_decoding.py - Speculative decoding
- test_ad_export_onnx.py - ONNX export functionality
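The per-model pattern used in test_ad_build_small_single.py can be sketched with pytest parametrization. In this sketch, compile_and_prompt is a hypothetical stand-in for the real build-and-generate step, and the model names are an illustrative shorthand:

```python
import pytest

# Illustrative subset of model configurations (names shortened for the sketch).
MODEL_CONFIGS = ["llama", "mixtral", "qwen", "phi3"]

def compile_and_prompt(model):
    # Hypothetical stand-in: the real test compiles the model through the
    # AutoDeploy pipeline and generates text for a short prompt.
    return f"{model}: ok"

@pytest.mark.parametrize("model", MODEL_CONFIGS)
def test_ad_build_small(model):
    # One end-to-end compile + prompt run per unique model combination.
    output = compile_and_prompt(model)
    assert output.endswith("ok")  # a real test would sanity-check generations
```

Each parameter produces a separately reported test case, so a failure pinpoints the exact model combination that broke.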
Execution#
Smoke tests run sequentially within the CI pipeline rather than in parallel, to avoid resource contention while full models are compiled and executed.
Integration Tests#
Integration tests cover important accuracy tests and other scenarios that require explicit CI registration.
Registration#
Unlike unit tests (where new files in existing folders are auto-discovered), each individual integration test case must be explicitly registered in the CI YAML files.
Format: path/to/test_file.py::test_function_name[param_id]
Example from l0_a30.yml:
- accuracy/test_cli_flow.py::TestLlama3_1_8BInstruct::test_medusa_fp8_prequantized
- examples/test_multimodal.py::test_llm_multimodal_general[Qwen2-VL-7B-Instruct-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:4]
Example: Adding an Accuracy Test#
For reference, see PR #10717 which added a Nemotron 3 super accuracy test. The workflow is:
1. Create the test function in the appropriate test file.
2. Register the specific test case in the relevant l0_*.yml file(s).
3. Ensure the test passes locally before submitting.
Location#
Integration tests are typically located in:
- examples/ - Model-specific integration tests
- accuracy/ - Accuracy validation tests
Dashboard (Model Coverage Testing)#
The dashboard provides broad model coverage and performance testing for all supported models in AutoDeploy.
Model Registry#
Models are registered in examples/auto_deploy/model_registry/models.yaml. For detailed instructions, see the Model Registry README.
Format (Version 2.0)#
The registry uses a flat list format with composable configurations:
version: '2.0'
description: AutoDeploy Model Registry - Flat format with composable configs
models:
  - name: meta-llama/Llama-3.1-8B-Instruct
    yaml_extra: [dashboard_default.yaml, world_size_2.yaml]
  - name: meta-llama/Llama-3.3-70B-Instruct
    yaml_extra: [dashboard_default.yaml, world_size_4.yaml, llama3_3_70b.yaml]
Key Concepts#
- Flat list: Models are in a single list (not grouped)
- Composable configs: Each model references YAML config files via yaml_extra
- Deep merging: Config files are merged in order (later files override earlier ones)
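The merge semantics can be illustrated with a small sketch. The keys and values below (runtime, world_size, max_tokens) are invented placeholders for whatever the real config files contain; only the merge behavior is the point:

```python
def deep_merge(base, override):
    """Recursively merge override into base; override wins on conflicts.

    Illustration of deep-merge semantics in general, not the dashboard's
    exact implementation.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Simulate yaml_extra: [dashboard_default.yaml, world_size_4.yaml, llama3_3_70b.yaml]
# with hypothetical file contents:
dashboard_default = {"runtime": {"world_size": 1, "max_tokens": 64}}
world_size_4 = {"runtime": {"world_size": 4}}
llama3_3_70b = {"runtime": {"max_tokens": 128}}

config = {}
for cfg in (dashboard_default, world_size_4, llama3_3_70b):
    config = deep_merge(config, cfg)

print(config)  # {'runtime': {'world_size': 4, 'max_tokens': 128}}
```

Because later files override earlier ones key by key, a model-specific config only needs to state the handful of values it changes.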
Configuration Files#
Config files are stored in examples/auto_deploy/model_registry/configs/:
| File | Purpose |
|---|---|
| dashboard_default.yaml | Baseline settings for all models |
| world_size_*.yaml | GPU count configuration (1, 2, 4, or 8) |
| … | Vision + text models |
| … | DemoLLM runtime with Triton backend |
| Model-specific configs (e.g. llama3_3_70b.yaml) | Custom settings for specific models |
World Size Guidelines#
World Size |
Model Size Range |
Example Models |
|---|---|---|
1 |
< 2B params |
TinyLlama, Qwen 0.5B, Phi-4-mini |
2 |
2-15B params |
Llama 3.1 8B, Qwen 7B, Mistral 7B |
4 |
20-80B params |
Llama 3.3 70B, QwQ 32B, Gemma 27B |
8 |
80B+ params |
DeepSeek V3, Llama 405B, Nemotron Ultra |
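The guidelines above amount to a simple lookup by parameter count. The helper below is a hypothetical illustration, not part of the registry tooling (the table leaves 15-20B unspecified; this sketch assigns that range to 4):

```python
def suggested_world_size(params_b):
    """Map a parameter count in billions to the guideline world size.

    Hypothetical helper illustrating the table above; not registry tooling.
    """
    if params_b < 2:
        return 1
    if params_b <= 15:
        return 2
    if params_b <= 80:   # the 15-20B gap in the table is folded in here
        return 4
    return 8

assert suggested_world_size(0.5) == 1   # e.g. Qwen 0.5B
assert suggested_world_size(8) == 2     # e.g. Llama 3.1 8B
assert suggested_world_size(70) == 4    # e.g. Llama 3.3 70B
assert suggested_world_size(405) == 8   # e.g. Llama 405B
```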
Adding a New Model#
1. Add the model entry to models.yaml:

   - name: organization/my-new-model-7b
     yaml_extra: [dashboard_default.yaml, world_size_2.yaml]

2. For models with special requirements, create a custom config in configs/ and reference it:

   - name: organization/my-custom-model
     yaml_extra: [dashboard_default.yaml, world_size_4.yaml, my_model.yaml]

3. Validate with prepare_model_coverage_v2.py from the autodeploy-dashboard repository.

The model will be automatically picked up by the dashboard testing infrastructure on the next run.
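Before relying on the next dashboard run, the registry shape can be sanity-checked locally. The snippet below only illustrates what such a check might look like; the real validation lives in prepare_model_coverage_v2.py in the autodeploy-dashboard repository:

```python
# Illustrative registry validation sketch -- NOT the real
# prepare_model_coverage_v2.py checks.
registry = {
    "version": "2.0",
    "models": [
        {"name": "organization/my-new-model-7b",
         "yaml_extra": ["dashboard_default.yaml", "world_size_2.yaml"]},
    ],
}

def validate(reg, known_configs):
    """Return a list of problems found in a registry dict."""
    errors = []
    if reg.get("version") != "2.0":
        errors.append("registry must declare version '2.0'")
    for entry in reg.get("models", []):
        if "name" not in entry:
            errors.append(f"entry missing 'name': {entry}")
        for cfg in entry.get("yaml_extra", []):
            if cfg not in known_configs:
                errors.append(f"{entry.get('name')}: unknown config {cfg}")
    return errors

assert validate(registry, {"dashboard_default.yaml", "world_size_2.yaml"}) == []
```

A check like this catches typos in yaml_extra references before the dashboard run fails on them.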