TTS (Text-to-Speech)#
This guide covers the full pipeline for running Qwen3-TTS: export on x86 host, engine build on device, and inference.
Supported models: Qwen3-TTS-12Hz-0.6B-CustomVoice and Qwen3-TTS-12Hz-1.7B-CustomVoice.
Note: Unlike Qwen3-Omni, Qwen3-TTS has no Thinker or visual encoder. The text embedding is self-contained in the Talker and is exported as text_embedding.safetensors.
Prerequisites: Complete the Installation Guide before proceeding.
Part 0: Install Export Dependencies (x86 Host)#
Qwen3-TTS export is handled by llm_loader directly. No external Qwen3-TTS Python package is required for export.
cd TensorRT-Edge-LLM
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
pip3 install -r experimental/llm_loader/requirements.txt
Part 1: Export on x86 Host#
Qwen3-TTS has three components: Talker, CodePredictor, and Code2Wav. Export all of them with llm_loader.export_all_cli.
cd TensorRT-Edge-LLM
export PYTHONPATH=$PWD/experimental:$PYTHONPATH
export WORKSPACE_DIR=$HOME/tensorrt-edgellm-workspace
export TTS_MODEL=Qwen3-TTS-12Hz-1.7B-CustomVoice
export ONNX_OUTPUT_DIR=$WORKSPACE_DIR/$TTS_MODEL/onnx
python3 -m llm_loader.export_all_cli \
Qwen/$TTS_MODEL \
$ONNX_OUTPUT_DIR
Expected Export Output#
$ONNX_OUTPUT_DIR/
├── llm/
│ ├── model.onnx + model.onnx.data # Talker ONNX
│ ├── config.json # model_type: qwen3_tts_talker
│ ├── embedding.safetensors # codec embedding
│ ├── text_embedding.safetensors # TTS-only (no Thinker)
│ ├── text_projection.safetensors
│ ├── tokenizer_config.json
│ ├── processed_chat_template.json
│ └── tokenizer files
├── code_predictor/
│ ├── model.onnx + model.onnx.data # CodePredictor ONNX
│ ├── config.json
│ ├── codec_embeddings.safetensors
│ ├── lm_heads.safetensors
│ └── small_to_mtp_projection.safetensors # if not Identity
└── code2wav/
├── model.onnx + model.onnx.data # Code2Wav vocoder
└── config.json
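Before transferring the export, it can help to confirm that the files listed above are actually present. The following is a minimal sketch and not part of llm_loader: the helper name `missing_export_files` is hypothetical, and the file list is simply transcribed from the tree above. The optional small_to_mtp_projection.safetensors and the unnamed tokenizer files are deliberately not checked.

```python
from pathlib import Path

# Files transcribed from the expected-output tree in this guide.
# small_to_mtp_projection.safetensors is optional ("if not Identity") and
# the individual tokenizer files are not named, so neither is checked here.
EXPECTED_FILES = {
    "llm": [
        "model.onnx",
        "model.onnx.data",
        "config.json",
        "embedding.safetensors",
        "text_embedding.safetensors",
        "text_projection.safetensors",
        "tokenizer_config.json",
        "processed_chat_template.json",
    ],
    "code_predictor": [
        "model.onnx",
        "model.onnx.data",
        "config.json",
        "codec_embeddings.safetensors",
        "lm_heads.safetensors",
    ],
    "code2wav": [
        "model.onnx",
        "model.onnx.data",
        "config.json",
    ],
}


def missing_export_files(onnx_output_dir):
    """Return the relative paths of expected files that are not present."""
    root = Path(onnx_output_dir)
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            relative = Path(subdir) / name
            if not (root / relative).is_file():
                missing.append(relative.as_posix())
    return missing
```

Calling `missing_export_files` with the value of `$ONNX_OUTPUT_DIR` returns an empty list when every listed file exists, and otherwise the relative paths that still need attention.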
Transfer to Device#
scp -r $ONNX_OUTPUT_DIR <user>@<device>:~/tensorrt-edgellm-workspace/$TTS_MODEL/
Part 2: Build Engines#
Three engine builds are required. Run these on the edge device.
cd ~/TensorRT-Edge-LLM
export WORKSPACE_DIR=$HOME/tensorrt-edgellm-workspace
export TTS_MODEL=Qwen3-TTS-12Hz-1.7B-CustomVoice
export ONNX=$WORKSPACE_DIR/$TTS_MODEL/onnx
export ENG=$WORKSPACE_DIR/$TTS_MODEL/engines
# 1. Build Talker LLM engine
./build/examples/llm/llm_build \
--onnxDir $ONNX/llm \
--engineDir $ENG/talker \
--maxInputLen 4096 \
--maxKVCacheCapacity 4096 \
--maxBatchSize 1
# 2. Build CodePredictor LLM engine
./build/examples/llm/llm_build \
--onnxDir $ONNX/code_predictor \
--engineDir $ENG/code_predictor \
--maxInputLen 4096 \
--maxKVCacheCapacity 4096 \
--maxBatchSize 1
# 3. Build Code2Wav engine
./build/examples/multimodal/audio_build \
--onnxDir $ONNX/code2wav \
--engineDir $ENG
audio_build writes the Code2Wav engine to $ENG/code2wav. Use --engineDir $ENG; passing $ENG/code2wav would create an extra nested directory.
Note: Use --maxBatchSize 1 for the current Qwen3-TTS runtime.
Build time: < 5 minutes
Part 3: Run Inference#
Input File Format#
Each request specifies a messages array and, optionally, its own speaker. If a request omits speaker, the top-level speaker is used as the default.
{
"talker_temperature": 0.9,
"talker_top_k": 50,
"repetition_penalty": 1.05,
"speaker": "ryan",
"requests": [
{
"messages": [{"role": "assistant", "content": "Hello, how can I help you today?"}]
},
{
"speaker": "serena",
"messages": [{"role": "assistant", "content": "The weather is sunny and warm."}]
}
]
}
Available speakers: ryan, serena, aiden, vivian, dylan, eric, uncle_fu, ono_anna, sohee
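For batches of requests, the input file can be generated rather than written by hand. The sketch below is an illustration only: the helper names `build_tts_input` and `write_tts_input` are hypothetical, the speaker set is copied from the list above, and the top-level sampling values are the ones shown in the example JSON.

```python
import json

# Speaker names as listed in this guide.
AVAILABLE_SPEAKERS = {
    "ryan", "serena", "aiden", "vivian", "dylan",
    "eric", "uncle_fu", "ono_anna", "sohee",
}


def build_tts_input(utterances, default_speaker="ryan"):
    """Build the input structure from strings or (text, speaker) tuples.

    A plain string uses the top-level default speaker; a tuple sets a
    per-request speaker.
    """
    if default_speaker not in AVAILABLE_SPEAKERS:
        raise ValueError(f"unknown speaker: {default_speaker}")
    requests = []
    for item in utterances:
        text, speaker = (item, None) if isinstance(item, str) else item
        request = {}
        if speaker is not None:
            if speaker not in AVAILABLE_SPEAKERS:
                raise ValueError(f"unknown speaker: {speaker}")
            request["speaker"] = speaker
        request["messages"] = [{"role": "assistant", "content": text}]
        requests.append(request)
    return {
        # Values copied from the example input in this guide.
        "talker_temperature": 0.9,
        "talker_top_k": 50,
        "repetition_penalty": 1.05,
        "speaker": default_speaker,
        "requests": requests,
    }


def write_tts_input(path, payload):
    """Write the payload as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2, ensure_ascii=False)
```

For instance, `write_tts_input("input.json", build_tts_input(["Hello, how can I help you today?", ("The weather is sunny and warm.", "serena")]))` produces a file equivalent to the example above.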
Sampling parameters:
| Parameter | Default | Description |
|---|---|---|
| talker_temperature | 0.9 | Sampling temperature |
| talker_top_k | 50 | Top-K sampling |
| | 1.0 | Top-P sampling |
| repetition_penalty | 1.05 | Penalize repeated codec tokens |
| | 4096 | Max codec frames per request |
| speaker | config default | Top-level speaker fallback |
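To make the effect of these parameters concrete, here is a generic sketch of how temperature, Top-K, Top-P, and a repetition penalty are commonly applied when sampling one token from a list of logits. It is not the Qwen3-TTS runtime's implementation: the function name `sample_token`, the order of operations, and the divide-or-multiply form of the penalty are assumptions chosen for illustration.

```python
import math
import random


def sample_token(logits, previous_tokens, temperature=0.9, top_k=50,
                 top_p=1.0, repetition_penalty=1.05, rng=random):
    """Sample one token index from raw logits (illustrative only)."""
    scores = list(logits)

    # Repetition penalty: lower the score of tokens already generated.
    for token in set(previous_tokens):
        score = scores[token]
        if score > 0:
            scores[token] = score / repetition_penalty
        else:
            scores[token] = score * repetition_penalty

    # Temperature: values below 1 sharpen the distribution.
    scores = [score / temperature for score in scores]

    # Top-K: keep only the k highest-scoring candidates.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = order[:top_k] if top_k > 0 else order

    # Softmax over the kept candidates (shifted for numerical stability).
    peak = scores[kept[0]]
    weights = [math.exp(scores[i] - peak) for i in kept]
    total = sum(weights)
    probs = [weight / total for weight in weights]

    # Top-P: keep the smallest prefix whose cumulative probability
    # reaches top_p; top_p = 1.0 keeps every Top-K candidate.
    cumulative = 0.0
    cutoff = len(kept)
    for count, prob in enumerate(probs, start=1):
        cumulative += prob
        if cumulative >= top_p:
            cutoff = count
            break

    return rng.choices(kept[:cutoff], weights=probs[:cutoff], k=1)[0]
```

With `top_k=1` the function always returns the highest-scoring token, which makes the influence of the repetition penalty easy to observe on near-tied logits.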
Run#
cd /path/to/TensorRT-Edge-LLM
./build/examples/omni/qwen3_tts_inference \
--talkerEngineDir $ENG/talker \
--code2wavEngineDir $ENG/code2wav \
--tokenizerDir $ENG/talker \
--inputFile input.json \
--outputFile output.json \
--outputAudioDir ./audio_output
Generated .wav files are named audio_req{N}.wav (one per request). The output JSON records per-request metadata: audio file path, sample count, duration, and RVQ code file path.
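A quick way to sanity-check a generated file is to read its header. The sketch below uses Python's standard `wave` module and assumes the runtime writes plain PCM WAV, which is the only encoding that module reads; the helper name `wav_info` is hypothetical.

```python
import wave


def wav_info(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "samples": frames,  # frames per channel
            "duration_ms": round(frames * 1000 / rate),
        }
```

For example, `wav_info("./audio_output/audio_req0.wav")` reports the sample rate and duration of the first request's audio, which can be compared against the output JSON.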
Output JSON Example#
{
"responses": [
{
"request_idx": 0,
"messages": [{"role": "assistant", "content": "Hello, how can I help you today?"}],
"audio_file": "./audio_output/audio_req0.wav",
"audio_samples": 120960,
"audio_sample_rate": 24000,
"audio_duration_ms": 5040,
"rvq_file": "./audio_output/rvq_req0.safetensors"
}
]
}
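In the example above, the duration follows from the sample count and sample rate: 120960 / 24000 = 5.04 s, or 5040 ms. The sketch below applies the same arithmetic to every response. The helper name `inconsistent_responses` and the 1 ms rounding tolerance are assumptions, not documented behavior.

```python
import json


def inconsistent_responses(output, tolerance_ms=1.0):
    """Return request_idx values whose duration disagrees with samples/rate."""
    mismatched = []
    for response in output["responses"]:
        expected_ms = (
            response["audio_samples"] * 1000 / response["audio_sample_rate"]
        )
        if abs(expected_ms - response["audio_duration_ms"]) > tolerance_ms:
            mismatched.append(response["request_idx"])
    return mismatched


def load_output(path):
    """Load the JSON file written via --outputFile."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Running `inconsistent_responses(load_output("output.json"))` returns an empty list when every response is self-consistent.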
Notes#
--code2wavEngineDir is optional: it is auto-detected as parent(talkerEngineDir)/code2wav if not set.
RVQ code files (.safetensors) are saved alongside audio when --outputAudioDir is set and can be used to re-synthesize audio without re-running the TTS model.
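The auto-detection rule for the Code2Wav engine directory can be expressed in a few lines. This sketch only mirrors the rule as stated in the note above; the function name `resolve_code2wav_dir` is hypothetical and the code is not taken from the runtime.

```python
from pathlib import Path


def resolve_code2wav_dir(talker_engine_dir, code2wav_engine_dir=None):
    """Use the explicit directory if given, else parent(talker)/code2wav."""
    if code2wav_engine_dir:
        return Path(code2wav_engine_dir)
    return Path(talker_engine_dir).parent / "code2wav"
```

With the layout used in this guide, a Talker engine directory of `$ENG/talker` resolves to `$ENG/code2wav`, which is where audio_build writes the Code2Wav engine when given `--engineDir $ENG`.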