Chat Template Format#
Chat templates define how conversational messages are formatted for the language model. This guide explains the JSON-based chat template format used by TensorRT Edge-LLM.
Overview#
Our implementation follows HuggingFace’s apply_chat_template API, but uses a lightweight JSON format instead of Jinja templates. The chat template is automatically extracted during model export and saved as processed_chat_template.json.
File Structure#
{
"roles": {
"system": {"prefix": "string", "suffix": "string"},
"user": {"prefix": "string", "suffix": "string"},
"assistant": {"prefix": "string", "suffix": "string"}
},
"content_types": {
"image": {"format": "string"},
"video": {"format": "string"}
},
"generation_prompt": "string",
"generation_prompt_thinking": "string (optional)",
"default_system_prompt": "string"
}
Fields#
roles(required): Prefix/suffix tokens for each rolecontent_types(optional): Format tokens for images/videosgeneration_prompt(optional): Token sequence to start generationgeneration_prompt_thinking(optional): Alternative prompt for thinking modedefault_system_prompt(optional): Default system instruction
Thinking Mode#
Some models (like Qwen3) support “thinking mode” where the model generates its reasoning process. This is controlled by enable_thinking in the input JSON.
enable_thinking: false(default): Usesgeneration_promptenable_thinking: true: Usesgeneration_prompt_thinking
The system automatically detects thinking mode support during model export. If not supported, the parameter has no effect.
Custom Templates#
llm_loader extracts the chat template from the checkpoint during export and
writes processed_chat_template.json next to the exported LLM ONNX graph.
To customize the chat template, first export the model normally, then edit the generated file in place:
# 1. Export (generates processed_chat_template.json with auto-detected template)
python -m llm_loader.export_all_cli \
/path/to/model \
/path/to/output
# 2. Edit the generated template to customize roles, delimiters, etc.
vi /path/to/output/llm/processed_chat_template.json
The generated processed_chat_template.json is a structured JSON with the
following schema (not a raw Jinja template):
{
"model_path": "path/to/model",
"roles": {
"system": {"prefix": "<prefix>", "suffix": "<suffix>"},
"user": {"prefix": "<prefix>", "suffix": "<suffix>"},
"assistant": {"prefix": "<prefix>", "suffix": "<suffix>"}
},
"content_types": {
"image": {"format": "<placeholder>"},
"video": {"format": "<placeholder>"},
"audio": {"format": "<placeholder>"}
},
"generation_prompt": "<prompt>",
"default_system_prompt": "<system_message>"
}
Reference templates are available under experimental/llm_loader/chat_templates/
(e.g., phi4mm.json, qwen3asr.json, nemotron_nano_v2.json).
Pre-built templates are automatically used for models with known tokenizer issues (e.g., Phi-4-Multimodal).
Examples#
Basic Text-Only Model (Qwen2)#
{
"roles": {
"system": {"prefix": "<|im_start|>system\n", "suffix": "<|im_end|>\n"},
"user": {"prefix": "<|im_start|>user\n", "suffix": "<|im_end|>\n"},
"assistant": {"prefix": "<|im_start|>assistant\n", "suffix": "<|im_end|>\n"}
},
"generation_prompt": "<|im_start|>assistant\n",
"default_system_prompt": "You are a helpful assistant"
}
Multimodal Model (Qwen2-VL)#
{
"roles": {
"system": {"prefix": "<|im_start|>system\n", "suffix": "<|im_end|>\n"},
"user": {"prefix": "<|im_start|>user\n", "suffix": "<|im_end|>\n"},
"assistant": {"prefix": "<|im_start|>assistant\n", "suffix": "<|im_end|>\n"}
},
"content_types": {
"image": {"format": "<|vision_start|><|image_pad|><|vision_end|>"}
},
"generation_prompt": "<|im_start|>assistant\n",
"default_system_prompt": "You are a helpful assistant."
}
Model with Thinking Mode (Qwen3)#
{
"roles": {
"system": {"prefix": "<|im_start|>system\n", "suffix": "<|im_end|>\n"},
"user": {"prefix": "<|im_start|>user\n", "suffix": "<|im_end|>\n"},
"assistant": {"prefix": "<|im_start|>assistant\n", "suffix": "<|im_end|>\n"}
},
"generation_prompt": "<|im_start|>assistant\n<think>\n\n</think>\n\n",
"generation_prompt_thinking": "<|im_start|>assistant\n",
"default_system_prompt": "You are a helpful assistant"
}
System Prompt Priority#
Explicit system message in request (highest)
default_system_promptfrom chat templateNo system prompt if neither provided