# Model Recipes
## Quick Start for Popular Models
The table below lists `trtllm-serve` commands you can use to easily deploy popular models, including DeepSeek-R1, gpt-oss, Llama 4, Qwen3, and more.
We maintain LLM API configuration files with recommended performance settings for these models in the `examples/configs` directory. The TensorRT LLM Docker container makes the config files available at `/app/tensorrt_llm/examples/configs`, but you can point the following variable at your own checkout as needed:

```bash
# Path to the TensorRT LLM repo in your local environment
export TRTLLM_DIR="/app/tensorrt_llm"
```
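Each command in the table passes one of these config files to `trtllm-serve` through its `--extra_llm_api_options` flag. As a minimal sketch, assuming a placeholder model and config filename (substitute the values from the table row that matches your GPU and scenario):

```bash
# Minimal sketch: the model and <config>.yaml are placeholders, not shipped names;
# take the real values from the table row for your GPU and inference scenario.
trtllm-serve Qwen/Qwen3-30B-A3B \
  --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/<config>.yaml
```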
> **Note:** The configs here are optimized for a target ISL/OSL (Input/Output Sequence Length) of 1024/1024. If your traffic pattern is different, you may benefit from additional tuning. In the future, we plan to provide more configs for a wider range of traffic patterns.
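If you do need to tune for a different traffic pattern, the config files are plain YAML files of LLM API options that you can copy and edit. The sketch below shows the general shape such a file takes; the keys are common LLM API options, but the values are illustrative assumptions rather than tuned recommendations:

```yaml
# Sketch of an LLM API options file consumed via --extra_llm_api_options.
# Values are illustrative assumptions, not tuned recommendations.
cuda_graph_config:
  enable_padding: true            # pad batches so captured CUDA graphs can be reused
kv_cache_config:
  free_gpu_memory_fraction: 0.9   # share of free GPU memory reserved for the KV cache
```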
This table is meant as a straightforward starting point; for detailed, model-specific instructions, see the deployment guides below.
| Model Name | GPU | Inference Scenario | Config | Command |
|---|---|---|---|---|
|  | H100, H200 | Max Throughput |  |  |
|  | B200, GB200 | Max Throughput |  |  |
|  | B200, GB200 | Max Throughput |  |  |
|  | B200, GB200 | Min Latency |  |  |
|  | Any | Max Throughput |  |  |
|  | Any | Min Latency |  |  |
|  | Any | Max Throughput |  |  |
| Qwen3 family (e.g. Qwen3-30B-A3B) | Any | Max Throughput |  |  |
|  | Any | Max Throughput |  |  |
|  | Any | Max Throughput |  |  |
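Once a command from the table has the server running, you can sanity-check it through the OpenAI-compatible API. The example below assumes `trtllm-serve`'s default port of 8000 and uses a placeholder model name; pass the model you actually served:

```bash
# Assumes the default trtllm-serve port (8000); the model name is a placeholder.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-30B-A3B",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```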
## Model-Specific Deployment Guides
The deployment guides below provide more detailed instructions for serving specific models with TensorRT LLM.
- Deployment Guide for DeepSeek R1 on TensorRT LLM - Blackwell & Hopper Hardware
- Deployment Guide for Llama3.3 70B on TensorRT LLM - Blackwell & Hopper Hardware
- Deployment Guide for Llama4 Scout 17B on TensorRT LLM - Blackwell & Hopper Hardware
- Deployment Guide for GPT-OSS on TensorRT-LLM - Blackwell Hardware
- Deployment Guide for Qwen3 Next on TensorRT LLM - Blackwell & Hopper Hardware