# Supported Models
Code Location:

- `experimental/llm_loader/` (recommended export)
- `experimental/quantization/` (checkpoint quantization)
- `experimental/server/` (Python API/server)
- `tensorrt_edgellm/` (legacy export)
- `cpp/` (runtime)

Pre-Quantized Checkpoints: When a supported pre-quantized checkpoint is available, the checkpoint-based loader can export it directly without a separate quantization step.
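A pre-quantized checkpoint can usually be told apart from a plain FP16/BF16 source checkpoint by its on-disk metadata. The sketch below is illustrative only: the helper name is invented, and the `hf_quant_config.json` filename is an assumption about the unified-checkpoint layout, not a documented contract of this toolkit.

```python
import json
from pathlib import Path

def is_prequantized(checkpoint_dir: str) -> bool:
    """Heuristic check: pre-quantized checkpoints ship quantization
    metadata alongside the weights; plain FP16/BF16 checkpoints do not."""
    root = Path(checkpoint_dir)
    # Unified quantized checkpoints often include a separate quantization
    # metadata file (assumed name: hf_quant_config.json).
    if (root / "hf_quant_config.json").exists():
        return True
    config = root / "config.json"
    if config.exists():
        cfg = json.loads(config.read_text())
        # Hugging Face checkpoints quantized with GPTQ/AWQ and similar tools
        # record their settings under the "quantization_config" key.
        return "quantization_config" in cfg
    return False
```

If the check returns true, the checkpoint-based loader can be pointed at the directory directly; otherwise a separate quantization step is needed first.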
## Support Policy
TensorRT Edge-LLM supports the checkpoint IDs listed below. Dense LLM families include official dense checkpoints below 30B parameters. Larger dense checkpoints and non-dense variants require case-by-case validation. MoE, multimodal, audio, TTS, omni, and EAGLE support is limited to the listed rows.
The model coverage list is not comprehensive, and not every listed checkpoint has been fully verified on every supported platform and precision. If a listed model does not export, build, or run correctly, please report an issue with the checkpoint ID, precision, platform, and command line used.
The model class names were checked against the installed transformers==5.3.0 package and the upstream Transformers model source tree. Checkpoint IDs are linked to their Hugging Face pages and grouped into original checkpoints and quantized checkpoints.
## Precision Notes
Dense precision set: FP16/BF16 checkpoints; ModelOpt FP8, MXFP8, FP4/NVFP4, INT4 AWQ, and INT8 SmoothQuant checkpoints; and INT4 GPTQ checkpoints. INT8 GPTQ is not supported.

- For FP16/BF16 source checkpoints, use the Quantization script to create a unified quantized checkpoint for `llm_loader`, then export the generated checkpoint.
- FP8 KV cache is detected automatically from checkpoint metadata by `llm_loader`.
- `llm_loader` exports visual encoders in FP16. FP8 visual encoder export is available through the legacy `tensorrt_edgellm` visual quantization/export tools.
- MXFP8 and FP4/NVFP4 require Blackwell-class hardware for runtime execution.
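The precision rules above can be condensed into a small lookup. This is purely an illustrative sketch of the policy stated in this section; the function and set names are invented and are not part of the toolkit's API.

```python
# Precision names accepted for dense LLMs per the notes above.
DENSE_PRECISIONS = {
    "fp16", "bf16",                  # plain source checkpoints
    "fp8", "mxfp8", "fp4", "nvfp4",  # ModelOpt floating-point formats
    "int4_awq", "int8_sq",           # ModelOpt AWQ / SmoothQuant
    "int4_gptq",                     # GPTQ is supported at INT4 only
}

# Formats that need Blackwell-class hardware for runtime execution.
BLACKWELL_ONLY = {"mxfp8", "fp4", "nvfp4"}

def check_precision(precision: str) -> str:
    """Classify a requested precision against the dense precision set."""
    p = precision.lower()
    if p == "int8_gptq":
        return "unsupported: INT8 GPTQ is not supported"
    if p not in DENSE_PRECISIONS:
        return f"unsupported: {precision}"
    if p in BLACKWELL_ONLY:
        return "supported (requires Blackwell-class hardware at runtime)"
    return "supported"
```

For example, `check_precision("nvfp4")` reports support with the Blackwell hardware caveat, while `check_precision("int8_gptq")` is rejected.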
## Support Matrix

| Category | Model series | Transformers class / checkpoint architecture | Checkpoint type | Checkpoint ID | Supported precisions |
|---|---|---|---|---|---|
| Dense LLM | Llama 3.x Instruct / selected Llama-derived Instruct | | Original | | Dense precision set |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Quantized | | |
| | | | Quantized | | |
| Dense LLM | Qwen2/Qwen2.5 dense and Qwen-derived dense | | Original | | Dense precision set |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| Dense LLM | Qwen3 dense | | Original | | Dense precision set |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| Dense LLM / VLM | Qwen3.5 text and VLM | Text: | Original | | Dense precision set for text; VLM original checkpoints only |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| Dense LLM | Nemotron Nano dense | | Original | | BF16, FP8, NVFP4 |
| | | | Original | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| MoE | Qwen3-MoE | | Quantized | | INT4 only |
| MoE | Nemotron3-MoE | | Quantized | | NVFP4 only |
| VLM | Qwen2.5-VL | | Original | | Dense precision set for LLM backbone |
| | | | Original | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| VLM | Qwen3-VL / compatible | | Original | | Dense precision set for LLM backbone |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| VLM | InternVL3 / InternVL3.5 HF format | | Original | | Dense precision set for LLM backbone |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Original | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| | | | Quantized | | |
| VLM | Phi-4-Multimodal | | Original | | Merge vision LoRA, then dense precision set for the LLM backbone |
| Audio / Speech | Qwen3-ASR | Checkpoint architecture | Original | | FP16 |
| | | | Original | | |
| TTS | Qwen3-TTS | Checkpoint architecture | Original | | FP16 |
| Omni | Nemotron-Omni | Checkpoint architecture | Quantized | | NVFP4 only |

Qwen3-ASR and Qwen3-TTS use checkpoint architecture names that are not present in the installed transformers==5.3.0 package, so TensorRT Edge-LLM handles their speech/audio/talker components with local model implementations.
## EAGLE3 Draft Models
EAGLE3 draft checkpoints are detected by `draft_vocab_size` in `config.json` and exported with `Eagle3DraftModel`. Draft checkpoints can be quantized with `experimental.quantization` using the same ModelOpt methods exposed by the draft quantization CLI: `fp8`, `int4_awq`, `nvfp4`, `mxfp8`, and `int8_sq` for the backbone; `fp8`, `int4_awq`, `nvfp4`, and `mxfp8` for the LM head; and `fp8` for KV cache.
| Draft checkpoint | Base model | Draft config class |
|---|---|---|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
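The `draft_vocab_size` detection rule described in this section can be sketched as follows. The helper is illustrative only; it shows the documented detection signal, not the loader's actual implementation.

```python
import json
from pathlib import Path

def is_eagle3_draft(checkpoint_dir: str) -> bool:
    """An EAGLE3 draft checkpoint carries a draft_vocab_size entry in its
    config.json; base-model checkpoints do not."""
    cfg_path = Path(checkpoint_dir) / "config.json"
    cfg = json.loads(cfg_path.read_text())
    return "draft_vocab_size" in cfg
```

A checkpoint for which this returns true would be routed to the draft-model export path rather than the regular dense-LLM path.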