Supported Models
The PyTorch backend supports the following models:
| Architecture | Model | HuggingFace Example |
|---|---|---|
|  | BERT-based |  |
|  | Nemotron |  |
|  | DeepSeek-V3 |  |
|  | EXAONE 4.0 |  |
|  | Gemma 3 |  |
|  | Llama 3.1, Llama 3, Llama 2, LLaMA |  |
|  | Llama 4 |  |
|  | Mistral |  |
|  | Mixtral |  |
|  | Llama 3.2 |  |
|  | Nemotron-3, Nemotron-4, Minitron |  |
|  | NemotronNAS |  |
|  | QwQ, Qwen2 |  |
|  | Phi-4 |  |
|  | Qwen2-based |  |
|  | Qwen2-based |  |
|  | Qwen3 |  |
|  | Qwen3MoE |  |
Model-Feature Support Matrix (Key Models)
Note: Support for other models may vary. Features marked “N/A” are not applicable to the model architecture.
| Model Architecture/Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3 (One Model Engine) | EAGLE-3 (Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Sliding Window Attention | Logits Post Processor | Guided Decoding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepseekV3ForCausalLM | Yes | Yes | Yes | Yes | Yes [1] | Yes | No | No | Yes | Yes | Yes [2] | N/A | Yes | Yes |
| Qwen3MoeForCausalLM | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
| Llama4ForConditionalGeneration | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
| GPT-OSS | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | N/A | Yes | Yes |
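A matrix like this is easy to consult programmatically before enabling a feature. The following is a minimal sketch, not part of any library: `KEY_MODEL_FEATURES` and `is_supported` are hypothetical names, and the dictionary copies only three feature columns from the matrix above.

```python
# Hypothetical lookup table: a hand-copied subset of the key-model matrix.
# Neither the dict nor the helper below comes from an existing library.
KEY_MODEL_FEATURES = {
    "DeepseekV3ForCausalLM": {
        "Chunked Prefill": "Yes [1]",
        "MTP": "Yes",
        "KV Cache Reuse": "Yes [2]",
    },
    "Qwen3MoeForCausalLM": {
        "Chunked Prefill": "Yes",
        "MTP": "No",
        "KV Cache Reuse": "Yes",
    },
    "Llama4ForConditionalGeneration": {
        "Chunked Prefill": "Yes",
        "MTP": "No",
        "KV Cache Reuse": "Untested",
    },
    "GPT-OSS": {
        "Chunked Prefill": "No",
        "MTP": "No",
        "KV Cache Reuse": "No",
    },
}


def is_supported(architecture, feature):
    """Return True or False per the matrix, or None when the answer is unknown.

    None covers unlisted models or features as well as "N/A" and "Untested".
    Footnoted entries such as "Yes [1]" count as supported, but the footnote's
    conditions still apply.
    """
    status = KEY_MODEL_FEATURES.get(architecture, {}).get(feature)
    if status is None or status in ("N/A", "Untested"):
        return None
    return status.startswith("Yes")


if __name__ == "__main__":
    for arch in KEY_MODEL_FEATURES:
        print(arch, "MTP:", is_supported(arch, "MTP"))
```

Returning `None` rather than `False` for "Untested" and "N/A" keeps "known unsupported" distinct from "no information", so a caller can decide whether to fall back or to warn.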
Multimodal Feature Support Matrix (PyTorch Backend)
| Model Architecture/Feature | Overlap Scheduler | CUDA Graph | Chunked Prefill | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Logits Post Processor | EPD Disaggregated Serving | Modality |
|---|---|---|---|---|---|---|---|---|---|
| Gemma3ForConditionalGeneration | Yes | Yes | N/A | Yes | Yes | N/A | Yes | No | L + I |
| HCXVisionForCausalLM | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| LlavaLlamaModel (VILA) | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I + V |
| LlavaNextForConditionalGeneration | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| Llama4ForConditionalGeneration | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| Mistral3ForConditionalGeneration | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| Phi4MMForCausalLM | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I + A |
| Qwen2VLForConditionalGeneration | Yes | Yes | No | Yes | Yes | Yes | Yes | No | L + I + V |
| Qwen2_5_VLForConditionalGeneration | Yes | Yes | No | Yes | Yes | Yes | Yes | No | L + I + V |
Note:
L: Language
I: Image
V: Video
A: Audio
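The modality codes in the last column can be expanded mechanically using the legend above. This is a small illustrative sketch; `MODALITY_NAMES` and `expand_modality` are hypothetical names introduced here, not part of any library.

```python
# Legend from the note above, keyed by the single-letter code.
MODALITY_NAMES = {"L": "Language", "I": "Image", "V": "Video", "A": "Audio"}


def expand_modality(code):
    """Turn a matrix entry such as "L + I + V" into a list of modality names."""
    return [MODALITY_NAMES[part.strip()] for part in code.split("+")]


if __name__ == "__main__":
    print(expand_modality("L + I + A"))
```

An unknown letter raises `KeyError`, which is a deliberate choice in this sketch so that a typo in a matrix entry is caught instead of being silently ignored.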