Supported Models
The PyTorch backend supports the following models:
| Architecture | Model | HuggingFace Example |
|---|---|---|
|  | BERT-based |  |
|  | Nemotron |  |
|  | DeepSeek-V3 |  |
|  | EXAONE 4.0 |  |
|  | Gemma 3 |  |
|  | Llama 3.1, Llama 3, Llama 2, LLaMA |  |
|  | Llama 4 |  |
|  | Mistral |  |
|  | Mixtral |  |
|  | Llama 3.2 |  |
|  | Nemotron-3, Nemotron-4, Minitron |  |
|  | NemotronNAS |  |
|  | QwQ, Qwen2 |  |
|  | Qwen2-based |  |
|  | Qwen2-based |  |
|  | Qwen3 |  |
|  | Qwen3MoE |  |
Model-Feature Support Matrix (Key Models)
Note: Support for other models may vary. Features marked “N/A” are not applicable to the model architecture.
| Model Architecture/Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3 (One Model Engine) | EAGLE-3 (Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Sliding Window Attention | Logits Post Processor | Guided Decoding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepseekV3ForCausalLM | Yes | Yes | Yes | Yes | Yes [1] | Yes | No | No | Yes | Yes | Yes [2] | N/A | Yes | Yes |
| Qwen3MoeForCausalLM | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
| Llama4ForConditionalGeneration | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
| GPT-OSS | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | N/A | Yes | Yes |
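For scripted checks, the key-model matrix above can be transcribed into a small lookup structure. The following is a minimal sketch and not part of any library: the names `FEATURES`, `MATRIX`, `feature_status`, and `unsupported_features` are hypothetical, and the footnote markers [1] and [2] are dropped from the cell values.

```python
# Hypothetical transcription of the key-model feature matrix (not a library API).
FEATURES = [
    "Overlap Scheduler", "CUDA Graph", "Attention Data Parallelism",
    "Disaggregated Serving", "Chunked Prefill", "MTP",
    "EAGLE-3 (One Model Engine)", "EAGLE-3 (Two Model Engine)",
    "Torch Sampler", "TLLM C++ Sampler", "KV Cache Reuse",
    "Sliding Window Attention", "Logits Post Processor", "Guided Decoding",
]

# One status per feature, in the same order as FEATURES.
# Footnote markers from the table are intentionally omitted.
MATRIX = {
    "DeepseekV3ForCausalLM": [
        "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No",
        "Yes", "Yes", "Yes", "N/A", "Yes", "Yes",
    ],
    "Qwen3MoeForCausalLM": [
        "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "Yes", "Yes", "N/A", "Yes", "Yes",
    ],
    "Llama4ForConditionalGeneration": [
        "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "Yes", "Untested", "N/A", "Yes", "Yes",
    ],
    "GPT-OSS": [
        "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "No",
        "Yes", "Yes", "No", "N/A", "Yes", "Yes",
    ],
}


def feature_status(model: str, feature: str) -> str:
    """Return the table cell ("Yes", "No", "N/A", or "Untested") for a model/feature pair."""
    return MATRIX[model][FEATURES.index(feature)]


def unsupported_features(model: str) -> list:
    """List every feature whose status is anything other than "Yes"."""
    return [f for f, status in zip(FEATURES, MATRIX[model]) if status != "Yes"]


if __name__ == "__main__":
    print(feature_status("GPT-OSS", "Chunked Prefill"))
    print(unsupported_features("Llama4ForConditionalGeneration"))
```

Treating "N/A" and "Untested" as not supported is a conservative choice made for this sketch; a caller that needs to distinguish them can read the raw value from `feature_status`.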
Multimodal Feature Support Matrix (PyTorch Backend)
| Model Architecture/Feature | Overlap Scheduler | CUDA Graph | Chunked Prefill | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Logits Post Processor | EPD Disaggregated Serving | Modality |
|---|---|---|---|---|---|---|---|---|---|
| Gemma3ForConditionalGeneration | Yes | Yes | N/A | Yes | Yes | N/A | Yes | No | L + I |
| HCXVisionForCausalLM | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| LlavaLlamaModel (VILA) | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I + V |
| LlavaNextForConditionalGeneration | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| Llama4ForConditionalGeneration | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| Mistral3ForConditionalGeneration | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I |
| Phi4MMForCausalLM | Yes | Yes | No | Yes | Yes | No | Yes | No | L + I + A |
| Qwen2VLForConditionalGeneration | Yes | Yes | No | Yes | Yes | Yes | Yes | No | L + I + V |
| Qwen2_5_VLForConditionalGeneration | Yes | Yes | No | Yes | Yes | Yes | Yes | No | L + I + V |
Note:
L: Language
I: Image
V: Video
A: Audio
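The Modality column combines these codes with `+`. As an illustration only, a cell such as `L + I + V` could be expanded with a small helper; `MODALITY_CODES` and `expand_modality` are hypothetical names introduced for this sketch.

```python
# Legend from the note above: single-letter code -> modality name.
MODALITY_CODES = {"L": "Language", "I": "Image", "V": "Video", "A": "Audio"}


def expand_modality(cell: str) -> list:
    """Expand a Modality cell such as "L + I + V" into full modality names."""
    return [MODALITY_CODES[code.strip()] for code in cell.split("+")]


if __name__ == "__main__":
    print(expand_modality("L + I + A"))
```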