# Feature Combination Matrix
Each cell shows whether the row feature and the column feature can be enabled together: `Yes` (supported), `No` (not supported), `Untested`, or `WIP` (work in progress). The matrix is symmetric, so only the lower triangle is filled in, and `—` marks the diagonal.

| Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3 (One Model Engine) | EAGLE-3 (Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Sliding Window Attention | Logits Post Processor | Guided Decoding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overlap Scheduler | — | | | | | | | | | | | | | |
| CUDA Graph | Yes | — | | | | | | | | | | | | |
| Attention Data Parallelism | Yes | Yes | — | | | | | | | | | | | |
| Disaggregated Serving | Yes | Yes | Yes | — | | | | | | | | | | |
| Chunked Prefill | Yes | Yes | Yes | Untested | — | | | | | | | | | |
| MTP | Yes | Yes | Yes | Yes | Untested | — | | | | | | | | |
| EAGLE-3 (One Model Engine) | Yes | Yes | Yes | No | Untested | No | — | | | | | | | |
| EAGLE-3 (Two Model Engine) | No | Yes | Yes | No | Untested | No | No | — | | | | | | |
| Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | — | | | | | |
| TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No | — | | | | |
| KV Cache Reuse | Yes | Yes | Yes | Untested | Untested | Untested | Yes | No | Yes | Yes | — | | | |
| Sliding Window Attention | Yes | Yes | Yes | Untested | Untested | Untested | Untested | Untested | Yes | Yes | WIP | — | | |
| Logits Post Processor | No | Yes | Yes | No | Untested | No | No | No | Yes | Yes | Yes | Yes | — | |
| Guided Decoding | No | Yes | Yes | Untested | Yes | No | No | No | Yes | Yes | Yes | Yes | Yes | — |
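To apply the matrix to a concrete configuration, it can help to encode it as data and check every pair in a proposed feature set. The following is a minimal Python sketch under that idea: it transcribes the lower triangle of the table above and reports every pair whose entry is not `Yes`. The names `FEATURES`, `LOWER`, `status`, and `check` are hypothetical helpers for illustration only; they are not part of any TensorRT-LLM API. The sketch assumes the matrix is symmetric, which is why only the lower triangle is stored.

```python
from itertools import combinations

# Hypothetical helper data: feature order matches the rows/columns of the matrix.
FEATURES = [
    "Overlap Scheduler",
    "CUDA Graph",
    "Attention Data Parallelism",
    "Disaggregated Serving",
    "Chunked Prefill",
    "MTP",
    "EAGLE-3 (One Model Engine)",
    "EAGLE-3 (Two Model Engine)",
    "Torch Sampler",
    "TLLM C++ Sampler",
    "KV Cache Reuse",
    "Sliding Window Attention",
    "Logits Post Processor",
    "Guided Decoding",
]

# Lower triangle: LOWER[i][j] (j < i) is the entry for FEATURES[i] vs FEATURES[j].
LOWER = [
    [],
    ["Yes"],
    ["Yes", "Yes"],
    ["Yes", "Yes", "Yes"],
    ["Yes", "Yes", "Yes", "Untested"],
    ["Yes", "Yes", "Yes", "Yes", "Untested"],
    ["Yes", "Yes", "Yes", "No", "Untested", "No"],
    ["No", "Yes", "Yes", "No", "Untested", "No", "No"],
    ["Yes"] * 8,
    ["Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    ["Yes", "Yes", "Yes", "Untested", "Untested", "Untested", "Yes", "No", "Yes", "Yes"],
    ["Yes", "Yes", "Yes", "Untested", "Untested", "Untested", "Untested", "Untested",
     "Yes", "Yes", "WIP"],
    ["No", "Yes", "Yes", "No", "Untested", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"],
    ["No", "Yes", "Yes", "Untested", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "Yes",
     "Yes"],
]


def status(a: str, b: str) -> str:
    """Return the matrix entry for two distinct features, in either order."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    if i == j:
        raise ValueError("a feature is not paired with itself")
    if i < j:
        i, j = j, i  # symmetric matrix: always read from the lower triangle
    return LOWER[i][j]


def check(features):
    """Map each pair in the proposed combination whose entry is not 'Yes' to that entry."""
    problems = {}
    for a, b in combinations(features, 2):
        s = status(a, b)
        if s != "Yes":
            problems[(a, b)] = s
    return problems
```

For example, `check(["Overlap Scheduler", "CUDA Graph", "Guided Decoding"])` returns `{("Overlap Scheduler", "Guided Decoding"): "No"}`, flagging the one unsupported pair while the other two pairs are `Yes`. Storing only the lower triangle mirrors the table's layout and keeps the two halves from ever disagreeing.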