Feature Combination Matrix

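| Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3 (One Model Engine) | EAGLE-3 (Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Slide Window Attention | Logits Post Processor | Guided Decoding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overlap Scheduler |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| CUDA Graph | Yes |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Attention Data Parallelism | Yes | Yes |  |  |  |  |  |  |  |  |  |  |  |  |
| Disaggregated Serving | Yes | Yes | Yes |  |  |  |  |  |  |  |  |  |  |  |
| Chunked Prefill | Yes | Yes | Yes | Untested |  |  |  |  |  |  |  |  |  |  |
| MTP | Yes | Yes | Yes | Yes | Untested |  |  |  |  |  |  |  |  |  |
| EAGLE-3 (One Model Engine) | Yes | Yes | Yes | No | Untested | No |  |  |  |  |  |  |  |  |
| EAGLE-3 (Two Model Engine) | No | Yes | Yes | No | Untested | No | No |  |  |  |  |  |  |  |
| Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |  |  |  |  |  |  |
| TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No |  |  |  |  |  |
| KV Cache Reuse | Yes | Yes | Yes | Untested | Untested | Untested | Yes | No | Yes | Yes |  |  |  |  |
| Slide Window Attention | Yes | Yes | Yes | Untested | Untested | Untested | Untested | Untested | Yes | Yes | WIP |  |  |  |
| Logits Post Processor | No | Yes | Yes | No | Untested | No | No | No | Yes | Yes | Yes | Yes |  |  |
| Guided Decoding | No | Yes | Yes | Untested | Yes | No | No | No | Yes | Yes | Yes | Yes | Yes |  |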
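
Only the lower triangle of the matrix is filled in, so a pair of features has to be looked up with the later-listed feature as the row. The following Python sketch shows one way to encode the matrix and query it. It is not part of any library: the names `FEATURES`, `LOWER`, `status`, and `unsupported_pairs` are hypothetical, and the code assumes that compatibility is symmetric (the order of the two features does not matter) and that the cells are read left to right in the column order listed above.

```python
from itertools import combinations

# Feature order matches the column order of the matrix.
FEATURES = [
    "Overlap Scheduler",
    "CUDA Graph",
    "Attention Data Parallelism",
    "Disaggregated Serving",
    "Chunked Prefill",
    "MTP",
    "EAGLE-3 (One Model Engine)",
    "EAGLE-3 (Two Model Engine)",
    "Torch Sampler",
    "TLLM C++ Sampler",
    "KV Cache Reuse",
    "Slide Window Attention",
    "Logits Post Processor",
    "Guided Decoding",
]

Y, N, U, W = "Yes", "No", "Untested", "WIP"

# LOWER[i][j] is the status of FEATURES[i] combined with FEATURES[j], for j < i.
LOWER = [
    [],
    [Y],
    [Y, Y],
    [Y, Y, Y],
    [Y, Y, Y, U],
    [Y, Y, Y, Y, U],
    [Y, Y, Y, N, U, N],
    [N, Y, Y, N, U, N, N],
    [Y, Y, Y, Y, Y, Y, Y, Y],
    [Y, Y, Y, Y, Y, N, N, N, N],
    [Y, Y, Y, U, U, U, Y, N, Y, Y],
    [Y, Y, Y, U, U, U, U, U, Y, Y, W],
    [N, Y, Y, N, U, N, N, N, Y, Y, Y, Y],
    [N, Y, Y, U, Y, N, N, N, Y, Y, Y, Y, Y],
]


def status(a, b):
    """Return the matrix entry for two distinct features, in either order."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    if i == j:
        raise ValueError("the matrix has no entry for a feature paired with itself")
    if i < j:
        i, j = j, i  # assumption: the matrix is symmetric
    return LOWER[i][j]


def unsupported_pairs(features):
    """List every pair in `features` whose entry is anything other than 'Yes'."""
    return [
        (a, b, status(a, b))
        for a, b in combinations(features, 2)
        if status(a, b) != Y
    ]


if __name__ == "__main__":
    wanted = ["Overlap Scheduler", "CUDA Graph", "Guided Decoding"]
    for a, b, s in unsupported_pairs(wanted):
        print(f"{a} + {b}: {s}")
```

A check like `unsupported_pairs` only inspects features two at a time; the matrix itself says nothing about whether three or more features work together.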