Feature Combination Matrix

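| Feature | Overlap Scheduler | CUDA Graph | Tensor Parallelism | Pipeline Parallelism | Expert Parallelism | Helix Parallelism | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | Speculative Decoding — Linear | Speculative Decoding — Dynamic Trees | Speculative Decoding — Legacy Path (NGram, user-provided) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Sliding Window Attention | Logits Post Processor | Guided Decoding | LoRA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overlap Scheduler | | | | | | | | | | | | | | | | | | | |
| CUDA Graph | Yes | | | | | | | | | | | | | | | | | | |
| Tensor Parallelism | Yes | Yes | | | | | | | | | | | | | | | | | |
| Pipeline Parallelism | Yes | Yes | Yes | | | | | | | | | | | | | | | | |
| Expert Parallelism | Yes | Yes | Yes | Yes | | | | | | | | | | | | | | | |
| Helix Parallelism | Untested | Yes | Yes | Yes | Yes | | | | | | | | | | | | | | |
| Attention Data Parallelism | Yes | Yes | Yes | Yes | Yes | Known issues | | | | | | | | | | | | | |
| Disaggregated Serving | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | | | | | | | | | | | |
| Chunked Prefill | Yes | Yes | Yes | Untested | Yes | Yes | Yes | Yes | | | | | | | | | | | |
| Speculative Decoding — Linear | Yes | Yes | Yes | No | Yes | No | Yes | Yes | Yes | | | | | | | | | | |
| Speculative Decoding — Dynamic Trees | Yes | Yes | Yes | No | Yes | No | Yes | Yes | Yes | No | | | | | | | | | |
| Speculative Decoding — Legacy Path (NGram, user-provided) | Yes | Yes | Yes | No | Yes | No | Yes | Yes | Yes | No | No | | | | | | | | |
| Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | | | | | | |
| TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | | | | | | |
| KV Cache Reuse | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | | | | |
| Sliding Window Attention | Yes | Yes | Yes | Yes | Yes | Untested | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | | | |
| Logits Post Processor | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | Yes | Yes | | | |
| Guided Decoding | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | |
| LoRA | Yes | Yes | Yes | Yes | Untested | Untested | Untested | Untested | Yes | Untested | Untested | Untested | Yes | Yes | Yes | Yes | Yes | Untested | |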
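
The matrix records each feature pair only once, so a programmatic lookup has to treat the pair as unordered. The sketch below is one possible way to do that: it is an illustrative helper, not part of TensorRT-LLM, and the names `FEATURES`, `ROWS`, `CODES`, and `combination_status` are hypothetical. It assumes the status of "A with B" is the same as "B with A", and it encodes the entries above as one string per row (`Y` = Yes, `N` = No, `U` = Untested, `K` = Known issues).

```python
# Illustrative helper only; not part of TensorRT-LLM.
# Feature names in the same order as the matrix.
FEATURES = [
    "Overlap Scheduler",
    "CUDA Graph",
    "Tensor Parallelism",
    "Pipeline Parallelism",
    "Expert Parallelism",
    "Helix Parallelism",
    "Attention Data Parallelism",
    "Disaggregated Serving",
    "Chunked Prefill",
    "Speculative Decoding — Linear",
    "Speculative Decoding — Dynamic Trees",
    "Speculative Decoding — Legacy Path (NGram, user-provided)",
    "Torch Sampler",
    "TLLM C++ Sampler",
    "KV Cache Reuse",
    "Sliding Window Attention",
    "Logits Post Processor",
    "Guided Decoding",
    "LoRA",
]

# ROWS[i][j] is the status of feature i combined with feature j, for j < i.
ROWS = [
    "",
    "Y",
    "YY",
    "YYY",
    "YYYY",
    "UYYYY",
    "YYYYYK",
    "YYYYYYY",
    "YYYUYYYY",
    "YYYNYNYYY",
    "YYYNYNYYYN",
    "YYYNYNYYYNN",
    "YYYYYYYYYYYY",
    "YYYYYYYYYNNNN",
    "YYYYYYYYYYYYYY",
    "YYYYYUYYYYYYYYY",
    "YYYYYYYNYNNNYYYY",
    "YYYYYYYYYYYYYYYYY",
    "YYYYUUUUYUUUYYYYYU",
]

CODES = {"Y": "Yes", "N": "No", "U": "Untested", "K": "Known issues"}

# Sanity check: row i must hold exactly i entries.
assert all(len(row) == i for i, row in enumerate(ROWS))


def combination_status(a: str, b: str) -> str:
    """Return the matrix entry for an unordered pair of feature names."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    if i == j:
        raise ValueError("the matrix has no entry for a feature with itself")
    if i < j:
        # Assumption: the matrix is symmetric, so swap into the stored order.
        i, j = j, i
    return CODES[ROWS[i][j]]


print(combination_status("Guided Decoding", "LoRA"))
print(combination_status("Pipeline Parallelism", "Speculative Decoding — Linear"))
```

Storing only the lower triangle mirrors how the matrix is published and avoids two copies of an entry drifting apart.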