Welcome to TensorRT-LLM’s Documentation!
- Multi-Head, Multi-Query, and Grouped-Query Attention
- C++ GPT Runtime
- Executor API
- Graph Rewriting Module
- The Batch Manager in TensorRT-LLM
- Inference Request
- Responses
- Run gpt-2b + LoRA using GptManager / C++ runtime
- Expert Parallelism in TensorRT-LLM
- KV cache reuse
- Speculative Sampling
- Lookahead decoding