PyTorch Backend

Note

This feature is currently experimental, and the related API is subject to change in future versions.

To improve usability and developer efficiency, TensorRT-LLM introduces a new experimental backend based on PyTorch.

The PyTorch backend of TensorRT-LLM is available in version 0.17 and later. You can try it by importing tensorrt_llm._torch.
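
The snippet below is a minimal sketch of that check: it only verifies that the module imports and prints the installed package version. The module path tensorrt_llm._torch comes from the note above; the use of __version__ is an assumption about the package metadata.

# Minimal import check for the experimental PyTorch backend (requires tensorrt_llm >= 0.17).
import tensorrt_llm
import tensorrt_llm._torch  # importing this module makes the PyTorch backend available

# __version__ is assumed to follow the usual Python package convention.
print(tensorrt_llm.__version__)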

Quick Start

Here is a simple example showing how to use the tensorrt_llm.LLM API with a Llama model.

from tensorrt_llm import LLM, SamplingParams


def main():
    # Prompts to complete and the sampling configuration for generation.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(max_tokens=32)

    # Load the model and generate one completion per prompt.
    llm = LLM(model='TinyLlama/TinyLlama-1.1B-Chat-v1.0')
    outputs = llm.generate(prompts, sampling_params)

    for i, output in enumerate(outputs):
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"[{i}] Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == '__main__':
    main()
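
To vary the decoding behavior, you can pass additional sampling options. The following is a hedged variation of the example above: temperature and top_p are standard SamplingParams fields in this API, but verify the exact names against your installed version.

from tensorrt_llm import LLM, SamplingParams

# Sketch: same flow as the Quick Start, with a few common sampling knobs.
# temperature / top_p are assumed to be supported by SamplingParams in your version.
sampling_params = SamplingParams(
    max_tokens=64,    # generate up to 64 new tokens per prompt
    temperature=0.8,  # > 0 enables stochastic sampling
    top_p=0.95,       # nucleus sampling cutoff
)

llm = LLM(model='TinyLlama/TinyLlama-1.1B-Chat-v1.0')
output = llm.generate(["The future of AI is"], sampling_params)[0]
print(output.outputs[0].text)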

Features

Developer Guide

Key Components

Known Issues

  • The PyTorch backend on SBSA is incompatible with bare-metal environments such as Ubuntu 24.04. Please use the PyTorch NGC Container for optimal support on SBSA platforms.