PyTorch Backend#

Note: This feature is currently in beta, and the related API is subject to change in future versions.

To improve usability and developer efficiency, TensorRT LLM provides a new backend based on PyTorch.

The PyTorch backend is available in TensorRT LLM version 0.17 and later. You can try it by importing tensorrt_llm._torch.
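Because the backend requires version 0.17 or later, it can be useful to check the installed version before importing it. The following is a minimal sketch, not part of TensorRT LLM: the helper names are hypothetical, and the distribution name `tensorrt_llm` passed to `importlib.metadata` is an assumption about how the package was installed.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Minimum release that ships the PyTorch backend, per the text above.
MIN_VERSION = (0, 17)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.17.0' or '1.0.0rc2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_pytorch_backend(version: str) -> bool:
    """Return True if the given version is 0.17 or later."""
    return parse_major_minor(version) >= MIN_VERSION


def installed_version(dist_name: str = "tensorrt_llm") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed.

    The default distribution name is an assumption, not something the
    documentation above states.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("tensorrt_llm does not appear to be installed.")
    elif supports_pytorch_backend(version):
        print(f"Version {version} should include the PyTorch backend.")
    else:
        print(f"Version {version} predates the PyTorch backend (needs 0.17+).")
```

Only the major and minor components are compared, so pre-release suffixes such as `rc1` are ignored by this sketch.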

Quick Start#

The following example shows how to use the tensorrt_llm.LLM API with a Llama model.

from tensorrt_llm import LLM, SamplingParams


def main():
    # `model` accepts a Hugging Face model name, a path to a local Hugging Face model,
    # or a TensorRT Model Optimizer quantized checkpoint such as
    # nvidia/Llama-3.1-8B-Instruct-FP8 on Hugging Face.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Sample prompts.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]

    # Create the sampling parameters.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    for output in llm.generate(prompts, sampling_params):
        print(
            f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}"
        )

    # Example output (sampled text varies between runs):
    # Prompt: 'Hello, my name is', Generated text: '\n\nJane Smith. I am a student pursuing my degree in Computer Science at [university]. I enjoy learning new things, especially technology and programming'
    # Prompt: 'The president of the United States is', Generated text: 'likely to nominate a new Supreme Court justice to fill the seat vacated by the death of Antonin Scalia. The Senate should vote to confirm the'
    # Prompt: 'The capital of France is', Generated text: 'Paris.'
    # Prompt: 'The future of AI is', Generated text: 'an exciting time for us. We are constantly researching, developing, and improving our platform to create the most advanced and efficient model available. We are'


if __name__ == '__main__':
    main()
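The example above passes temperature=0.8 and top_p=0.95 to SamplingParams. To show what these two settings generally mean, here is a plain-Python sketch of temperature scaling followed by top-p (nucleus) filtering over a toy list of logits. It is a generic illustration of the technique, not TensorRT LLM's implementation, and every function name in it is hypothetical.

```python
import math
import random
from typing import Dict, List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(
    logits: List[float],
    temperature: float = 0.8,
    top_p: float = 0.95,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token index using temperature scaling and top-p filtering."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_p_filter(probs, top_p)
    token_ids = list(candidates)
    weights = [candidates[i] for i in token_ids]
    return rng.choices(token_ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up values for illustration
    print(sample_token(toy_logits, rng=random.Random(0)))
```

In this sketch, a lower temperature concentrates probability on the highest-scoring tokens, and a lower top_p discards more of the low-probability tail before sampling.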

Features#

Developer Guide#

Key Components#

Known Issues#

  • The PyTorch backend on SBSA is incompatible with bare-metal environments like Ubuntu 24.04. Use the PyTorch NGC Container for optimal support on SBSA platforms.
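A launcher script could surface the issue above by warning when it runs on a 64-bit Arm host. The following sketch only inspects the machine architecture reported by Python's standard `platform` module. It assumes that SBSA systems report `aarch64` or `arm64`, it cannot tell an SBSA server from other Arm64 machines, and it does not detect whether the process is already inside a container. The helper names are hypothetical.

```python
import platform
from typing import Optional

# Architecture strings commonly reported by 64-bit Arm systems.
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: Optional[str] = None) -> bool:
    """Return True if the given (or current) machine string denotes 64-bit Arm."""
    name = machine if machine is not None else platform.machine()
    return name.lower() in ARM64_NAMES


def sbsa_advice(machine: Optional[str] = None) -> Optional[str]:
    """Return a warning for Arm64 hosts, or None for other architectures."""
    if is_arm64(machine):
        return (
            "Arm64 host detected. If this is an SBSA platform running on bare "
            "metal, consider using the PyTorch NGC Container instead."
        )
    return None


if __name__ == "__main__":
    message = sbsa_advice()
    if message:
        print(message)
```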

Prototype Features#