LLM Examples Introduction

The following example shows how to use the LLM API with TinyLlama.

from tensorrt_llm import LLM, SamplingParams


def main():

    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    outputs = llm.generate(prompts, sampling_params)

    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


# The entry point of the program needs to be protected for spawning processes.
if __name__ == '__main__':
    main()
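
The loop in the example above reads only `output.prompt` and `output.outputs[0].text`. As a minimal sketch of how those results could be gathered for later processing instead of printed, the helper below maps each prompt to its generated texts. The names `FakeCompletion`, `FakeRequestOutput`, and `collect_generations` are hypothetical and are not part of `tensorrt_llm`; the stand-in classes model only the two attributes used above so the snippet runs without a GPU or a model download, and the sample texts are placeholders rather than real model output.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins for the objects returned by llm.generate().
# They model only the attributes the example reads: `prompt` and
# `outputs[i].text`. They are NOT tensorrt_llm types.
@dataclass
class FakeCompletion:
    text: str


@dataclass
class FakeRequestOutput:
    prompt: str
    outputs: List[FakeCompletion] = field(default_factory=list)


def collect_generations(results) -> Dict[str, List[str]]:
    """Map each prompt to the list of texts generated for it."""
    collected: Dict[str, List[str]] = {}
    for result in results:
        texts = [completion.text for completion in result.outputs]
        # Repeated prompts accumulate their texts under the same key.
        collected.setdefault(result.prompt, []).extend(texts)
    return collected


if __name__ == "__main__":
    # Placeholder data standing in for real generation results.
    fake_results = [
        FakeRequestOutput("The capital of France is", [FakeCompletion(" Paris.")]),
        FakeRequestOutput("Hello, my name is", [FakeCompletion(" Ada.")]),
    ]
    for prompt, texts in collect_generations(fake_results).items():
        print(f"Prompt: {prompt!r}, Generated text: {texts[0]!r}")
```

Assuming the real result objects expose the same two attributes, as the example above suggests, the list returned by `llm.generate(prompts, sampling_params)` could be passed to `collect_generations` directly.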

The LLM API can be used for both offline and online usage. See more examples of the LLM API here:

LLM API Examples

Generate Text Using Lookahead Decoding
Generate text with guided decoding
Generate Text Using Medusa Decoding
Generate text with multiple LoRA adapters
Generate Text in Streaming
Generate text
Generation with Quantization
Distributed LLM Generation
Generate Text Asynchronously
Control generated text using logits post processor
Generate text with customization
Automatic Parallelism with LLM

For more details on how to fully utilize this API, check out:

Common customizations
LLM API Reference