# LLM API with TensorRT Engine
A simple inference example with TinyLlama using the LLM API:
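A minimal sketch of such an example, assuming `tensorrt_llm` is installed and a compatible GPU is available; the model identifier `TinyLlama/TinyLlama-1.1B-Chat-v1.0` is the Hugging Face checkpoint commonly used in TensorRT-LLM quick-start examples:

```python
from tensorrt_llm import LLM, SamplingParams


def main():
    # Prompts to run batched inference on.
    prompts = [
        "Hello, my name is",
        "The capital of France is",
    ]

    # Sampling configuration shared by all prompts.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # Build (or load) the TensorRT engine for the model.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Generate completions for all prompts in one call.
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        print(f"Prompt: {output.prompt!r}")
        print(f"Generated: {output.outputs[0].text!r}")


if __name__ == "__main__":
    main()
```

On first run, `LLM(...)` downloads the checkpoint and compiles a TensorRT engine, so expect a one-time build delay before generation starts.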
For more advanced usage, including distributed inference, multimodal models, and speculative decoding, please refer to this README.