Examples

End-to-end workflows demonstrating TensorRT Edge-LLM capabilities across different use cases.

Available Examples

VLM (Vision-Language Model) - Complete workflow for vision-language models, including image understanding
Speculative Decoding - EAGLE3, MTP, and DFlash speculative decoding for faster inference
Phi-4 Multimodal - Phi-4-Multimodal deployment with LoRA merge
ASR (Automatic Speech Recognition) - Speech-to-text with Qwen3-ASR models, including optional FP8 / NVFP4 quantization recipes
MoE (Mixture of Experts) - Deployment workflow for MoE models
TTS (Text-to-Speech) - Text-to-speech synthesis workflows
Alpamayo-R1-10B (VLA) - Vision-language-action workflow with image, text, trajectory history, and action prediction
Omni (Audio + Vision + Speech I/O) - End-to-end Qwen3-Omni multimodal pipeline with NVFP4 quantization
Experimental High-Level Python API and Server - vLLM-style API and OpenAI-compatible server with speculative decoding support