NVIDIA Generative AI Examples

Generative AI enables users to quickly generate new content based on a variety of inputs and is a powerful tool for streamlining the workflow of creatives, engineers, researchers, scientists, and more. The use cases and possibilities span all industries and individuals. Generative AI models can produce novel content like stories, emails, music, images, and videos.

Generative AI starts with foundation models trained on vast quantities of unlabeled data. Large language models (LLMs) are trained on an extensive range of textual data. These LLMs can understand prompts and generate novel, human-like responses. Businesses can build applications that leverage this capability of LLMs, such as creative writing assistants for marketing, document summarization for legal teams, and code generation for software development.

The NVIDIA Generative AI Examples use Docker Compose to run Retrieval-Augmented Generation (RAG) Large Language Model (LLM) pipelines.

All the example pipelines deploy a sample chat bot application for question answering that is enhanced with RAG. The chat bot also supports uploading documents to create a knowledge base.
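The RAG pattern the pipelines implement can be sketched in a few lines: retrieve the chunks of the knowledge base most relevant to a question, then condition the LLM prompt on them. The sketch below is illustrative only, not the pipelines themselves (which run in Docker Compose): a toy word-overlap retriever stands in for the embedding model and the Milvus/pgvector vector database, and a prompt template stands in for the LLM call.

```python
# Toy sketch of the RAG pattern: retrieval, then prompt augmentation.
# The retriever and "LLM" here are stand-ins, not the real components.

def retrieve(query: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for
    embedding similarity search against a vector database)."""
    q = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def answer(query: str, knowledge_base: list[str]) -> str:
    """Build an augmented prompt from the retrieved context; a real
    pipeline would send this prompt to the LLM."""
    context = " ".join(retrieve(query, knowledge_base))
    return f"Context: {context} | Question: {query}"

kb = [
    "Triton Inference Server serves models over gRPC and HTTP.",
    "Milvus is a vector database for embedding similarity search.",
]
print(answer("what serves models over gRPC", kb))
```

In the real pipelines, the uploaded documents are chunked and embedded into the vector database, and retrieval is a nearest-neighbor search over those embeddings rather than word overlap.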

Developer RAG Examples

| Model | Embedding | Framework | Description | Multi-GPU | TensorRT-LLM | Model Location | Triton Inference Server | Vector Database |
|---|---|---|---|---|---|---|---|---|
| ai-mixtral-8x7b-instruct | nvolveqa_40k | LangChain | Using the NVIDIA API Catalog | NO | NO | API Catalog | NO | Milvus or pgvector |
| llama-2 | e5-large-v2 | LlamaIndex | Using Local GPUs for a Q&A Chatbot | NO | YES | Local Model | YES | Milvus or pgvector |
| llama-2 | e5-large-v2 | LlamaIndex | Multi-GPU for Inference | YES | YES | Local Model | YES | Milvus or pgvector |
| ai-llama2-70b | nvolveqa_40k | LangChain | Query Decomposition | NO | NO | API Catalog | NO | Milvus or pgvector |
| llama2-7b | e5-large-v2 | LlamaIndex | Quantized LLM Inference Model | NO | YES | Local Model | YES | Milvus or pgvector |
| ai-mixtral-8x7b-instruct for response generation, ai-mixtral-8x7b-instruct for PandasAI | Not Applicable | PandasAI | Structured Data | NO | NO | API Catalog | NO | Not Applicable |
| ai-mixtral-8x7b-instruct for response generation, ai-google-Deplot for graph-to-text conversion, ai-Neva-22B for image-to-text conversion | nvolveqa_40k | Custom Python | Multimodal Data | NO | NO | API Catalog | NO | Milvus or pgvector |
| ai-llama2-70b | nvolveqa_40k | LangChain | Multi-Turn Conversational Chat Bot | NO | NO | API Catalog | NO | Milvus or pgvector |
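The Query Decomposition example breaks a complex question into simpler sub-questions, answers each one against the knowledge base, and combines the partial answers. The sketch below is a hypothetical illustration of that flow: the splitting and per-sub-question answering are mocked (in the real pipeline an LLM performs both steps).

```python
# Toy sketch of query decomposition. Both helpers are stand-ins:
# a real pipeline would use an LLM to decompose the question and a
# RAG call to answer each sub-question.

def decompose(question: str) -> list[str]:
    """Naive stand-in for LLM-driven decomposition: split on ' and '."""
    return [part.strip() + "?" for part in question.rstrip("?").split(" and ")]

def answer_sub(sub_question: str, facts: dict[str, str]) -> str:
    """Look up a canned answer (stand-in for retrieval + generation)."""
    return facts.get(sub_question, "unknown")

facts = {
    "Who created Triton?": "NVIDIA",
    "what does it serve?": "LLM models over gRPC",
}
subs = decompose("Who created Triton and what does it serve?")
print([answer_sub(s, facts) for s in subs])
# → ['NVIDIA', 'LLM models over gRPC']
```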

Open Source Connectors

These open source connectors for NVIDIA-hosted and self-hosted API endpoints are maintained and tested by NVIDIA engineers.

| Name | Framework | Chat | Text Embedding | Python | Description |
|---|---|---|---|---|---|
| NVIDIA AI Foundation Endpoints | LangChain | YES | YES | YES | Easy access to NVIDIA-hosted models. Supports chat, embedding, code generation, SteerLM, multimodal, and RAG. |
| NVIDIA Triton + TensorRT-LLM | LangChain | YES | YES | YES | This connector allows LangChain to remotely interact with a Triton Inference Server over gRPC or HTTP for optimized LLM inference. |
| NVIDIA Triton Inference Server | LlamaIndex | YES | YES | NO | Triton Inference Server provides API access to hosted LLM models over gRPC. |
| NVIDIA TensorRT-LLM | LlamaIndex | YES | YES | NO | TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |