NVIDIA Generative AI Examples

Generative AI enables users to quickly generate new content based on a variety of inputs and is a powerful tool for streamlining the workflow of creatives, engineers, researchers, scientists, and more. The use cases and possibilities span all industries and individuals. Generative AI models can produce novel content like stories, emails, music, images, and videos.

Generative AI starts with foundation models trained on vast quantities of unlabeled data. Large language models (LLMs) are trained on an extensive range of textual data. These LLMs can understand prompts and generate novel, human-like responses. Businesses can build applications that leverage this capability of LLMs, such as creative writing assistants for marketing, document summarization for legal teams, and code generation for software development.

The NVIDIA Generative AI Examples use Docker Compose to run Retrieval-Augmented Generation (RAG) Large Language Model (LLM) pipelines.

All the example pipelines deploy a sample chat bot application for question answering that is enhanced with RAG. The chat bot also supports uploading documents to create a knowledge base.
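The RAG pattern the pipelines implement can be sketched in a few lines: retrieve the chunks of the knowledge base most relevant to a question, then condition the LLM prompt on them. The sketch below is illustrative only, not the pipelines themselves (which run in Docker Compose): a toy word-overlap retriever stands in for the embedding model and the Milvus/pgvector vector database, and a prompt template stands in for the LLM call.

```python
# Toy sketch of the RAG pattern: retrieval, then prompt augmentation.
# The retriever and "LLM" here are stand-ins, not the real components.

def retrieve(query: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for
    embedding similarity search against a vector database)."""
    q = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def answer(query: str, knowledge_base: list[str]) -> str:
    """Build an augmented prompt from the retrieved context; a real
    pipeline would send this prompt to the LLM."""
    context = " ".join(retrieve(query, knowledge_base))
    return f"Context: {context} | Question: {query}"

kb = [
    "Triton Inference Server serves models over gRPC and HTTP.",
    "Milvus is a vector database for embedding similarity search.",
]
print(answer("what serves models over gRPC", kb))
```

In the real pipelines, the uploaded documents are chunked and embedded into the vector database, and retrieval is a nearest-neighbor search over those embeddings rather than word overlap.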

Developer RAG Examples

| Model | Embedding | Framework | Description | Multi-GPU | TensorRT-LLM | Model Location | Triton Inference Server | Vector Database |
|---|---|---|---|---|---|---|---|---|
| ai-mixtral-8x7b-instruct | nvolveqa_40k | LangChain | Using the NVIDIA API Catalog | NO | NO | API Catalog | NO | Milvus or pgvector |
| llama-2 | e5-large-v2 | LlamaIndex | Using Local GPUs for a Q&A Chatbot | NO | YES | Local Model | YES | Milvus or pgvector |
| llama-2 | e5-large-v2 | LlamaIndex | Multi-GPU for Inference | YES | YES | Local Model | YES | Milvus or pgvector |
| ai-llama2-70b | nvolveqa_40k | LangChain | Query Decomposition | NO | NO | API Catalog | NO | Milvus or pgvector |
| llama2-7b | e5-large-v2 | LlamaIndex | Quantized LLM Inference Model | NO | YES | Local Model | YES | Milvus or pgvector |
| ai-mixtral-8x7b-instruct for response generation, ai-mixtral-8x7b-instruct for PandasAI | Not Applicable | PandasAI | Structured Data | NO | NO | API Catalog | NO | Not Applicable |
| ai-mixtral-8x7b-instruct for response generation, ai-google-Deplot for graph-to-text conversion, ai-Neva-22B for image-to-text conversion | nvolveqa_40k | Custom Python | Multimodal Data | NO | NO | API Catalog | NO | Milvus or pgvector |
| ai-llama2-70b | nvolveqa_40k | LangChain | Multi-Turn Conversational Chat Bot | NO | NO | API Catalog | NO | Milvus or pgvector |
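The Query Decomposition example breaks a complex question into simpler sub-questions, answers each one against the knowledge base, and combines the partial answers. The sketch below is a hypothetical illustration of that flow: the splitting and per-sub-question answering are mocked (in the real pipeline an LLM performs both steps).

```python
# Toy sketch of query decomposition. Both helpers are stand-ins:
# a real pipeline would use an LLM to decompose the question and a
# RAG call to answer each sub-question.

def decompose(question: str) -> list[str]:
    """Naive stand-in for LLM-driven decomposition: split on ' and '."""
    return [part.strip() + "?" for part in question.rstrip("?").split(" and ")]

def answer_sub(sub_question: str, facts: dict[str, str]) -> str:
    """Look up a canned answer (stand-in for retrieval + generation)."""
    return facts.get(sub_question, "unknown")

facts = {
    "Who created Triton?": "NVIDIA",
    "what does it serve?": "LLM models over gRPC",
}
subs = decompose("Who created Triton and what does it serve?")
print([answer_sub(s, facts) for s in subs])
# → ['NVIDIA', 'LLM models over gRPC']
```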

Open Source Connectors

These open source connectors for NVIDIA-hosted and self-hosted API endpoints are maintained and tested by NVIDIA engineers.

| Name | Framework | Chat | Text Embedding | Python | Description |
|---|---|---|---|---|---|
| NVIDIA AI Foundation Endpoints | LangChain | YES | YES | YES | Easy access to NVIDIA-hosted models. Supports chat, embedding, code generation, SteerLM, multimodal, and RAG. |
| NVIDIA Triton + TensorRT-LLM | LangChain | YES | YES | YES | This connector allows LangChain to remotely interact with a Triton Inference Server over gRPC or HTTP for optimized LLM inference. |
| NVIDIA Triton Inference Server | LlamaIndex | YES | YES | NO | Triton Inference Server provides API access to hosted LLM models over gRPC. |
| NVIDIA TensorRT-LLM | LlamaIndex | YES | YES | NO | TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |