# NVIDIA Generative AI Examples
Generative AI enables users to quickly generate new content based on a variety of inputs and is a powerful tool for streamlining the workflow of creatives, engineers, researchers, scientists, and more. The use cases and possibilities span all industries and individuals. Generative AI models can produce novel content like stories, emails, music, images, and videos.
Generative AI starts with foundation models trained on vast quantities of unlabeled data. Large language models (LLMs) are trained on extensive bodies of online text and can understand prompts and generate novel, human-like responses. Businesses can build applications that leverage this capability, such as creative writing assistants for marketing, document summarization for legal teams, and code generation for software development.
The NVIDIA Generative AI Examples use Docker Compose to run Retrieval-Augmented Generation (RAG) Large Language Model (LLM) pipelines.
All the example pipelines deploy a sample question-answering chat bot that is enhanced with RAG. The chat bot also supports uploading documents to create a knowledge base.
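Conceptually, the RAG enhancement means the chat bot retrieves the most relevant passages from the knowledge base and prepends them to the user's question before the LLM answers. The toy sketch below illustrates only that retrieval-and-augment step; the bag-of-words "embedding" and all names are hypothetical stand-ins for the real embedding models and vector databases (Milvus or pgvector) used by the examples:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real pipelines use a neural embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    # Rank knowledge-base passages by similarity to the query.
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment_prompt(query: str, knowledge_base: list[str]) -> str:
    # Retrieved passages are prepended so the LLM answers from the knowledge base.
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Triton Inference Server serves models over gRPC and HTTP.",
    "Milvus is a vector database for similarity search.",
]
print(augment_prompt("What does Triton serve?", docs))
```

The augmented prompt, not the bare question, is what gets sent to the LLM; this is what lets the bot answer from uploaded documents rather than from its training data alone.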
## Developer RAG Examples
| Model | Embedding | Framework | Description | Multi-GPU | TensorRT-LLM | Model Location | Triton Inference Server | Vector Database |
|---|---|---|---|---|---|---|---|---|
| ai-mixtral-8x7b-instruct | nvolveqa_40k | LangChain | | NO | NO | API Catalog | NO | Milvus or pgvector |
| llama-2 | e5-large-v2 | LlamaIndex | | NO | YES | Local Model | YES | Milvus or pgvector |
| llama-2 | e5-large-v2 | LlamaIndex | | YES | YES | Local Model | YES | Milvus or pgvector |
| ai-llama2-70b | nvolveqa_40k | LangChain | | NO | NO | API Catalog | NO | Milvus or pgvector |
| llama2-7b | e5-large-v2 | LlamaIndex | | NO | YES | Local Model | YES | Milvus or pgvector |
| ai-mixtral-8x7b-instruct for response generation; ai-mixtral-8x7b-instruct for PandasAI | Not Applicable | PandasAI | | NO | NO | API Catalog | NO | Not Applicable |
| ai-mixtral-8x7b-instruct for response generation; ai-google-Deplot for graph-to-text conversion; ai-Neva-22B for image-to-text conversion | nvolveqa_40k | Custom Python | | NO | NO | API Catalog | NO | Milvus or pgvector |
| ai-llama2-70b | nvolveqa_40k | LangChain | | NO | NO | API Catalog | NO | Milvus or pgvector |
## Open Source Connectors
These open source connectors for NVIDIA-hosted and self-hosted API endpoints are maintained and tested by NVIDIA engineers.
| Name | Framework | Chat | Text Embedding | Python | Description |
|---|---|---|---|---|---|
| | | | | | Easy access to NVIDIA-hosted models. Supports chat, embedding, code generation, SteerLM, multimodal, and RAG. |
| | | | | | This connector allows LangChain to remotely interact with a Triton Inference Server over gRPC or HTTP for optimized LLM inference. |
| | | YES | YES | NO | Triton Inference Server provides API access to hosted LLM models over gRPC. |
| | | YES | YES | NO | TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |
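As a rough illustration of the HTTP interaction mentioned above, the sketch below builds a request for Triton's `generate` endpoint. This is a minimal sketch, not a connector implementation: the `text_input` and `max_tokens` field names follow the TensorRT-LLM backend convention and may differ for other model configurations.

```python
import json

def build_generate_request(model: str, prompt: str, max_tokens: int = 128):
    """Build the URL path and JSON body for Triton's HTTP generate endpoint.

    Sketch only: field names assume a TensorRT-LLM backend model; check the
    deployed model's configuration for the actual input names.
    """
    path = f"/v2/models/{model}/generate"
    body = json.dumps({"text_input": prompt, "max_tokens": max_tokens})
    return path, body

path, body = build_generate_request("ensemble", "What is Triton?")
print(path)  # /v2/models/ensemble/generate
```

An actual client would POST `body` to `http://<triton-host>:8000` + `path`; the LangChain connector above wraps this (and the gRPC equivalent) behind a standard LLM interface.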