Query Decomposition
Example Features
This example deploys a recursive query decomposition agent for chat Q&A. It uses the llama2-70b chat model from an NVIDIA API Catalog endpoint for inference.
Query decomposition is useful for RAG when answering a question requires information from several different documents (also referred to as chunks) or some computation on the intermediate answers. This example uses a custom LangChain agent that recursively breaks a question down into subquestions and then attempts to answer each subquestion.
The agent has access to two tools:
search: to perform standard RAG on a subquestion.
math: to pose a math question to the LLM.
The agent continues to break down the question into subquestions until it has the answers that it needs to form the final answer.
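The recursive control flow described above can be sketched in Python. This is a toy, offline illustration under stated assumptions, not the example's LangChain agent: a hand-written planner (`toy_planner`) stands in for the LLM's decisions, a dictionary of made-up facts stands in for the search tool's retrieval, and `math_tool` evaluates arithmetic locally instead of posing the question to the LLM. All names and data here are hypothetical.

```python
"""Toy sketch of recursive query decomposition (hypothetical, runs offline).

A hand-written planner stands in for the LLM, and simple stubs stand in
for the search and math tools, so only the control flow is illustrated.
"""
import ast
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Made-up "documents" standing in for what the search tool would retrieve.
DOCS: Dict[str, str] = {
    "How many employees does Acme have?": "1200",
    "How many employees does Globex have?": "800",
}

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def search(subquestion: str) -> str:
    """Stub for the search tool: look the subquestion up in the toy documents."""
    return DOCS.get(subquestion, "unknown")


def math_tool(expression: str) -> str:
    """Stub for the math tool: safely evaluate + - * / arithmetic (no eval())."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    result = ev(ast.parse(expression, mode="eval").body)
    return str(int(result)) if float(result).is_integer() else str(result)


TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "math": math_tool}


@dataclass
class Plan:
    """Either call one tool directly, or split into subquestions plus a follow-up."""
    tool: Optional[str] = None
    tool_input: str = ""
    subquestions: List[str] = field(default_factory=list)
    # Builds the next question from the answers to the subquestions.
    followup: Optional[Callable[[List[str]], str]] = None


def solve(question: str, plan_fn: Callable[[str], Plan],
          depth: int = 0, max_depth: int = 3) -> str:
    """Recursively decompose `question` until every leaf is answered by a tool."""
    plan = plan_fn(question)
    if plan.tool is not None:
        return TOOLS[plan.tool](plan.tool_input)
    if depth >= max_depth or plan.followup is None:
        raise RuntimeError(f"cannot decompose: {question!r}")
    answers = [solve(q, plan_fn, depth + 1, max_depth) for q in plan.subquestions]
    # The follow-up question (for example, an arithmetic step) is solved recursively too.
    return solve(plan.followup(answers), plan_fn, depth + 1, max_depth)


def toy_planner(question: str) -> Plan:
    """Hand-written stand-in for the LLM's decomposition decisions."""
    if question in DOCS:
        return Plan(tool="search", tool_input=question)
    if question.startswith("CALC:"):
        return Plan(tool="math", tool_input=question[len("CALC:"):])
    if question == "How many more employees does Acme have than Globex?":
        return Plan(
            subquestions=["How many employees does Acme have?",
                          "How many employees does Globex have?"],
            followup=lambda a: f"CALC:{a[0]} - {a[1]}",
        )
    return Plan()  # nothing to do: solve() raises RuntimeError
```

With these stubs, `solve("How many more employees does Acme have than Globex?", toy_planner)` answers two search subquestions, turns their answers into a follow-up arithmetic question for the math stub, and returns `"400"`. In the deployed example, the LLM makes each of these decisions rather than a fixed planner.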
Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | Model Location | Triton | Vector Database
---|---|---|---|---|---|---|---|---|
ai-llama2-70b | nvolveqa_40k | LangChain | QA chatbot | NO | NO | API Catalog | NO | Milvus
The following figure shows the sample topology:
The sample chat bot web application communicates with the chain server. The chain server sends inference requests to an NVIDIA API Catalog endpoint.
Optionally, you can deploy NVIDIA Riva. Riva can use automatic speech recognition to transcribe your questions and use text-to-speech to speak the answers aloud.
Prerequisites
Clone the Generative AI examples Git repository using Git LFS:
$ sudo apt -y install git-lfs
$ git clone git@github.com:NVIDIA/GenerativeAIExamples.git
$ cd GenerativeAIExamples/
$ git lfs pull
Install Docker Engine and Docker Compose. Refer to the instructions for Ubuntu.
Optional: Enable NVIDIA Riva automatic speech recognition (ASR) and text to speech (TTS).
To launch a Riva server locally, refer to the Riva Quick Start Guide.
In the provided config.sh script, set service_enabled_asr=true and service_enabled_tts=true, and select the desired ASR and TTS languages by adding the appropriate language codes to asr_language_code and tts_language_code.
After the server is running, assign its IP address (or hostname) and port (50051 by default) to RIVA_API_URI in deploy/compose/compose.env.
Alternatively, you can use a hosted Riva API endpoint. You might need to obtain an API key and/or Function ID for access.
In deploy/compose/compose.env, make the following assignments as necessary:
export RIVA_API_URI="<riva-api-address/hostname>:<port>"
export RIVA_API_KEY="<riva-api-key>"
export RIVA_FUNCTION_ID="<riva-function-id>"
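If you enable Riva, a quick sanity check of the Riva variables can save a failed start. The following Python sketch is hypothetical and is not part of the repository: it reads RIVA_API_URI, RIVA_API_KEY, and RIVA_FUNCTION_ID from a mapping such as the process environment (for example, after sourcing deploy/compose/compose.env). As an assumption of this sketch only, a RIVA_API_URI without a port is treated as using 50051, the default Riva port mentioned above. Unbracketed IPv6 addresses are not supported.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Assumption of this sketch: a URI without ":port" means the default Riva port.
DEFAULT_RIVA_PORT = 50051


class RivaSettings(NamedTuple):
    host: str
    port: int
    api_key: Optional[str]      # may be required by a hosted endpoint
    function_id: Optional[str]  # may be required by a hosted endpoint


def read_riva_settings(env: Mapping[str, str] = os.environ) -> RivaSettings:
    """Parse RIVA_API_URI ("host:port") and collect the optional credentials."""
    uri = env.get("RIVA_API_URI", "").strip()
    if not uri:
        raise ValueError("RIVA_API_URI is not set")
    host, sep, port = uri.rpartition(":")
    if not sep:  # no ":" in the URI, so only a host was given
        host, port = uri, str(DEFAULT_RIVA_PORT)
    if not host or not port.isdigit():
        # Also catches unedited placeholders such as "<port>".
        raise ValueError(f"expected <host>:<port>, got {uri!r}")
    return RivaSettings(
        host=host,
        port=int(port),
        api_key=env.get("RIVA_API_KEY") or None,
        function_id=env.get("RIVA_FUNCTION_ID") or None,
    )
```

For example, `read_riva_settings({"RIVA_API_URI": "10.0.0.5:50051"})` yields host `10.0.0.5`, port 50051, and no credentials, which matches a locally launched Riva server.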
Get an API Key for the Llama 2 70B API Endpoint
Perform the following steps if you do not already have an API key. You can use different model API endpoints with the same API key.
Navigate to https://build.ngc.nvidia.com/explore/reasoning.
Find the Llama 2 70B card and click the card.
Click Get API Key.
Click Generate Key.
Click Copy Key and then save the API key. The key begins with the letters nvapi-.
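Because the key must begin with nvapi-, a local format check can catch copy-and-paste mistakes before you build the containers. This is a hypothetical helper, not part of the repository; it checks only the NVIDIA_API_KEY variable that you export when building the containers, and it does not contact NVIDIA or confirm that the key is valid.

```python
import os
from typing import Mapping

KEY_PREFIX = "nvapi-"  # the key format described in the steps above


def check_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return NVIDIA_API_KEY if it is set and starts with "nvapi-"; raise otherwise.

    This is a local format check only; it does not verify the key with NVIDIA.
    """
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise ValueError("NVIDIA_API_KEY is not set")
    if not key.startswith(KEY_PREFIX):
        raise ValueError(f"NVIDIA_API_KEY should start with {KEY_PREFIX!r}")
    return key
```

For example, a value typed as `nvapi=...` instead of `nvapi-...` fails this check with a `ValueError`.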
Build and Start the Containers
In a terminal in the Generative AI examples repository, export the API key for the model endpoint:
$ export NVIDIA_API_KEY="nvapi-..."
From the root of the repository, build the containers:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/rag-app-query-decomposition-agent.yaml build
Start the containers:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/rag-app-query-decomposition-agent.yaml up -d
Example Output
✔ Network nvidia-rag Created
✔ Container chain-server Started
✔ Container rag-playground Started
Start the Milvus vector database:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/docker-compose-vectordb.yaml up -d milvus
Example Output
✔ Container milvus-minio Started
✔ Container milvus-etcd Started
✔ Container milvus-standalone Started
Confirm the containers are running:
$ docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
Example Output
CONTAINER ID   NAMES               STATUS
0be0d21b2fee   rag-playground      Up 33 minutes
524905ec3870   chain-server        Up 33 minutes
14cb139a2e4a   milvus-standalone   Up 34 minutes
7a807d96c113   milvus-minio        Up 34 minutes (healthy)
937e4165e875   milvus-etcd         Up 34 minutes (healthy)
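The confirmation step can be automated by parsing the same docker ps output. The following sketch is hypothetical: the expected container names are copied from the example output above, and the parser assumes only that container IDs and names contain no whitespace, so it works whether the columns are separated by tabs or by padding spaces.

```python
import subprocess
from typing import Dict, List, Sequence

# Container names copied from the example output above.
EXPECTED = ["rag-playground", "chain-server", "milvus-standalone",
            "milvus-minio", "milvus-etcd"]


def parse_ps_table(output: str) -> Dict[str, str]:
    """Map container name -> status; the first line is the table header."""
    status: Dict[str, str] = {}
    for line in output.splitlines()[1:]:
        # ID and name have no whitespace; the status ("Up 34 minutes (healthy)") may.
        parts = line.split(None, 2)
        if len(parts) == 3:
            status[parts[1]] = parts[2].strip()
    return status


def not_running(status: Dict[str, str], expected: Sequence[str] = EXPECTED) -> List[str]:
    """Expected containers that are missing, not "Up", or reported unhealthy."""
    bad = []
    for name in expected:
        state = status.get(name, "")
        if not state.startswith("Up") or "(unhealthy)" in state:
            bad.append(name)
    return bad


def check_containers() -> List[str]:
    """Run the docker ps command from this guide and return any problem containers."""
    result = subprocess.run(
        ["docker", "ps", "--format", r"table {{.ID}}\t{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return not_running(parse_ps_table(result.stdout))
```

An empty list from `check_containers()` means all five containers report `Up`; any names it returns are the containers to inspect, for example with `docker logs`.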
Next Steps
Access the web interface for the chat server. Refer to Using the Sample Chat Web Application for information about using the web interface.
Make sure to upload documents and use the knowledge base when you ask questions, so that the agent's search tool has documents to retrieve from.
Stop the containers by running the following commands:
$ docker compose -f deploy/compose/rag-app-query-decomposition-agent.yaml down
$ docker compose -f deploy/compose/docker-compose-vectordb.yaml down
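The docker compose commands in this guide, for both startup and teardown, can be collected in a small script so they always run in the documented order. This is a hypothetical sketch, not part of the repository: the file paths and flags are copied from the commands above, the script name `compose_steps.py` and its `up`/`down`/`--run` arguments are invented for illustration, and it only prints the commands unless you pass `--run`.

```python
import subprocess
import sys
from typing import List, Optional, Sequence

# Paths and flags copied from the commands in this guide; run from the repository root.
ENV_FILE = "deploy/compose/compose.env"
APP_FILE = "deploy/compose/rag-app-query-decomposition-agent.yaml"
VECTORDB_FILE = "deploy/compose/docker-compose-vectordb.yaml"


def compose(compose_file: str, *args: str, env_file: Optional[str] = ENV_FILE) -> List[str]:
    """Build a `docker compose [--env-file FILE] -f COMPOSE_FILE ARGS...` argument list."""
    cmd = ["docker", "compose"]
    if env_file:
        cmd += ["--env-file", env_file]
    return cmd + ["-f", compose_file, *args]


# Same order as the guide: build, start the app, then start Milvus.
STARTUP_STEPS = [
    compose(APP_FILE, "build"),
    compose(APP_FILE, "up", "-d"),
    compose(VECTORDB_FILE, "up", "-d", "milvus"),
]

# The teardown commands in the guide omit --env-file.
STOP_STEPS = [
    compose(APP_FILE, "down", env_file=None),
    compose(VECTORDB_FILE, "down", env_file=None),
]


def run_steps(steps: Sequence[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print("$", " ".join(cmd))
        if not dry_run:
            # Inherits the current environment, so NVIDIA_API_KEY must already be exported.
            subprocess.run(cmd, check=True)  # raises on the first failing command


if __name__ == "__main__":
    # Hypothetical CLI: `python compose_steps.py up|down [--run]`; dry run by default.
    action = sys.argv[1] if len(sys.argv) > 1 else "up"
    run_steps(STOP_STEPS if action == "down" else STARTUP_STEPS,
              dry_run="--run" not in sys.argv)
```

Defaulting to a dry run lets you review the exact commands before anything is built or stopped.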