Advanced Setup and Usage#
CA-RAG supports the following databases as backends for storing and retrieving documents:
- Milvus
- Elasticsearch
- Neo4j
- ArangoDB
CA-RAG supports the following retrieval methods for Question-Answering and Retrieval:
- VectorRAG (VRAG)
- Basic GraphRAG (GRAG)
- Chain-of-Thought Retrieval and QA (CoT)
- Chain-of-Thought Retrieval with Vision Language Model (VLM)
- Foundation-RAG using the NVIDIA RAG blueprint (FRAG)
- Advanced Graph Retrieval with Graph Traversal and VLM (AdvGRAG)
You can choose one of the following databases and one of the supported Question-Answering configurations:
Supported Configurations#
The following table shows the compatibility matrix between databases and retrieval methods:
| Database / Retrieval Method | VRAG | FRAG | GRAG | CoT | VLM | AdvGRAG |
|---|---|---|---|---|---|---|
| Milvus | ✅ | ✅ | - | - | - | - |
| Elasticsearch | ✅ | - | - | - | - | - |
| Neo4j | - | - | ✅ | ✅ | ✅ | ✅ |
| ArangoDB | - | - | ✅ | ✅ | ✅ | ✅ |
Database Setup#
Use the following sections to set up and start the database for the desired RAG configuration. Choose the DB to start based on the table above.
Vector-RAG: Milvus#
Export the environment variables
export MILVUS_DB_HOST=${MILVUS_DB_HOST} #milvus host, e.g. localhost
export MILVUS_DB_PORT=${MILVUS_DB_PORT} #milvus port, e.g. 19530
export NVIDIA_API_KEY=${NVIDIA_API_KEY} #NVIDIA API key
Download and run the Milvus standalone script. This starts the Milvus service on port 19530 by default.
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start
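To confirm the standalone instance is up before moving on, you can probe Milvus's health endpoint. This is an optional sanity check and assumes the standalone script also publishes Milvus's default management port 9091, which is the case in default setups.
# Optional sanity check: returns HTTP 200 when Milvus is healthy.
curl -sf http://localhost:9091/healthz && echo "Milvus is healthy"

# The standalone container should also show up as running:
docker ps --filter "name=milvus" --format "{{.Names}}: {{.Status}}"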
Graph-RAG: Neo4j#
Export the environment variables
export GRAPH_DB_HOST=${GRAPH_DB_HOST} #neo4j host, e.g. localhost
export GRAPH_DB_PORT=${GRAPH_DB_PORT} #neo4j port, e.g. 7687
export GRAPH_DB_USERNAME=${GRAPH_DB_USERNAME} #neo4j username, e.g. neo4j
export GRAPH_DB_PASSWORD=${GRAPH_DB_PASSWORD} #neo4j password, e.g. password
export NVIDIA_API_KEY=${NVIDIA_API_KEY} #NVIDIA API key
Start the Neo4j Docker container
docker run -d \
--name neo4j \
-p ${GRAPH_DB_HTTP_PORT:-7474}:7474 \
-p ${GRAPH_DB_BOLT_PORT:-7687}:7687 \
-e NEO4J_AUTH=${GRAPH_DB_USERNAME:-neo4j}/${GRAPH_DB_PASSWORD:-passneo4j} \
-e NEO4J_PLUGINS='["apoc"]' \
neo4j:5.26.4
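To verify that Neo4j is accepting Bolt connections with the credentials you exported, you can run a trivial query through cypher-shell inside the container. This is a minimal check, assuming the container name above; adjust the credentials and name to your setup.
# Runs a no-op Cypher query; a "1" in the output means auth and Bolt connectivity work.
docker exec neo4j cypher-shell \
  -u "${GRAPH_DB_USERNAME:-neo4j}" \
  -p "${GRAPH_DB_PASSWORD:-passneo4j}" \
  "RETURN 1 AS ok;"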
Graph-RAG: ArangoDB#
Export the environment variables
export ARANGO_DB_HOST=${ARANGO_DB_HOST} #arango host, e.g. localhost
export ARANGO_DB_PORT=${ARANGO_DB_PORT} #arango port, e.g. 8529
export ARANGO_DB_USERNAME=${ARANGO_DB_USERNAME} #arango username, e.g. root
export ARANGO_DB_PASSWORD=${ARANGO_DB_PASSWORD} #arango password, e.g. password
export NVIDIA_API_KEY=${NVIDIA_API_KEY} #NVIDIA API key
Start the Docker container for ArangoDB
docker run -d \
--name arango-db \
-p ${ARANGO_DB_PORT:-8529}:${ARANGO_DB_PORT:-8529} \
-e ARANGO_DB_USERNAME=${ARANGO_DB_USERNAME} \
-e ARANGO_ROOT_PASSWORD=${ARANGO_DB_PASSWORD} \
arangodb/arangodb:3.12.4 \
arangod --experimental-vector-index --server.endpoint http://0.0.0.0:${ARANGO_DB_PORT:-8529}
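Once the container is up, a quick way to confirm ArangoDB is reachable with the root password you set is to hit its version endpoint. This is a minimal check against the ArangoDB REST API using the values exported above.
# Returns a small JSON document with the server version if the DB is up and auth works.
curl -s -u "${ARANGO_DB_USERNAME:-root}:${ARANGO_DB_PASSWORD}" \
  "http://localhost:${ARANGO_DB_PORT:-8529}/_api/version"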
Install additional dependencies for Arango.
uv pip install -e ".[arango]"
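As a quick check that the extra installed correctly (it is assumed to pull in the python-arango client), confirm the arango module imports in the same environment:
# Should print the message without raising ImportError.
python -c "import arango; print('python-arango import OK')"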
Update config.yaml:
tools:
  # ... existing tools
  graph_db_arango:
    type: arango
    params:
      host: !ENV ${ARANGO_DB_HOST}
      port: !ENV ${ARANGO_DB_PORT}
      username: !ENV ${ARANGO_DB_USERNAME}
      password: !ENV ${ARANGO_DB_PASSWORD}
    tools:
      embedding: nvidia_embedding

functions:
  # ... existing functions
  summarization:
    type: batch_summarization
    # ... update the db in `tools`
    db: graph_db_arango
  ingestion_function:
    type: graph_ingestion
    # ... update the db in `tools`
    db: graph_db_arango
  retriever_function:
    type: graph_retrieval
    # ... update the db in `tools`
    db: graph_db_arango
  summary_retriever:
    type: summary_retriever
    # ... update the db in `tools`
    db: graph_db_arango
Vector-RAG: Elasticsearch#
Export the required environment variables
export ES_HOST=${ES_HOST} #elasticsearch host, e.g. localhost
export ES_PORT=${ES_PORT} #elasticsearch port, e.g. 9200
export NVIDIA_API_KEY=${NVIDIA_API_KEY} #NVIDIA API key
Start the Elasticsearch DB if not already started via Docker Compose
docker run -d \
--name elasticsearch \
-p ${ES_PORT:-9200}:${ES_PORT:-9200} \
-p ${ES_TRANSPORT_PORT:-9300}:${ES_TRANSPORT_PORT:-9300} \
-e discovery.type=single-node \
-e xpack.security.enabled=false \
--memory=${ES_MEM_LIMIT:-6442450944} \
elasticsearch:9.1.2
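Before pointing CA-RAG at it, you can confirm the single-node cluster is up. With security disabled as above, no credentials are needed for this check.
# A "green" or "yellow" status is expected for a fresh single-node cluster.
curl -s "http://localhost:${ES_PORT:-9200}/_cluster/health?pretty"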
Update config.yaml:
tools:
  # ... existing tools
  elasticsearch_db:
    type: elasticsearch
    params:
      host: !ENV ${ES_HOST}
      port: !ENV ${ES_PORT}
    tools:
      embedding: nvidia_embedding

functions:
  # ... existing functions
  summarization:
    type: batch_summarization
    # ... update the db in `tools`
    db: elasticsearch_db
  ingestion_function:
    type: vector_ingestion
    # ... update the db in `tools`
    db: elasticsearch_db
  retriever_function:
    type: vector_retrieval
    # ... update the db in `tools`
    db: elasticsearch_db
  summary_retriever:
    type: summary_retriever
    # ... update the db in `tools`
    db: elasticsearch_db
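After documents have been ingested through this configuration, a simple way to confirm data actually landed in Elasticsearch is to list the indices and their document counts. Index names depend on your CA-RAG collection settings, so treat this as a generic check.
# Lists all indices with document counts and sizes.
curl -s "http://localhost:${ES_PORT:-9200}/_cat/indices?v"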
Retrieval Setup#
To change the type of retrieval used for Question-Answering, you can choose one of the following configurations:
- VectorRAG (VRAG)
- Basic GraphRAG (GRAG)
- Chain-of-Thought Retrieval and QA (CoT)
- Chain-of-Thought Retrieval with Vision Language Model (VLM)
- Foundation-RAG using the NVIDIA RAG blueprint (FRAG)
- Advanced Graph Retrieval with Graph Traversal and VLM (AdvGRAG)
RAG Type Configuration Examples#
Once the required database is set up and started and the environment variables are exported, the following types of RAG can be configured by modifying config.yaml.
The following sections show configuration snippets for different RAG types and the changes needed from the base configuration above.
Vector-RAG (VRAG)#
Vector-RAG uses vector databases for document storage and retrieval with embedding-based similarity search. During document addition, each document is stored as a chunk in a vector store such as Milvus or Elasticsearch. During Retrieval/QA, the relevant documents are fetched and used as context for answering the user's question.
How it works:
- Captions generated by the Vision-Language Model (VLM), along with their embeddings, are stored in Milvus DB or Elasticsearch
- Embeddings can be created using any embedding NIM (by default, nvidia/llama-3_2-nv-embedqa-1b-v2)
- For a query, the top five most similar chunks are retrieved using vector similarity
- Retrieved chunks are re-ranked using any reranker NIM (by default, nvidia/llama-3_2-nv-rerankqa-1b-v2)
- Re-ranked chunks are passed to a Large Language Model (LLM) NIM to generate the final answer
The full setup and example are described in Setup. A full config file can be found at data/configs/vrag.yaml.
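The reranking step above calls the hosted reranker NIM at the base_url shown in the config below. If you want to verify your NVIDIA_API_KEY and the endpoint independently of CA-RAG, a standalone request looks roughly like this; the payload follows the NVIDIA retrieval API format, but field names may vary across model versions, so consult the NVIDIA API catalog if the call fails.
# Sends one query and two candidate passages to the reranker NIM; the response
# contains a relevance ranking of the passages.
curl -s -X POST \
  "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking" \
  -H "Authorization: Bearer ${NVIDIA_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
    "query": {"text": "Did anyone cross the street?"},
    "passages": [
      {"text": "A person walks across the crosswalk at 00:12."},
      {"text": "The parking lot remains empty for the rest of the clip."}
    ]
  }'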
Key Changes from Base Config:
tools:
  # ... existing tools ...
  vector_db:
    type: milvus
    params:
      host: !ENV ${MILVUS_DB_HOST}
      port: !ENV ${MILVUS_DB_GRPC_PORT}
    tools:
      embedding: nvidia_embedding
  nvidia_reranker:
    type: reranker
    params:
      model: nvidia/llama-3.2-nv-rerankqa-1b-v2
      base_url: "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"
      api_key: !ENV ${NVIDIA_API_KEY}

functions:
  # ... existing functions ...
  summarization:
    type: batch_summarization
    # ... update the db in tools
    tools:
      llm: nvidia_llm
      db: vector_db
  ingestion_function:
    type: vector_ingestion
    # ... update the db in tools
    db: vector_db
  retriever_function:
    type: vector_retrieval
    # ... update the db in tools
    tools:
      llm: nvidia_llm
      db: vector_db
      reranker: nvidia_reranker

# ... rest of config
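After ingestion runs with this configuration, you can confirm that collections were created in Milvus. Recent Milvus releases expose a REST v2 API on the same service port; if your version does not, use the pymilvus client instead. This is a hedged example and not part of CA-RAG itself.
# Lists collections via the Milvus REST v2 API (Milvus 2.4+).
curl -s -X POST "http://${MILVUS_DB_HOST:-localhost}:19530/v2/vectordb/collections/list" \
  -H "Content-Type: application/json" \
  -d '{}'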
Basic GraphRAG (GRAG)#
GraphRAG uses graph databases (Neo4j/ArangoDB) to store and retrieve documents with entity-relationship graphs. During document addition, nodes and relationships are created. After documents are added, the document ingestion step is called, which finalizes the graph by creating community summaries and making it available for Question-Answering during retrieval. During retrieval, the relevant nodes/entities and graph community summaries are used as context to answer the user's query.
How it works:
- Graph Extraction: Entities and relationships are extracted from VLM captions using an LLM and stored in a GraphDB
- Captions and embeddings (generated with any embedding NIM) are linked to these entities
- Graph Retrieval: For a given query, relevant entities, relationships, and captions are retrieved from the GraphDB
- Retrieved information is passed to an LLM NIM to generate the final answer
The full setup and example are described in Setup.
The DB can be either ArangoDB or Neo4j. Refer to Configuration for more details.
A full config file can be found at data/configs/grag.yaml.
Key Changes from Base Config:
tools:
  # ... existing tools ...
  graph_db:
    type: neo4j
    params:
      host: !ENV ${GRAPH_DB_HOST}
      port: !ENV ${GRAPH_DB_BOLT_PORT}
      username: !ENV ${GRAPH_DB_USERNAME}
      password: !ENV ${GRAPH_DB_PASSWORD}
    tools:
      embedding: nvidia_embedding

functions:
  # ... existing functions ...
  summarization:
    type: batch_summarization
    # ... update the db in tools
    db: graph_db
  ingestion_function:
    type: graph_ingestion
    # ... update the db in tools
    db: graph_db
  retriever_function:
    type: graph_retrieval
    # ... update the db in tools
    db: graph_db

# ... rest of config
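After graph ingestion finishes, a quick sanity check is to count the nodes created in Neo4j, shown here with cypher-shell inside the container started earlier; adapt the credentials and container name to your setup.
# A non-zero count indicates that graph extraction and ingestion created entities.
docker exec neo4j cypher-shell \
  -u "${GRAPH_DB_USERNAME:-neo4j}" \
  -p "${GRAPH_DB_PASSWORD:-passneo4j}" \
  "MATCH (n) RETURN count(n) AS nodes;"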
Chain-of-Thought Retrieval and QA (CoT)#
CoT adds Chain-of-Thought reasoning capabilities to graph-based retrieval. This enables the LLM to perform multi-step querying on the graph to collect enough information to answer multi-hop reasoning queries.
Key Features:
- Iterative Retrieval: Performs multiple retrieval iterations (up to max_iterations) until a confident answer is found
- Confidence Scoring: Uses a confidence threshold to determine answer quality (default: 0.7)
- Question Reformulation: LLM can suggest updated questions to retrieve better database results
- Chat History Integration: Maintains conversation context using the last 3 interactions
- Visual Data Processing: Can request and analyze video frames when visual information is needed
- Structured Response: Returns JSON-formatted responses with answer, confidence, and additional metadata
Retrieval Process:
1. Initial context retrieval based on the user question
2. Integration of relevant chat history from previous interactions
3. Iterative LLM evaluation with structured JSON response format
4. If confidence is below the threshold:
   - Request additional context using reformulated questions
   - Process visual data if needed (when image features are enabled)
   - Continue iterating until a confident answer is found or max iterations are reached
5. Return the final answer with a confidence score
A full config file can be found at data/configs/grag.yaml
Key Changes from Base Config:
tools:
  # ... existing tools from GraphRAG ...
  openai_llm:
    type: llm
    params:
      model: gpt-4o
      base_url: https://api.openai.com/v1
      max_tokens: 4096
      temperature: 0.5
      top_p: 0.7
      api_key: !ENV ${OPENAI_API_KEY}

functions:
  # ... existing functions from GraphRAG ...
  summarization:
    type: batch_summarization
    # ... update the db in tools
    tools:
      # ... existing tools
      db: graph_db
  ingestion_function:
    type: graph_ingestion
    # ... update the db in tools
    tools:
      # ... existing tools
      db: graph_db
  retriever_function:
    type: cot_retrieval
    # ... update db and vlm
    tools:
      # ... existing tools
      db: graph_db
      vlm: openai_llm

# ... rest of config
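The snippet above reads OPENAI_API_KEY through !ENV, but the variable is not exported in the database setup steps. If you keep gpt-4o as the reasoning model, export the key before starting CA-RAG; swap in your own LLM tool if you are not using the OpenAI API.
export OPENAI_API_KEY=${OPENAI_API_KEY} #OpenAI API key used by the openai_llm tool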
Chain-of-Thought Retrieval with Vision Language Model (VLM)#
This retrieval strategy uses Chain-of-Thought reasoning to iteratively search the graph and documents through multi-step queries. It also uses a VLM to enable visual understanding capabilities for image-based retrieval and analysis.
How it works:
- Captions generated by the Vision-Language Model (VLM), along with their embeddings and video frame paths, are stored in different databases
- Video frames are stored in MinIO object storage
- Based on the user query, the most relevant chunks and related video frames are retrieved
- Retrieved chunks and frames are passed to a Vision Language Model (VLM) NIM along with the query to generate the final answer
- Embeddings are created using any embedding NIM (by default, nvidia/llama-3_2-nv-embedqa-1b-v2)
A full config file can be found at data/configs/vlm.yaml
Key Changes from Base Config:
tools:
  # ... existing tools from GraphRAG ...
  openai_llm:
    type: llm
    params:
      model: gpt-4o
      base_url: https://api.openai.com/v1
      max_tokens: 4096
      temperature: 0.5
      top_p: 0.7
      api_key: !ENV ${OPENAI_API_KEY}
  image_fetcher:
    type: image
    params:
      minio_host: !ENV ${MINIO_HOST}
      minio_port: !ENV ${MINIO_PORT}
      minio_username: !ENV ${MINIO_USERNAME}
      minio_password: !ENV ${MINIO_PASSWORD}

functions:
  # ... existing functions from GraphRAG ...
  retriever_function:
    type: vlm_retrieval
    params:
      top_k: 10
    tools:
      llm: nvidia_llm
      db: graph_db
      vlm: openai_llm
      image_fetcher: image_fetcher

# ... rest of config
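The image_fetcher tool reads the MinIO connection details through !ENV, so export them and, optionally, probe MinIO's liveness endpoint before ingesting video frames. This assumes a MinIO instance is already running as part of your deployment, with MINIO_PORT pointing at its API port (9000 by default).
export MINIO_HOST=${MINIO_HOST} #minio host, e.g. localhost
export MINIO_PORT=${MINIO_PORT} #minio API port, e.g. 9000
export MINIO_USERNAME=${MINIO_USERNAME} #minio access key
export MINIO_PASSWORD=${MINIO_PASSWORD} #minio secret key

# Unauthenticated liveness probe; HTTP 200 means the object store is reachable.
curl -sf "http://${MINIO_HOST:-localhost}:${MINIO_PORT:-9000}/minio/health/live" && echo "MinIO is reachable"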
Foundation-RAG (FRAG)#
Foundation-RAG provides enhanced vector-based retrieval with Milvus based on the NVIDIA RAG blueprint. It uses NVIDIA's RAG blueprint to retrieve documents with reranking for Question-Answering. During document addition, the documents can be added to the Milvus DB provided by the NVIDIA RAG blueprint. During retrieval, CA-RAG can connect to the external Milvus and perform Question-Answering over the NVIDIA RAG blueprint's document collection.
A full config file can be found at data/configs/frag.yaml
Key Changes from Base Config:
tools:
  # ... existing tools from VectorRAG ...
  vector_db:
    type: milvus
    params:
      host: !ENV ${MILVUS_DB_HOST}
      port: !ENV ${MILVUS_DB_GRPC_PORT}
    tools:
      embedding: nvidia_embedding
  nvidia_reranker:
    type: reranker
    params:
      model: nvidia/llama-3.2-nv-rerankqa-1b-v2
      base_url: "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"
      api_key: !ENV ${NVIDIA_API_KEY}

functions:
  # ... existing functions from VectorRAG ...
  summarization:
    type: batch_summarization
    # ... update the db in tools
    tools:
      llm: nvidia_llm
      db: vector_db
  ingestion_function:
    type: foundation_ingestion
    # ... update the db in tools
    params:
      batch_size: 1
    tools:
      llm: nvidia_llm
      db: vector_db
  retriever_function:
    type: foundation_retrieval
    # ... update the db in tools
    tools:
      llm: nvidia_llm
      db: vector_db
      reranker: nvidia_reranker

# ... rest of config
Advanced Graph Retrieval with Graph Traversal and VLM (AdvGRAG)#
This agent combines advanced graph retrieval with Graph Traversal capabilities and vision language models.
Architecture Components:
- Planning Module: Creates execution plans and evaluates results to determine next steps
- Execution Engine: Parses XML-structured plans and creates tool calls
- Tool Node: Executes specialized search and analysis tools
- Response Formatter: Formats final answers based on all collected information
Available Traversal Strategies:
- chunk_search: Retrieves the most relevant chunks using vector similarity
- entity_search: Retrieves entities and relationships using vector similarity
- chunk_filter: Filters chunks based on time ranges and camera IDs
- chunk_reader: Analyzes chunks and video frames using VLM for detailed insights
- bfs: Performs breadth-first search through entity relationships
- next_chunk: Retrieves chronologically adjacent chunks
Iterative Process:
1. Planning Phase: Planning module creates the initial execution plan
2. Execution Phase: Execution engine parses the plan and calls the appropriate tools
3. Evaluation Phase: Results are evaluated; if incomplete, the cycle repeats with a refined plan
4. Response Phase: The final answer is generated when sufficient information is gathered
Key Features:
- Multi-Channel Support: Handles multiple camera streams with runtime camera information
- Dynamic Tool Selection: Uses only the tools specified in configuration
- Iterative Refinement: Continues until a confident answer is found or max iterations are reached
- XML-Structured Plans: Uses a structured XML format for reliable plan parsing
- Context Awareness: Integrates video length and camera metadata into planning
A full config file can be found at data/configs/planner_vlm.yaml
Key Changes from Base Config:
tools:
  # ... existing tools from GraphRAG ...
  openai_llm:
    type: llm
    params:
      model: gpt-4o
      base_url: https://api.openai.com/v1
      max_tokens: 4096
      temperature: 0.5
      top_p: 0.7
      api_key: !ENV ${OPENAI_API_KEY}
  image_fetcher:
    type: image
    params:
      minio_host: !ENV ${MINIO_HOST}
      minio_port: !ENV ${MINIO_PORT}
      minio_username: !ENV ${MINIO_USERNAME}
      minio_password: !ENV ${MINIO_PASSWORD}

functions:
  # ... existing functions from GraphRAG ...
  retriever_function:
    type: adv_graph_retrieval
    params:
      tools: ["chunk_search", "chunk_filter", "entity_search", "chunk_reader"]
      top_k: 10
    tools:
      llm: nvidia_llm
      db: graph_db
      vlm: openai_llm
      image_fetcher: image_fetcher

context_manager:
  functions:
    - summarization
    - ingestion_function
    - retriever_function