NAT Function/Tool

The Context Aware RAG NAT plugin can also be used as a function/tool in custom NAT workflows.

The directory ./packages/vss_ctx_rag_nat/src/vss_ctx_rag/plugins/nat/nat_config/function/ contains two example config files for using Context Aware RAG as a function/tool: one for ingestion and one for retrieval.

Retrieval Function

This is an example config file for using Context Aware RAG as a function/tool for retrieval:

# config-retrieval-function.yml
general:
  use_uvloop: true

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    max_tokens: 2048
    base_url: "https://integrate.api.nvidia.com/v1"


embedders:
  embedding_llm:
    _type: nim
    model_name: nvidia/llama-3.2-nv-embedqa-1b-v2
    truncate: "END"
    base_url: "https://integrate.api.nvidia.com/v1"


functions:
  retrieval_function:
    _type: vss_ctx_rag_retrieval
    llm_name: nim_llm
    retrieval_type: "graph_retrieval"

    db_type: "neo4j"
    db_host: "localhost"
    db_port: "7687"
    db_user: "neo4j"
    db_password: "passneo4j"

    embedding_model_name: embedding_llm

    uuid: "123456"


workflow:
  _type: react_agent
  tool_names: [retrieval_function]
  llm_name: nim_llm
  verbose: true
  retry_parsing_errors: true
  max_retries: 3

Here the vss_ctx_rag_retrieval function is added as a tool to a LangChain ReAct agent. A ReAct agent uses a language model to decide which tool to use based on the user's query. In this example, the ReAct agent calls the vss_ctx_rag_retrieval function to retrieve information from the configured database (here a Neo4j graph database, per retrieval_type: "graph_retrieval").
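
The retrieval workflow can also be exercised once from the command line. A minimal sketch, assuming the nat CLI supports a one-shot run command with --config_file and --input (check nat --help in your installation; the question is illustrative):

nat run --config_file=./packages/vss_ctx_rag_nat/src/vss_ctx_rag/plugins/nat/nat_config/function/config-retrieval-function.yml --input "Is there a bridge? If so describe it"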

Ingestion Function

This is an example config file for using Context Aware RAG as a function/tool for ingestion:

# config-ingestion-function.yml
general:
  use_uvloop: true

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    max_tokens: 2048
    base_url: "https://integrate.api.nvidia.com/v1"


embedders:
  embedding_llm:
    _type: nim
    model_name: nvidia/llama-3.2-nv-embedqa-1b-v2
    truncate: "END"
    base_url: "https://integrate.api.nvidia.com/v1"


functions:
  ingestion_function:
    _type: vss_ctx_rag_ingestion
    llm_name: nim_llm
    ingestion_type: "graph_ingestion"

    db_type: "neo4j"
    db_host: "localhost"
    db_port: "7687"
    db_user: "neo4j"
    db_password: "passneo4j"

    embedding_model_name: embedding_llm

    uuid: "123456"


workflow:
  _type: tool_call_workflow
  tool_names: [ingestion_function]
  llm_name: nim_llm

A custom tool-call workflow is defined that uses the Context Aware RAG ingestion function to ingest documents into the configured database. With this workflow, the input passed in is treated as a document to ingest rather than as a query.
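
As with retrieval, the ingestion workflow can be invoked once from the command line. The same hedged assumption about nat run applies; note that here the input is a document to ingest, not a question:

nat run --config_file=./packages/vss_ctx_rag_nat/src/vss_ctx_rag/plugins/nat/nat_config/function/config-ingestion-function.yml --input "The bridge is bright blue."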

Running the Functions

Exporting environment variables

Ensure you have the correct connection settings for the vector and/or graph databases in the config file.

Refer to the Setup Guide for more details.
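
For example, when using the NVIDIA-hosted endpoints from the configs above, an API key is typically required. A minimal sketch; the variable name NVIDIA_API_KEY is an assumption based on common NIM conventions, so consult the Setup Guide for the exact names your deployment expects:

# Assumed variable name; see the Setup Guide for the authoritative list
export NVIDIA_API_KEY="<your-api-key>"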

Running Data Ingestion

nat serve --config_file=./packages/vss_ctx_rag_nat/src/vss_ctx_rag/plugins/nat/nat_config/function/config-ingestion-function.yml --port <PORT>

Running Graph Retrieval

nat serve --config_file=./packages/vss_ctx_rag_nat/src/vss_ctx_rag/plugins/nat/nat_config/function/config-retrieval-function.yml --port <PORT>

Example Python API calls to the services

The examples below assume two services are running: ingestion on port 8000 and retrieval on port 8001.
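
For example, the two services can be started with the commands from above, with ports chosen to match the requests below:

nat serve --config_file=./packages/vss_ctx_rag_nat/src/vss_ctx_rag/plugins/nat/nat_config/function/config-ingestion-function.yml --port 8000
nat serve --config_file=./packages/vss_ctx_rag_nat/src/vss_ctx_rag/plugins/nat/nat_config/function/config-retrieval-function.yml --port 8001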

Ingestion Python request

import requests

# POST the document text to the ingestion service's /generate endpoint.
url = "http://localhost:8000/generate"
headers = {"Content-Type": "application/json"}
data = {
    "rag_request": "The bridge is bright blue."  # document text to ingest
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
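
Since each request ingests one document, a simple loop can ingest several. A minimal sketch built on the same endpoint and payload shape as above (the document list is illustrative):

import requests

url = "http://localhost:8000/generate"
headers = {"Content-Type": "application/json"}

# Illustrative documents; each POST ingests one document.
documents = [
    "The bridge is bright blue.",
    "A red car is parked near the bridge.",
]

for doc in documents:
    response = requests.post(url, headers=headers, json={"rag_request": doc})
    response.raise_for_status()  # fail fast if the service returns an error
    print(response.json())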

Retrieval Python request

import requests

# POST a question to the retrieval service's /generate endpoint.
url = "http://localhost:8001/generate"
headers = {"Content-Type": "application/json"}
data = {
    "input_message": "Is there a bridge? If so describe it"  # user query
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
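
Putting the two together, a short end-to-end check ingests a document and then queries for it. A minimal sketch using the same two services and payload shapes (ports 8000 and 8001 as above):

import requests

headers = {"Content-Type": "application/json"}

# 1. Ingest a document through the ingestion service.
ingest = requests.post(
    "http://localhost:8000/generate",
    headers=headers,
    json={"rag_request": "The bridge is bright blue."},
)
ingest.raise_for_status()

# 2. Ask the retrieval service about the ingested content.
answer = requests.post(
    "http://localhost:8001/generate",
    headers=headers,
    json={"input_message": "Is there a bridge? If so describe it"},
)
answer.raise_for_status()
print(answer.json())

Depending on the deployment, ingestion may take a moment to complete before its results are reflected in retrieval answers.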