Setting up the Simple Example

Creating Directories and Boilerplate Code

  1. Make a directory for the example:

    $ mkdir RetrievalAugmentedGeneration/examples/simple_rag_api_catalog
    
  2. Create an empty __init__.py file so that Python treats the directory as a package:

    $ touch RetrievalAugmentedGeneration/examples/simple_rag_api_catalog/__init__.py
    
  3. Create a RetrievalAugmentedGeneration/examples/simple_rag_api_catalog/chains.py file with the following boilerplate code:

    from typing import Generator, List, Dict, Any
    import logging
    
    from llama_index.core.base.response.schema import StreamingResponse
    from RetrievalAugmentedGeneration.common.base import BaseExample
    
    logger = logging.getLogger(__name__)
    
    class SimpleExample(BaseExample):
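        # Each method below maps to a Chain Server REST endpoint, as the
        # verification steps later in this guide show:
        #   ingest_docs      <- POST   /documents
        #   llm_chain        <- POST   /generate with use_knowledge_base: false
        #   rag_chain        <- POST   /generate with use_knowledge_base: true
        #   get_documents    <- GET    /documents
        #   delete_documents <- DELETE /documents
        #   document_search  <- POST   /search
        # "Message" in the type hints refers to the Chain Server chat message
        # schema (role and content pairs, as in the curl examples below).
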
        def ingest_docs(self, file_name: str, filename: str):
            """Code to ingest documents."""
            logger.info(f"Ingesting the documents")
    
        def llm_chain(self, query: str, chat_history: List["Message"], **kwargs) -> Generator[str, None, None]:
            """Code to form an answer using LLM when context is already supplied."""
            logger.info(f"Forming response from provided context")
            return StreamingResponse(iter(["TODO: Implement LLM call"])).response_gen
    
        def rag_chain(self, query: str, chat_history: List["Message"], **kwargs) -> Generator[str, None, None]:
            """Code to fetch context and form an answer using LLM."""
            logger.info(f"Forming response from document store")
            return StreamingResponse(iter(["TODO: Implement RAG chain call"])).response_gen
    
        def get_documents(self) -> List[str]:
            """Retrieve file names from the vector store."""
            logger.info("Getting document file names from the vector store")
            return []
    
        def delete_documents(self, filenames: List[str]) -> None:
            """Delete documents from the vector index."""
            logger.info("Deleting documents from the vector index")
    
        def document_search(self, content: str, num_docs: int) -> List[Dict[str, Any]]:  ## Optional method
            """Search for the most relevant documents for the given search parameters."""
            logger.info("Searching for documents based on the query")
            return []
    
    

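The methods above are intentionally stubs: each one only logs a message, and the two chat methods stream a fixed TODO string so the wiring can be verified end to end before any model or vector store is involved. As a preview of what might eventually replace the llm_chain stub, the following sketch streams tokens from the hosted model. It is not the NVIDIA reference implementation; it assumes the langchain-nvidia-ai-endpoints package is installed in the Chain Server container and that NVIDIA_API_KEY is set, and it reuses the model name from the Docker Compose file in the next section.

    from typing import Generator, List

    from langchain_nvidia_ai_endpoints import ChatNVIDIA
    from RetrievalAugmentedGeneration.common.base import BaseExample

    class SimpleExample(BaseExample):
        def llm_chain(self, query: str, chat_history: List["Message"], **kwargs) -> Generator[str, None, None]:
            """Stream an answer straight from the model, without retrieval."""
            # ChatNVIDIA reads NVIDIA_API_KEY from the environment; the model
            # name matches APP_LLM_MODELNAME in the compose file.
            llm = ChatNVIDIA(model="ai-mixtral-8x7b-instruct")
            for chunk in llm.stream(query):
                yield chunk.content
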
Building and Running with Docker Compose

Like the examples provided by NVIDIA, the simple example uses Docker Compose to build and run its containers.

  1. Create a deploy/compose/simple-rag-api-catalog.yaml file with the following content.

    The EXAMPLE_NAME build argument identifies the directory, relative to RetrievalAugmentedGeneration/examples, that is built into the Chain Server container.

    services:
      chain-server:
        container_name: chain-server
        image: chain-server:latest
        build:
          context: ../../
          dockerfile: ./RetrievalAugmentedGeneration/Dockerfile
          args:
            EXAMPLE_NAME: simple_rag_api_catalog
        command: --port 8081 --host 0.0.0.0
        environment:
          APP_LLM_MODELNAME: ai-mixtral-8x7b-instruct
          APP_LLM_MODELENGINE: nvidia-ai-endpoints
          APP_EMBEDDINGS_MODELNAME: snowflake/arctic-embed-l
          APP_EMBEDDINGS_MODELENGINE: nvidia-ai-endpoints
          APP_TEXTSPLITTER_CHUNKSIZE: 1200
          APP_TEXTSPLITTER_CHUNKOVERLAP: 200
          NVIDIA_API_KEY: ${NVIDIA_API_KEY}
          APP_PROMPTS_CHATTEMPLATE: "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Please ensure that your responses are positive in nature."
          APP_PROMPTS_RAGTEMPLATE: "You are a helpful AI assistant named Envie. You will reply to questions only based on the context that you are provided. If something is out of context, you will refrain from replying and politely decline to respond to the user."
          APP_RETRIEVER_TOPK: 4
          APP_RETRIEVER_SCORETHRESHOLD: 0.25
        ports:
        - "8081:8081"
        expose:
        - "8081"
        shm_size: 5gb
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]
    
      rag-playground:
        container_name: rag-playground
        image: rag-playground:latest
        build:
          context: ../.././RetrievalAugmentedGeneration/frontend/
          dockerfile: Dockerfile
        command: --port 8090
        environment:
          APP_SERVERURL: http://chain-server
          APP_SERVERPORT: 8081
          APP_MODELNAME: ai-mixtral-8x7b-instruct
          RIVA_API_URI: ${RIVA_API_URI:-}
          RIVA_API_KEY: ${RIVA_API_KEY:-}
          RIVA_FUNCTION_ID: ${RIVA_FUNCTION_ID:-}
          TTS_SAMPLE_RATE: ${TTS_SAMPLE_RATE:-48000}
        ports:
        - "8090:8090"
        expose:
        - "8090"
        depends_on:
        - chain-server
    
    networks:
      default:
        name: nvidia-rag
    
  2. Build the containers for the simple example:

    $ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml build
    

    Building the containers requires several minutes.

  3. Run the containers:

    $ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml up -d
    
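All of the APP_* settings in the compose file reach the chain-server container as plain environment variables; that is how the Chain Server learns which model, embedder, prompts, and retriever parameters to use. For a quick, purely illustrative sanity check of what the running example sees, you can read a few of them directly from the environment inside the container:

    import os

    # Illustrative only: print settings defined in
    # deploy/compose/simple-rag-api-catalog.yaml as the container sees them.
    for key in (
        "APP_LLM_MODELNAME",
        "APP_EMBEDDINGS_MODELNAME",
        "APP_RETRIEVER_TOPK",
        "APP_RETRIEVER_SCORETHRESHOLD",
    ):
        print(key, "=", os.environ.get(key))
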

Verify the Chain Server Methods Using Curl

You can access the Chain Server with a URL like http://localhost:8081.

  • Confirm the llm_chain method runs and returns the TODO response by running a command like the following:

    $ curl -H "Content-Type: application/json" http://localhost:8081/generate \
        -d '{"messages":[{"role":"user", "content":"What should I see in Paris?"}], "use_knowledge_base": false}'
    

    The response shows the TODO message:

    data: {"id":"57d16654-6cc0-474a-832b-7563bf4eb1ce","choices":[{"index":0,"message":{"role":"assistant","content":"TODO: Implement LLM call"},"finish_reason":""}]}
    
    data: {"id":"57d16654-6cc0-474a-832b-7563bf4eb1ce","choices":[{"index":0,"message":{"role":"assistant","content":""},"finish_reason":"[DONE]"}]}
    
  • Confirm the rag_chain method runs by setting use_knowledge_base to true:

    $ curl -H "Content-Type: application/json" http://localhost:8081/generate \
        -d '{"messages":[{"role":"user", "content":"What should I see in Paris?"}], "use_knowledge_base": true}'
    

    The response also shows the TODO message:

    data: {"id":"dc99aaf0-b6ab-4332-bbf2-4b53fc72f963","choices":[{"index":0,"message":{"role":"assistant","content":"TODO: Implement RAG chain call"},"finish_reason":""}]}
    
    data: {"id":"dc99aaf0-b6ab-4332-bbf2-4b53fc72f963","choices":[{"index":0,"message":{"role":"assistant","content":""},"finish_reason":"[DONE]"}]}
    
  • Confirm the ingest_docs method runs by uploading a sample document, such as the README from the repository:

    $ curl http://localhost:8081/documents -F "file=@README.md"
    

    Example Output

    {"message":"File uploaded successfully"}
    

    View the logs for the Chain Server to see the logged message from the method:

    $ docker logs chain-server
    

    The logs show the message from the method, but no ingestion occurs because the method is not yet implemented:

    INFO:example:Ingesting the documents
    
  • Confirm the get_documents and delete_documents methods run:

    $ curl -X GET http://localhost:8081/documents 
    
    $ curl -X DELETE http://localhost:8081/documents\?filename\=README.md 
    

    View the logs for the Chain Server to see the logged messages from the methods:

    $ docker logs chain-server
    

    The logs show the message from the code:

    INFO:example:Getting document file names from the vector store
    INFO:example:Deleting documents from the vector index
    
  • Confirm the document_search method runs:

    $ curl -H "Content-Type: application/json" http://localhost:8081/search \
        -d '{"query":"What should I see in Paris?", "top_k":4}'
    

    The response is empty because the ingest_docs and document_search methods are not implemented:

    {"chunks":[]}
    

    View the logs for the Chain Server to see the logged message from the method:

    $ docker logs chain-server
    

    Example Output

    INFO:example:Searching for documents based on the query
    
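The same checks can be scripted instead of typed by hand. The following sketch uses the third-party requests package (an assumption, not something the example ships with) and mirrors the curl commands above, including parsing the data: lines that the /generate endpoint streams.

    import json
    import requests

    BASE = "http://localhost:8081"

    # /generate streams "data: {...}" lines; print the assistant tokens.
    payload = {
        "messages": [{"role": "user", "content": "What should I see in Paris?"}],
        "use_knowledge_base": False,
    }
    with requests.post(f"{BASE}/generate", json=payload, stream=True) as resp:
        for line in resp.iter_lines():
            if line.startswith(b"data: "):
                chunk = json.loads(line[len(b"data: "):])
                print(chunk["choices"][0]["message"]["content"], end="")

    # Document management and search mirror the curl commands above.
    with open("README.md", "rb") as f:
        requests.post(f"{BASE}/documents", files={"file": f})
    print(requests.get(f"{BASE}/documents").json())
    print(requests.post(f"{BASE}/search",
                        json={"query": "What should I see in Paris?", "top_k": 4}).json())
    requests.delete(f"{BASE}/documents", params={"filename": "README.md"})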

Verify the Chain Server Methods Using the RAG Playground

You can access the RAG Playground web interface with a URL like http://localhost:8090.

  • Confirm the llm_chain method runs by accessing http://localhost:8090/converse and entering a query such as "What should I see in Paris?".

    RAG Playground web interface showing the message: "TODO: Implement LLM call."

  • Confirm the rag_chain method runs by enabling the Use knowledge base checkbox and entering a query.

    RAG Playground web interface showing the message: "TODO: Implement RAG chain call."

  • Confirm the ingest_docs method runs by accessing http://localhost:8090/kb, clicking Add File, and uploading a file.

    After the upload, view the Chain Server logs by running docker logs chain-server.

    Example Output

    INFO:example:Ingesting the documents
    

Next Steps