Listing and Searching Documents
Implementing the Methods
Edit the RetrievalAugmentedGeneration/examples/simple_rag_api_catalog/chains.py file and add the following statement after the existing import statements, if it is not already present (the methods below use os.path.basename for file name handling):

import os

Replace the document_search method with the following code:

def document_search(self, content: str, num_docs: int) -> List[Dict[str, Any]]:
    """Search for the most relevant documents for the given search parameters."""
    try:
        retriever = vector_store.as_retriever(
            search_type="similarity_score_threshold",
            search_kwargs={
                "score_threshold": settings.retriever.score_threshold,
                "k": settings.retriever.top_k,
            },
        )
        docs = retriever.invoke(content)
        result = []
        for doc in docs:
            result.append(
                {
                    "source": os.path.basename(doc.metadata.get('source', '')),
                    "content": doc.page_content,
                }
            )
        return result
    except Exception as e:
        logger.error(f"Error from POST /search endpoint. Error details: {e}")
        raise
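With the similarity_score_threshold search type, the retriever returns only chunks whose relevance score meets settings.retriever.score_threshold. If you want to inspect the raw scores while tuning that threshold, here is a minimal sketch, assuming the module-level vector_store from the earlier ingestion step is initialized; it is not part of the tutorial code:

# Minimal sketch, not part of chains.py: inspect the raw relevance scores
# that the retriever compares against settings.retriever.score_threshold.
docs_and_scores = vector_store.similarity_search_with_relevance_scores(
    "Does NVIDIA have sample RAG code?", k=4
)
for doc, score in docs_and_scores:
    # Each entry pairs a chunk with a relevance score normalized to [0, 1].
    print(os.path.basename(doc.metadata.get('source', '')), round(score, 3))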
Replace the get_documents method with the following code:

def get_documents(self) -> List[str]:
    """Retrieve file names from the vector store."""
    extract_filename = lambda metadata: os.path.basename(metadata['source'])
    try:
        global vector_store
        in_memory_docstore = vector_store.docstore._dict
        filenames = [extract_filename(doc.metadata) for doc in in_memory_docstore.values()]
        filenames = list(set(filenames))
        return filenames
    except Exception as e:
        logger.error(f"Vector store not initialized. Error details: {e}")
        return []
Replace the delete_documents method with the following code:

def delete_documents(self, filenames: List[str]):
    """Delete documents from the vector index."""
    extract_filename = lambda metadata: os.path.basename(metadata['source'])
    try:
        global vector_store
        in_memory_docstore = vector_store.docstore._dict
        for filename in filenames:
            ids_list = [
                doc_id
                for doc_id, doc_data in in_memory_docstore.items()
                if extract_filename(doc_data.metadata) == filename
            ]
            if vector_store.delete(ids_list):
                logger.info(f"Deleted document with file name: {filename}")
                return True
            else:
                logger.error(f"Failed to delete document: {filename}")
                return False
    except Exception as e:
        logger.error(f"Vector store not initialized. Error details: {e}")
        raise
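Both methods above walk the FAISS docstore's ID-to-document mapping, and the same pattern can tell you how many chunks each file contributed, which is handy for checking a deletion. A minimal sketch, not part of chains.py, assuming the same initialized vector_store:

from collections import Counter

# Hypothetical check, not in the tutorial: count the stored chunks per source file.
chunk_counts = Counter(
    os.path.basename(doc.metadata['source'])
    for doc in vector_store.docstore._dict.values()
)
print(chunk_counts)  # for example: Counter({'README.md': 12})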
Building and Running with Docker Compose
Using the containers has one additional step this time: exporting your NVIDIA API key as an environment variable.
Build the container for the Chain Server:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml build chain-server
Export your NVIDIA API key in an environment variable:
$ export NVIDIA_API_KEY=nvapi-...
Run the containers:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml up -d
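Optionally, verify that the services are up with the standard docker compose ps subcommand before sending requests:

$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml ps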
Verify the Document Methods Using Curl
You can access the Chain Server with a URL like http://localhost:8081.
Upload the README from the repository:
$ curl http://localhost:8081/documents -F "file=@README.md"
Example Output
{"message":"File uploaded successfully"}
List the ingested documents:
$ curl -X GET http://localhost:8081/documents
Example Output
{"documents":["README.md"]}
Submit a query to search the documents:
$ curl -H "Content-Type: application/json" \
    http://localhost:8081/search \
    -d '{"query":"Does NVIDIA have sample RAG code?", "top_k":1}'
Example Output
{ "chunks": [ { "content": "NVIDIA Generative AI Examples\n\nIntroduction\n\nState-of-the-art Generative AI examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs.\n\nNVIDIA NGC\n\nGenerative AI Examples can use models and GPUs from the NVIDIA NGC: AI Development Catalog.\n\nSign up for a free NGC developer account to access:\n\nGPU-optimized containers used in these examples\n\nRelease notes and developer documentation\n\nRetrieval Augmented Generation (RAG)\n\nA RAG pipeline embeds multimodal data -- such as documents, images, and video -- into a database connected to a LLM.\nRAG lets users chat with their data!\n\nDeveloper RAG Examples\n\nThe developer RAG examples run on a single VM.\nThe examples demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's open source connectors.\nThe examples are easy to deploy with Docker Compose.\n\nExamples support local and remote inference endpoints.\nIf you have a GPU, you can inference locally with TensorRT-LLM.\nIf you don't have a GPU, you can inference and embed remotely with NVIDIA API Catalog endpoints.", "filename": "README.md", "score": 0 } ] }
Confirm that an irrelevant query returns no results; chunks with a relevance score below the configured score threshold are filtered out:
$ curl -H "Content-Type: application/json" \
    http://localhost:8081/search \
    -d '{"query":"Is vanilla ice cream better than chocolate ice cream?", "top_k":1}'
Example Output
{"chunks":[]}
Confirm the delete method works:
$ curl -X DELETE http://localhost:8081/documents\?filename\=README.md
Example Output
{"message":"Document README.md deleted successfully"}
Next Steps
You can stop the containers by running the docker compose -f deploy/compose/simple-rag-api-catalog.yaml down command.