Listing and Searching Documents

Implementing the Method

  • Edit the RetrievalAugmentedGeneration/examples/simple_rag_api_catalog/chains.py file and replace the following methods.

    • Replace the document_search method with the following code:

          def document_search(self, content: str, num_docs: int) -> List[Dict[str, Any]]:
              """Search for the most relevant documents for the given search parameters."""
      
              try:
                  retriever = vector_store.as_retriever(
                      search_type="similarity_score_threshold",
                      search_kwargs={
                          "score_threshold": settings.retriever.score_threshold,
                          "k": settings.retriever.top_k,
                      },
                  )
                  docs = retriever.invoke(content)
      
                  result = []
                  for doc in docs:
                      result.append(
                          {
                              "source": os.path.basename(doc.metadata.get('source', '')),
                              "content": doc.page_content
                          }
                      )
                  # Return after the loop so that all matching documents are included,
                  # not just the first one.
                  return result
              except Exception as e:
                  logger.error(f"Error from POST /search endpoint. Error details: {e}")
                  raise
      
    • Replace the get_documents method with the following code:

          def get_documents(self) -> List[str]:
              """Retrieve file names from the vector store."""
              extract_filename = lambda metadata: os.path.basename(metadata['source'])
              try:
                  global vector_store
      
                  in_memory_docstore = vector_store.docstore._dict
                  filenames = [extract_filename(doc.metadata) for doc in in_memory_docstore.values()]
                  filenames = list(set(filenames))
                  return filenames
              except Exception as e:
                  logger.error(f"Vector store not initialized. Error details: {e}")
              return []
      
    • Replace the delete_documents method with the following code:

          def delete_documents(self, filenames: List[str]):
              """Delete documents from the vector index."""
              extract_filename = lambda metadata: os.path.basename(metadata['source'])
              try:
                  global vector_store
      
                  in_memory_docstore = vector_store.docstore._dict
                  for filename in filenames:
                      ids_list = [
                          doc_id
                          for doc_id, doc_data in in_memory_docstore.items()
                          if extract_filename(doc_data.metadata) == filename
                      ]
                      if not vector_store.delete(ids_list):
                          logger.error(f"Failed to delete document: {filename}")
                          return False
                      logger.info(f"Deleted document with file name: {filename}")
                  # Return after the loop so that every requested file is deleted.
                  return True
      
              except Exception as e:
                  logger.error(f"Vector store not initialized. Error details: {e}")
                  raise
      

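The three methods above share one pattern: extract a base file name from each chunk's `source` metadata, then either deduplicate the names (`get_documents`) or collect the chunk IDs for one file (`delete_documents`). The sketch below illustrates that pattern in plain Python, using `SimpleNamespace` objects as a hypothetical stand-in for the FAISS in-memory docstore (`doc_id -> document`); it does not require LangChain.

```python
import os
from types import SimpleNamespace

# Hypothetical stand-in for vector_store.docstore._dict: each entry maps a
# chunk ID to a document object with a .metadata mapping, as used above.
docstore = {
    "id-1": SimpleNamespace(metadata={"source": "/data/README.md"}),
    "id-2": SimpleNamespace(metadata={"source": "/data/README.md"}),
    "id-3": SimpleNamespace(metadata={"source": "/data/guide.md"}),
}

extract_filename = lambda metadata: os.path.basename(metadata["source"])

# get_documents: unique file names across all chunks.
filenames = sorted(set(extract_filename(d.metadata) for d in docstore.values()))
print(filenames)  # ['README.md', 'guide.md']

# delete_documents: collect the chunk IDs that belong to one file.
ids_list = [doc_id for doc_id, doc in docstore.items()
            if extract_filename(doc.metadata) == "README.md"]
print(ids_list)  # ['id-1', 'id-2']
```

Note that one uploaded file usually produces several chunks, which is why deletion collects a list of IDs rather than a single ID.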
Building and Running with Docker Compose

Using the containers has one additional step this time: exporting your NVIDIA API key as an environment variable.

  1. Build the container for the Chain Server:

    $ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml build chain-server
    
  2. Export your NVIDIA API key in an environment variable:

    $ export NVIDIA_API_KEY=nvapi-...
    
  3. Run the containers:

    $ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml up -d
    
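If the key is missing or malformed, the Chain Server starts but remote inference calls fail. As an illustrative preflight check (not part of the example code), you could verify the variable before running `docker compose up`; keys issued by the NVIDIA API Catalog start with `nvapi-`.

```python
import os

def api_key_is_set(env=os.environ) -> bool:
    """Return True when NVIDIA_API_KEY looks like an API Catalog key."""
    return env.get("NVIDIA_API_KEY", "").startswith("nvapi-")

print(api_key_is_set({"NVIDIA_API_KEY": "nvapi-example"}))  # True
print(api_key_is_set({}))                                   # False
```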

Verify the Ingest Docs Method Using Curl

You can access the Chain Server with a URL like http://localhost:8081.

  • Upload the README from the repository:

    $ curl http://localhost:8081/documents -F "file=@README.md"
    

    Example Output

    {"message":"File uploaded successfully"}
    
  • List the ingested documents:

    $ curl -X GET http://localhost:8081/documents
    

    Example Output

    {"documents":["README.md"]}
    
  • Submit a query to search the documents:

    $ curl -H "Content-Type: application/json" \
        http://localhost:8081/search \
        -d '{"query":"Does NVIDIA have sample RAG code?", "top_k":1}'
    

    Example Output

    {
      "chunks": [
        {
          "content": "NVIDIA Generative AI Examples\n\nIntroduction\n\nState-of-the-art Generative AI examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs.\n\nNVIDIA NGC\n\nGenerative AI Examples can use models and GPUs from the NVIDIA NGC: AI Development Catalog.\n\nSign up for a free NGC developer account to access:\n\nGPU-optimized containers used in these examples\n\nRelease notes and developer documentation\n\nRetrieval Augmented Generation (RAG)\n\nA RAG pipeline embeds multimodal data --  such as documents, images, and video -- into a database connected to a LLM.\nRAG lets users chat with their data!\n\nDeveloper RAG Examples\n\nThe developer RAG examples run on a single VM.\nThe examples demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's open source connectors.\nThe examples are easy to deploy with Docker Compose.\n\nExamples support local and remote inference endpoints.\nIf you have a GPU, you can inference locally with TensorRT-LLM.\nIf you don't have a GPU, you can inference and embed remotely with NVIDIA API Catalog endpoints.",
          "filename": "README.md",
          "score": 0
        }
      ]
    }
    
  • Confirm that the search returns relevant documents:

    $ curl -H "Content-Type: application/json" \
        http://localhost:8081/search \
        -d '{"query":"Is vanilla ice cream better than chocolate ice cream?", "top_k":1}'
    

    Example Output

    {"chunks":[]}
    
  • Confirm the delete method works:

    $ curl -X DELETE http://localhost:8081/documents\?filename\=README.md
    

    Example Output

    {"message":"Document README.md deleted successfully"}
    
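A client can consume these endpoints the same way curl does. The sketch below builds the `/search` request payload and parses the `chunks` array from a response body, using the example outputs shown above as sample data; the payload keys (`query`, `top_k`) and response key (`chunks`) come from the curl examples, and the helper names are illustrative.

```python
import json

def build_search_payload(query: str, top_k: int = 1) -> str:
    """Serialize the JSON body expected by POST /search."""
    return json.dumps({"query": query, "top_k": top_k})

def best_chunk(response_text: str):
    """Return (filename, score) of the first chunk, or None when no chunk matched."""
    chunks = json.loads(response_text).get("chunks", [])
    if not chunks:
        return None
    return chunks[0]["filename"], chunks[0]["score"]

# Using the example outputs from the curl calls above:
print(best_chunk('{"chunks":[{"content":"...","filename":"README.md","score":0}]}'))
print(best_chunk('{"chunks":[]}'))
```

An empty `chunks` array, as in the ice-cream query above, means no document cleared the retriever's score threshold.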

Next Steps

  • Creating an LLM Chain

  • You can stop the containers by running the docker compose -f deploy/compose/simple-rag-api-catalog.yaml down command.