Listing and Searching Documents
Implementing the Methods
Edit the RetrievalAugmentedGeneration/examples/simple_rag_api_catalog/chains.py file and add the following statement after the existing import statements, if it is not already present (the methods below use os.path.basename for file name handling):

import os

Replace the document_search method with the following code:

def document_search(self, content: str, num_docs: int) -> List[Dict[str, Any]]:
    """Search for the most relevant documents for the given search parameters."""
    try:
        retriever = vector_store.as_retriever(
            search_type="similarity_score_threshold",
            search_kwargs={
                "score_threshold": settings.retriever.score_threshold,
                "k": settings.retriever.top_k,
            },
        )
        docs = retriever.invoke(content)
        result = []
        for doc in docs:
            result.append(
                {
                    "source": os.path.basename(doc.metadata.get('source', '')),
                    "content": doc.page_content,
                }
            )
        return result
    except Exception as e:
        logger.error(f"Error from POST /search endpoint. Error details: {e}")
        raise
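With the similarity_score_threshold search type, the retriever returns only chunks whose relevance score meets settings.retriever.score_threshold. If you want to inspect the raw scores while tuning that threshold, here is a minimal sketch, assuming the module-level vector_store from the earlier ingestion step is initialized; it is not part of the tutorial code:

# Minimal sketch, not part of chains.py: inspect the raw relevance scores
# that the retriever compares against settings.retriever.score_threshold.
docs_and_scores = vector_store.similarity_search_with_relevance_scores(
    "Does NVIDIA have sample RAG code?", k=4
)
for doc, score in docs_and_scores:
    # Each entry pairs a chunk with a relevance score normalized to [0, 1].
    print(os.path.basename(doc.metadata.get('source', '')), round(score, 3))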
Replace the get_documents method with the following code:

def get_documents(self) -> List[str]:
    """Retrieve file names from the vector store."""
    extract_filename = lambda metadata: os.path.basename(metadata['source'])
    try:
        global vector_store
        in_memory_docstore = vector_store.docstore._dict
        filenames = [extract_filename(doc.metadata) for doc in in_memory_docstore.values()]
        filenames = list(set(filenames))
        return filenames
    except Exception as e:
        logger.error(f"Vector store not initialized. Error details: {e}")
        return []
Replace the delete_documents method with the following code:

def delete_documents(self, filenames: List[str]):
    """Delete documents from the vector index."""
    extract_filename = lambda metadata: os.path.basename(metadata['source'])
    try:
        global vector_store
        in_memory_docstore = vector_store.docstore._dict
        for filename in filenames:
            ids_list = [
                doc_id
                for doc_id, doc_data in in_memory_docstore.items()
                if extract_filename(doc_data.metadata) == filename
            ]
            if vector_store.delete(ids_list):
                logger.info(f"Deleted document with file name: {filename}")
                return True
            else:
                logger.error(f"Failed to delete document: {filename}")
                return False
    except Exception as e:
        logger.error(f"Vector store not initialized. Error details: {e}")
        raise
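Both methods above walk the FAISS docstore's ID-to-document mapping, and the same pattern can tell you how many chunks each file contributed, which is handy for checking a deletion. A minimal sketch, not part of chains.py, assuming the same initialized vector_store:

from collections import Counter

# Hypothetical check, not in the tutorial: count the stored chunks per source file.
chunk_counts = Counter(
    os.path.basename(doc.metadata['source'])
    for doc in vector_store.docstore._dict.values()
)
print(chunk_counts)  # for example: Counter({'README.md': 12})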
Building and Running with Docker Compose
Using the containers has one additional step this time: exporting your NVIDIA API key as an environment variable.
Build the container for the Chain Server:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml build chain-server
Export your NVIDIA API key in an environment variable:
$ export NVIDIA_API_KEY=nvapi-...
Run the containers:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml up -d
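Optionally, verify that the services are up with the standard docker compose ps subcommand before sending requests:

$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml ps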
Verify the Document Methods Using Curl
You can access the Chain Server with a URL like http://localhost:8081.
Upload the README from the repository:
$ curl http://localhost:8081/documents -F "file=@README.md"
Example Output
{"message":"File uploaded successfully"}
List the ingested documents:
$ curl -X GET http://localhost:8081/documents
Example Output
{"documents":["README.md"]}
Submit a query to search the documents:
$ curl -H "Content-Type: application/json" \
    http://localhost:8081/search \
    -d '{"query":"Does NVIDIA have sample RAG code?", "top_k":1}'
Example Output
{ "chunks": [ { "content": "NVIDIA Generative AI Examples\n\nIntroduction\n\nState-of-the-art Generative AI examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs.\n\nNVIDIA NGC\n\nGenerative AI Examples can use models and GPUs from the NVIDIA NGC: AI Development Catalog.\n\nSign up for a free NGC developer account to access:\n\nGPU-optimized containers used in these examples\n\nRelease notes and developer documentation\n\nRetrieval Augmented Generation (RAG)\n\nA RAG pipeline embeds multimodal data -- such as documents, images, and video -- into a database connected to a LLM.\nRAG lets users chat with their data!\n\nDeveloper RAG Examples\n\nThe developer RAG examples run on a single VM.\nThe examples demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's open source connectors.\nThe examples are easy to deploy with Docker Compose.\n\nExamples support local and remote inference endpoints.\nIf you have a GPU, you can inference locally with TensorRT-LLM.\nIf you don't have a GPU, you can inference and embed remotely with NVIDIA API Catalog endpoints.", "filename": "README.md", "score": 0 } ] }
Confirm that an irrelevant query returns no results; chunks with a relevance score below the configured score threshold are filtered out:
$ curl -H "Content-Type: application/json" \
    http://localhost:8081/search \
    -d '{"query":"Is vanilla ice cream better than chocolate ice cream?", "top_k":1}'
Example Output
{"chunks":[]}
Confirm the delete method works:
$ curl -X DELETE http://localhost:8081/documents\?filename\=README.md
Example Output
{"message":"Document README.md deleted successfully"}
Next Steps
You can stop the containers by running the docker compose -f deploy/compose/simple-rag-api-catalog.yaml down command.