Creating a RAG Chain
Implementing the Method
The purpose of the rag_chain method is to retrieve document chunks from the vector store that are closely related to the query. The chunks are provided to the LLM as context to augment the query, and the LLM then generates the response.
Edit the RetrievalAugmentedGeneration/examples/simple_rag_api_catalog/chains.py file and add the following import statements:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from RetrievalAugmentedGeneration.common.utils import get_llm, get_config
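If the LangChain Expression Language (LCEL) pipe syntax used below is unfamiliar, the following minimal sketch shows how these imports fit together. The system prompt string is a placeholder, and llm stands in for the model object that get_llm returns; neither is part of the example code:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# A prompt template with an "{input}" placeholder that is filled in at
# invocation time.
prompt = ChatPromptTemplate.from_messages(
    [("system", "Answer using the provided context."), ("user", "{input}")]
)
# Piping the prompt into a model and a string parser builds a chain that
# can stream the response token by token:
# chain = prompt | llm | StrOutputParser()
# chain.stream({"input": "..."})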
Update the rag_chain method with the following statements:

def rag_chain(self, query: str, chat_history: List["Message"], **kwargs) -> Generator[str, None, None]:
    """Code to fetch context and form an answer using LLM"""
    logger.info("Using rag to generate response from document")
    settings = get_config()

    system_message = [("system", settings.prompts.rag_template)]
    conversation_history = [(msg.role, msg.content) for msg in chat_history]
    user_input = [("user", "{input}")]

    if conversation_history:
        prompt_template = ChatPromptTemplate.from_messages(
            system_message + conversation_history + user_input
        )
    else:
        prompt_template = ChatPromptTemplate.from_messages(
            system_message + user_input
        )

    llm = get_llm(**kwargs)
    chain = prompt_template | llm | StrOutputParser()

    try:
        retriever = vector_store.as_retriever()
        docs = retriever.get_relevant_documents(query)

        context = ""
        for doc in docs:
            context += doc.page_content + "\n\n"

        augmented_user_input = (
            "Context: " + context + "\n\nQuestion: " + query + "\n"
        )

        return chain.stream({"input": augmented_user_input})
    except Exception as e:
        logger.warning(f"Failed to generate response: {e}")
Building and Running with Docker Compose
Using the containers has one additional step this time: exporting your NVIDIA API key as an environment variable.
Build the container for the Chain Server:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml build chain-server
Export your NVIDIA API key in an environment variable:
$ export NVIDIA_API_KEY=nvapi-...
Run the containers:
$ docker compose --env-file deploy/compose/compose.env -f deploy/compose/simple-rag-api-catalog.yaml up -d
Verify the RAG Chain Method Using Curl
You can access the Chain Server with a URL like http://localhost:8081.
Upload a sample document, such as the README from the repository:
$ curl http://localhost:8081/documents -F "file=@README.md"
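If you prefer Python to curl, the following sketch performs an equivalent upload; it assumes the third-party requests package, which is not part of this example:

import requests

# Upload README.md as multipart form data, mirroring curl's -F option.
with open("README.md", "rb") as f:
    resp = requests.post("http://localhost:8081/documents", files={"file": f})
print(resp.status_code, resp.text)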
Confirm the rag_chain method runs by submitting a query:

$ curl -H "Content-Type: application/json" http://localhost:8081/generate \
    -d '{"messages":[{"role":"user", "content":"how many models are used in generative AI examples from NVIDIA?"}], "use_knowledge_base": true}'
Example Output
data: {"id":"0fbc961e-34b6-44e9-a996-9d2f84e794c9","choices":[{"index":0,"message":{"role":"assistant","content":""},"finish_reason":""}]} data: {"id":"0fbc961e-34b6-44e9-a996-9d2f84e794c9","choices":[{"index":0,"message":{"role":"assistant","content":" The"},"finish_reason":""}]} data: {"id":"0fbc961e-34b6-44e9-a996-9d2f84e794c9","choices":[{"index":0,"message":{"role":"assistant","content":" text provided mentions several models used in the generative AI examples from NVIDIA, including:\n\n1. Gemma\n2. LoRA\n"},"finish_reason":""}]} data: {"id":"0fbc961e-34b6-44e9-a996-9d2f84e794c9","choices":[{"index":0,"message":{"role":"assistant","content":"3. SFT (not specified what it stands for)\n4. Starcoder-2\n5. Small language models (SLMs)\n\n"},"finish_reason":""}]} data: {"id":"0fbc961e-34b6-44e9-a996-9d2f84e794c9","choices":[{"index":0,"message":{"role":"assistant","content":"However, it's unclear whether all of these models are used in every example or just some of them. The specific number of models used in each example"},"finish_reason":""}]} data: {"id":"0fbc961e-34b6-44e9-a996-9d2f84e794c9","choices":[{"index":0,"message":{"role":"assistant","content":" is not provided."},"finish_reason":""}]} data: {"id":"0fbc961e-34b6-44e9-a996-9d2f84e794c9","choices":[{"index":0,"message":{"role":"assistant","content":""},"finish_reason":""}]} data: {"id":"0fbc961e-34b6-44e9-a996-9d2f84e794c9","choices":[{"index":0,"message":{"role":"assistant","content":""},"finish_reason":"[DONE]"}]}
Next Steps
You can stop the containers by running the docker compose -f deploy/compose/simple-rag-api-catalog.yaml down command.