# Multimodal RAG with LangChain

This notebook shows how to perform RAG over the table, chart, and text extraction results of NV-Ingest's PDF extraction tools using LangChain.

**Note:** To run this notebook, you'll need the NV-Ingest microservice running along with all of the other included microservices. To do this, make sure all of the services are uncommented in [docker-compose.yaml](https://github.com/NVIDIA/nv-ingest/blob/main/docker-compose.yaml) and follow the [quickstart guide](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#quickstart) to start everything up. You'll also need the NV-Ingest Python client installed, as demonstrated [here](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#step-2-installing-python-dependencies).

To start, make sure LangChain and pymilvus are installed and up to date.

In [None]:
# Quote the version specifier so the shell does not treat ">" as an output redirect
%pip install -qU langchain langchain_community "langchain-nvidia-ai-endpoints>=0.3.7" langchain_milvus pymilvus

Then, we'll use NV-Ingest's Ingestor interface to extract the tables and charts from a test PDF, embed them, and upload them to our Milvus vector database (VDB).

In [1]:
from nv_ingest_client.client import Ingestor

ingestor = (
    Ingestor(message_client_hostname="localhost")
    .files("../data/multimodal_test.pdf")
    .extract(
        extract_text=False,
        extract_tables=True,
        extract_images=False,
    )
    .embed()
    .vdb_upload()
)

results = ingestor.ingest()
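
Before moving on, it can help to check what the ingestion returned. The sketch below assumes `results` is a list with one entry per input file, each entry being a list of dictionaries that carry a `document_type` key. That layout is an assumption about the client output rather than something this notebook confirms, and both the helper `count_by_type` and the sample data are hypothetical.

```python
from collections import Counter


def count_by_type(results):
    """Count extracted elements per document type across all files.

    Assumes `results` is a list (one entry per file) of lists of dicts,
    each with a "document_type" key. Adjust if your client output differs.
    """
    counts = Counter()
    for per_file in results:
        for element in per_file:
            counts[element.get("document_type", "unknown")] += 1
    return dict(counts)


# Hypothetical stand-in for the output of ingestor.ingest(), for illustration only.
sample_results = [[
    {"document_type": "structured", "metadata": {}},
    {"document_type": "structured", "metadata": {}},
]]
print(count_by_type(sample_results))
```

With real output, calling `count_by_type(results)` should give a quick sanity check that something was extracted before you query the vector store.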

Now the table and chart content is extracted and stored in the Milvus VDB along with its embeddings (text extraction was disabled above with `extract_text=False`). Next, we'll connect LangChain to Milvus and create a vector store so that we can query our extraction results.

In [13]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_milvus import Milvus

embedding_function = NVIDIAEmbeddings(base_url="http://localhost:8012/v1")

vectorstore = Milvus(
    embedding_function=embedding_function,
    collection_name="nv_ingest_collection",
    primary_field="pk",
    vector_field="vector",
    text_field="text",
    connection_args={"uri": "http://localhost:19530"},
)
retriever = vectorstore.as_retriever()
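
Conceptually, the retriever embeds the query and returns the stored entries whose vectors are closest to it. The following is a minimal, dependency-free sketch of that idea using cosine similarity over a toy in-memory store. It is not how Milvus implements search (the actual index type and distance metric depend on how the collection was created), and `top_k`, `toy_store`, and the three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=4):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy (text, embedding) pairs; real embeddings have far more dimensions.
toy_store = [
    ("table about animals", [0.9, 0.1, 0.0]),
    ("chart about revenue", [0.0, 0.2, 0.9]),
    ("table about weather", [0.1, 0.9, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], toy_store, k=2))
# → ['table about animals', 'table about weather']
```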

Then, we'll create a RAG chain using [llama-3.1-405b-instruct](https://build.nvidia.com/meta/llama-3_1-405b-instruct) that we can use to query our PDF in natural language.

In [3]:
import os
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# TODO: Add your NVIDIA API key
os.environ["NVIDIA_API_KEY"] = "[YOUR NVIDIA API KEY HERE]"

llm = ChatNVIDIA(model="meta/llama-3.1-405b-instruct")
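
If you would rather not hard-code the key in the notebook, one option is to read it from the environment and prompt only when it is missing. The helper below is a sketch under that assumption; `ensure_api_key` is a hypothetical name, and the demo uses a plain dictionary and a fake prompt function so it runs without user input.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt=getpass, name="NVIDIA_API_KEY"):
    """Return the key stored under `name`, prompting once if it is missing."""
    key = env.get(name)
    if not key:
        key = prompt(f"Enter {name}: ")
        env[name] = key  # Store it so later lookups find the key
    return key


# Demo with a stand-in environment and a fake prompt (no real key involved).
demo_env = {}
ensure_api_key(demo_env, prompt=lambda message: "example-key")
print(demo_env)
# → {'NVIDIA_API_KEY': 'example-key'}
```

In the notebook itself, calling `ensure_api_key()` with no arguments before constructing `ChatNVIDIA` would prompt interactively only when `NVIDIA_API_KEY` is not already set.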

In [17]:
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

template = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Keep the answer concise."
    "\n\n"
    "{context}"
    "\n\n"  # Separate the retrieved context from the question
    "Question: {question}"
)

prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
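
The retriever returns a list of document objects, and the chain above passes that list into the prompt as-is, so the model sees its string representation, metadata included. A common LangChain pattern is to join only the text of each document first. The sketch below shows such a helper; `format_docs` is a conventional but arbitrary name, and `SimpleNamespace` objects stand in for LangChain `Document` instances so the example runs without any services.

```python
from types import SimpleNamespace


def format_docs(docs):
    """Join the text of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-ins for LangChain Document objects, which expose `page_content`.
fake_docs = [
    SimpleNamespace(page_content="first retrieved table"),
    SimpleNamespace(page_content="second retrieved chart"),
]
print(format_docs(fake_docs))

# In the chain, the helper could be wired in like this:
#   {"context": retriever | format_docs, "question": RunnablePassthrough()}
```

Piping the retriever into a plain function works because LangChain wraps callables used with `|` as runnables.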

And now we can ask questions about our PDF.

In [16]:
rag_chain.invoke("What is the dog doing and where?")

'The dog is chasing a squirrel in the front yard.'