# Multimodal RAG with LlamaIndex

This notebook shows how to perform retrieval-augmented generation (RAG) with LlamaIndex on the table, chart, and text extraction results produced by NV-Ingest's PDF extraction tools.

**Note:** To run this notebook, you'll need the NV-Ingest microservice running along with all of the other included microservices. To do this, make sure all of the services are uncommented in [docker-compose.yaml](https://github.com/NVIDIA/nv-ingest/blob/main/docker-compose.yaml) and follow the [quickstart guide](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#quickstart) to start everything up. You'll also need the NV-Ingest Python client installed, as demonstrated [here](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#step-2-installing-python-dependencies).

To start, make sure that LlamaIndex and pymilvus are installed and up to date:

In [None]:
pip install -qU llama_index llama-index-embeddings-nvidia llama-index-llms-nvidia llama-index-vector-stores-milvus pymilvus
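If the install ran in a different environment than the notebook kernel, later imports fail with confusing errors. As a quick check, the sketch below reports which of the distributions from the command above are visible to the current interpreter. The helper name `installed_versions` is hypothetical, and the list assumes pip's normalized name `llama-index` for `llama_index`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above
# (assumption: "llama_index" is published as "llama-index").
PACKAGES = [
    "llama-index",
    "llama-index-embeddings-nvidia",
    "llama-index-llms-nvidia",
    "llama-index-vector-stores-milvus",
    "pymilvus",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any line that reports `NOT INSTALLED` suggests re-running the install cell in the same kernel.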

Then, we'll use NV-Ingest's `Ingestor` interface to extract the tables and charts from a test PDF, embed them, and upload them to our Milvus vector database (VDB).

In [5]:
from nv_ingest_client.client import Ingestor

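# Build the ingestion job: extract from the PDF, embed, and upload to Milvus.
ingestor = (
    Ingestor(message_client_hostname="localhost")
    .files("../data/multimodal_test.pdf")
    .extract(
        extract_text=False,  # skip plain text
        extract_tables=True,  # extract structured content
        extract_images=False,  # skip images
    )
    .embed()  # compute embeddings for the extracted content
    .vdb_upload()  # store the content and embeddings in the Milvus VDB
)

# Run the job and collect the extraction results.
results = ingestor.ingest()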
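Before querying, it can help to sanity-check what came back. The sketch below tallies extracted elements by type. It assumes `results` is a list with one entry per document, where each entry is a list of dicts carrying a `"document_type"` key; that shape is an assumption about the client's output, so adjust the key if yours differs. The helper `count_by_type` and the `sample_results` stand-in are hypothetical.

```python
from collections import Counter


def count_by_type(results):
    """Count extracted elements per document type across all documents.

    Assumes `results` is a list (one entry per document) of lists of dicts,
    each with a "document_type" key; elements without it count as "unknown".
    """
    counts = Counter()
    for doc_elements in results:
        for element in doc_elements:
            counts[element.get("document_type", "unknown")] += 1
    return counts


# Hypothetical stand-in for `results` so the sketch runs without the services.
sample_results = [
    [{"document_type": "structured"}, {"document_type": "structured"}, {}],
]
print(count_by_type(sample_results))
```

In the notebook, call `count_by_type(results)` on the real output instead of the stand-in.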

Now the table and chart content is extracted and stored in the Milvus VDB along with the embeddings (text extraction was turned off above with `extract_text=False`). Next, we'll connect LlamaIndex to Milvus and create a vector store index so that we can query our extraction results.

In [7]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.vector_stores.milvus import MilvusVectorStore

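# Embedding model served by the local embedding endpoint.
embed_model = NVIDIAEmbedding(base_url="http://localhost:8012/v1")

# Point LlamaIndex at the existing Milvus collection and its field names.
vector_store = MilvusVectorStore(
    uri="http://localhost:19530",
    collection_name="nv_ingest_collection",
    doc_id_field="pk",
    embedding_field="vector",
    text_key="text",
    dim=1024,
    overwrite=False,  # keep the existing collection and its data
)
# Build the index on top of the vectors already stored in Milvus.
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, embed_model=embed_model)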
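If queries later come back empty, a common cause is that the collection is missing or has no rows. The sketch below checks this through a Milvus client object. It assumes the client exposes `has_collection` and `get_collection_stats` as pymilvus's `MilvusClient` does, and that the stats dict contains a `row_count` entry. The helper `collection_row_count` and the `FakeClient` stand-in are hypothetical and exist only so the sketch runs without a Milvus server.

```python
def collection_row_count(client, collection_name="nv_ingest_collection"):
    """Return the collection's reported row count, or None if it does not exist."""
    if not client.has_collection(collection_name):
        return None
    stats = client.get_collection_stats(collection_name)
    return stats.get("row_count")


class FakeClient:
    """Hypothetical stand-in that mimics the two client methods used above."""

    def __init__(self, collections):
        self._collections = collections  # maps collection name -> row count

    def has_collection(self, collection_name):
        return collection_name in self._collections

    def get_collection_stats(self, collection_name):
        return {"row_count": self._collections[collection_name]}


print(collection_row_count(FakeClient({"nv_ingest_collection": 12})))
print(collection_row_count(FakeClient({})))
```

Against the real service, pass `MilvusClient(uri="http://localhost:19530")` from `pymilvus` in place of `FakeClient`. A result of `None` or `0` suggests the upload step did not complete.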

Next, we'll use our vector store index to create a query engine that handles the RAG pipeline, and we'll use [llama-3.1-405b-instruct](https://build.nvidia.com/meta/llama-3_1-405b-instruct) to generate the final response.

In [30]:
import os
from llama_index.llms.nvidia import NVIDIA

# TODO: Add your NVIDIA API key
os.environ["NVIDIA_API_KEY"] = "[YOUR NVIDIA API KEY HERE]"

llm = NVIDIA(model="meta/llama-3.1-405b-instruct")
query_engine = index.as_query_engine(llm=llm)
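Pasting a key into a cell makes it easy to leak when the notebook is shared. As an alternative, the sketch below reads `NVIDIA_API_KEY` from the environment and prompts for it only when it is missing. The helper `ensure_api_key` is hypothetical; `getpass` is from the standard library and hides the typed value.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt=getpass, var="NVIDIA_API_KEY"):
    """Set `var` in `env` from a hidden prompt only if it is missing or empty."""
    if not env.get(var):
        env[var] = prompt(f"Enter {var}: ")
    return env[var]


# In the notebook, call this once before constructing the LLM:
# ensure_api_key()
```

The `env` and `prompt` parameters exist so the helper can be exercised without touching the real environment or waiting for input.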

Finally, we can ask it questions about our example PDF:

In [9]:
query_engine.query("What is the dog doing and where?").response

'The dog is chasing a squirrel in the front yard.'
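To see which retrieved chunks an answer was based on, inspect the response object instead of only its `.response` string. The sketch below lists each source's similarity score and a short snippet. It relies on the LlamaIndex response exposing `source_nodes`, where each item has a `score` and a `node` with `get_content()`. The helper `summarize_sources` is hypothetical, and the fake response with placeholder text is there only so the sketch runs offline.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=80):
    """Return (score, snippet) pairs for the nodes retrieved for a response."""
    rows = []
    for item in response.source_nodes:
        text = item.node.get_content().replace("\n", " ")
        rows.append((item.score, text[:max_chars]))
    return rows


# Hypothetical stand-in for a query response, with placeholder content.
fake_node = SimpleNamespace(get_content=lambda: "first line\nsecond line")
fake_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=fake_node, score=0.87)]
)
for score, snippet in summarize_sources(fake_response):
    print(score, snippet)
```

In the notebook, pass the object returned by `query_engine.query(...)` in place of `fake_response`.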