Use Multimodal Embedding with NeMo Retriever Library
Text-only NeMo Retriever embedding NIM
You can still use the NeMo Retriever text embedding NIM (OpenAI-compatible embeddings for passage and query vectors) alongside or instead of the multimodal flows on this page. Product and deployment details are in the NeMo Retriever Text Embedding NIM documentation. In library and CLI pipelines, route embedding to that NIM with your configured embed / invoke URL and model name (see the graph pipeline examples for environment-based remote inference).
This documentation describes how to use NeMo Retriever Library with the multimodal embedding model Llama Nemotron Embed VL 1B v2.
The Llama Nemotron Embed VL 1B v2 model is optimized for multimodal question-answering retrieval. The model can embed documents in the form of an image, text, or a combination of image and text. Documents can then be retrieved given a user query in text form. The model supports images that contain text, tables, charts, and infographics.
Example with Default Text-Based Embedding
When you use the multimodal model, by default, all extracted content (text, tables, charts) is treated as plain text. The following example provides a strong baseline for retrieval.
- The
embedmethod is called with no arguments.
ingestor = (
Ingestor()
.files("./data/*.pdf")
.extract()
.embed() # Default behavior embeds all content as text
)
results = ingestor.ingest()
Example with Embedding Structured Elements as Text + Images
It is common to process PDFs by embedding standard text as text, and embed visual elements like tables and charts as images. The following example enables the multimodal model to capture the spatial and structural information of the visual content.
- The
embedmethod is configured withembed_modality="text_image"to embed the extracted tables and charts as images. - This configuration is more accurate than text only with a performance cost
ingestor = (
Ingestor()
.files("./data/*.pdf")
.extract()
.embed(
embed_modality="text_image",
)
)
results = ingestor.ingest()
Example with Embedding Entire PDF Pages as Images
For documents where the entire page layout is important (such as infographics, complex diagrams, or forms), you can configure NeMo Retriever Library to treat every page as a single image. The following example extracts and embeds each page as an image.
- The
embed methodprocesses the page images.
ingestor = (
Ingestor()
.files("./data/*.pdf")
.extract()
.embed(
embed_modality="image",
embed_granularity="page"
)
)
results = ingestor.ingest()