Quickstart: retriever CLI
Use retriever ingest and retriever query for product-facing workflows. The retriever pipeline command is for development and compatibility only; see Supported vs development / experimental subcommands.
Quick start
For deployment of NeMo Retriever / NIM containers, use nemo_retriever/helm and the NeMo Retriever Library Helm install guides.
Ingest a PDF
retriever ingest ./data/multimodal_test.pdf \
--method pdfium \
--extract-text --extract-tables --extract-charts \
--use-table-structure \
--embed-model-name nvidia/llama-nemotron-embed-1b-v2
Then query the LanceDB table:
retriever query "What is in this document?" \
--embed-model-name nvidia/llama-nemotron-embed-1b-v2
Development-only features such as --save-intermediate, runtime summaries, and
post-ingest evaluation remain available only through retriever pipeline run;
the public CLI path is limited to ingest and query.
Route stages to self-hosted or hosted NIM endpoints by passing only the URLs you want to override:
export NVIDIA_API_KEY=nvapi-...
retriever ingest ./data/multimodal_test.pdf \
--page-elements-invoke-url https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-page-elements-v3 \
--ocr-invoke-url https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-ocr-v1 \
--table-structure-invoke-url https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-table-structure-v1 \
--embed-invoke-url https://integrate.api.nvidia.com/v1/embeddings \
--embed-model-name nvidia/llama-nemotron-embed-1b-v2
retriever query "What is in this document?" \
--embed-invoke-url https://integrate.api.nvidia.com/v1/embeddings \
--embed-model-name nvidia/llama-nemotron-embed-1b-v2 \
--reranker-invoke-url https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-nemotron-rerank-vl-1b-v2/reranking
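If you script these commands, a small helper can make the "override only what you need" rule explicit. The Python sketch below is hypothetical: the helper names are illustrative and are not part of nemo_retriever. It emits a --<stage>-invoke-url flag only for stages you pass a URL for, using the flag names shown above. It also reuses one embedding model name for both ingest and query.

```python
# Hypothetical helpers (not part of nemo_retriever): build argv lists for the CLI.
import shlex
from typing import Optional

EMBED_MODEL = "nvidia/llama-nemotron-embed-1b-v2"


def endpoint_flags(**urls: Optional[str]) -> list[str]:
    """Turn keyword overrides (e.g. page_elements=...) into --page-elements-invoke-url flags."""
    flags: list[str] = []
    for stage, url in urls.items():
        if url:  # only overridden stages get a flag; the rest keep their defaults
            flags += [f"--{stage.replace('_', '-')}-invoke-url", url]
    return flags


def ingest_argv(path: str, **urls: Optional[str]) -> list[str]:
    return ["retriever", "ingest", path, *endpoint_flags(**urls),
            "--embed-model-name", EMBED_MODEL]


def query_argv(question: str, **urls: Optional[str]) -> list[str]:
    # Reuse the same embedding model name that was used at ingest time.
    return ["retriever", "query", question, *endpoint_flags(**urls),
            "--embed-model-name", EMBED_MODEL]


if __name__ == "__main__":
    cmd = ingest_argv(
        "./data/multimodal_test.pdf",
        embed="https://integrate.api.nvidia.com/v1/embeddings",
        ocr=None,  # not overridden, so no --ocr-invoke-url flag is emitted
    )
    print(shlex.join(cmd))
```

To actually run a command, pass the list to subprocess.run(cmd, check=True). An argv list avoids shell-quoting problems with free-text query strings.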
Query result controls
retriever query returns compact JSON hits with source, page_number, and text.
By default it retrieves and returns --top-k rows. Use these controls when you
need a wider candidate pool or a narrower result shape:
# Retrieve 30 candidates, then return the best 10.
retriever query "where is the warranty limitation discussed?" \
--candidate-k 30
# Keep only the first hit from each document page.
retriever query "which pages discuss operating costs?" \
--top-k 5 \
--candidate-k 30 \
--page-dedup
# Search a wider pool, then keep only table rows.
retriever query "annual revenue by region" \
--top-k 5 \
--candidate-k 40 \
--content-types table
--top-k is the final number of hits returned. --candidate-k is the wider
candidate pool retrieved before page deduplication, content-type filtering, and
final truncation. It must be greater than or equal to --top-k, and should
usually be larger when page deduplication or content-type filtering might
otherwise remove too many of the top retrieved rows. Page deduplication and
content-type filtering are applied after vector retrieval, preserving the
retriever's ranking order and truncating the final output to --top-k.
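The selection steps described above can be sketched in plain Python. This is an illustration, not the CLI's implementation. Hits are plain dicts, and content_type is a hypothetical field name for a row's content type. The steps run in the order the paragraph lists them: deduplication, then filtering, then truncation.

```python
# Illustrative sketch of post-retrieval selection; not the retriever CLI's code.
# Hits are dicts with source, page_number, and text; "content_type" is a
# hypothetical field name used for the content-type filter.
from typing import Optional


def select_hits(
    candidates: list[dict],  # ranked best-first by the vector search
    top_k: int,
    candidate_k: int,
    page_dedup: bool = False,
    content_types: Optional[set[str]] = None,
) -> list[dict]:
    if candidate_k < top_k:
        raise ValueError("--candidate-k must be >= --top-k")
    pool = candidates[:candidate_k]
    if page_dedup:
        seen: set[tuple] = set()
        deduped = []
        for hit in pool:  # the first (best-ranked) hit per page wins
            key = (hit.get("source"), hit.get("page_number"))
            if key not in seen:
                seen.add(key)
                deduped.append(hit)
        pool = deduped
    if content_types is not None:
        # Hits whose content type is missing or not requested are dropped.
        pool = [h for h in pool if h.get("content_type") in content_types]
    return pool[:top_k]  # ranking order is preserved; only the tail is cut
```

Deduplication and filtering only ever drop rows. A pool of exactly --top-k rows can therefore come back short, which is why --candidate-k should be larger when either is enabled.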
When querying a table ingested with an explicit embedding model, pass the same
--embed-model-name to retriever query.
--content-types accepts comma-separated content types such as text, table,
chart, image, and infographic. The value images is also accepted as an alias
for captioned image rows emitted by ingest. While --content-types is active,
hits with missing or unknown content types are excluded.
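Parsing the flag value might look like the sketch below. It is hypothetical in several ways. KNOWN_TYPES lists only the examples named above, and rejecting any other name is a design choice of the sketch. Mapping the images alias to image is an assumption about how captioned image rows are labeled. The content_type field name is also illustrative.

```python
# Hypothetical sketch of parsing a --content-types value; not the CLI's parser.
KNOWN_TYPES = {"text", "table", "chart", "image", "infographic"}
# Assumption: "images" normalizes to "image"; the real CLI may map it to a
# different internal label for captioned image rows.
ALIASES = {"images": "image"}


def parse_content_types(value: str) -> set[str]:
    types: set[str] = set()
    for raw in value.split(","):
        name = raw.strip()
        if not name:
            continue  # tolerate "table,,chart" or a trailing comma
        name = ALIASES.get(name, name)
        if name not in KNOWN_TYPES:
            raise ValueError(f"unknown content type: {name!r}")
        types.add(name)
    return types


def keep(hit: dict, allowed: set[str]) -> bool:
    # A hit with no (or an unrecognized) content type never matches an active filter.
    return hit.get("content_type") in allowed
```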
NVIDIA_API_KEY is required only when the stage invoke URLs (--page-elements-invoke-url,
--embed-invoke-url, and so on) point at hosted build.nvidia.com endpoints.
NGC_API_KEY is used separately, when pulling or running self-hosted NIM containers.
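A pre-flight check along these lines can catch a missing key before a long ingest starts. This is a sketch that rests on one assumption: any *.api.nvidia.com host, as in the example URLs earlier on this page, is treated as a hosted endpoint, and every other host as self-hosted. The function names are illustrative.

```python
# Sketch only: fail fast if hosted endpoints are used without NVIDIA_API_KEY.
# Assumption: hosted build.nvidia.com endpoints live under *.api.nvidia.com
# (ai.api.nvidia.com, integrate.api.nvidia.com); other hosts are self-hosted NIMs.
import os
from urllib.parse import urlparse

HOSTED_SUFFIX = ".api.nvidia.com"


def is_hosted(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return host.endswith(HOSTED_SUFFIX)


def check_api_key(urls: list[str]) -> None:
    hosted = [u for u in urls if is_hosted(u)]
    if hosted and not os.environ.get("NVIDIA_API_KEY"):
        raise RuntimeError(f"NVIDIA_API_KEY is required for hosted endpoints: {hosted}")
```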
What you get
- Extracted text, tables, and charts as rows in LanceDB at ./lancedb (default table name nemo-retriever).
- Compact JSON retrieval hits from retriever query, including source, page_number, and text fields.
- Extracted image assets, when retriever ingest is run with --store-images-uri.
- Pipeline-only development artifacts such as extraction Parquet, runtime summaries, and evaluation reports, which remain available through retriever pipeline run.
- Progress and stage logs on stderr.
Inspect the results
ls ./lancedb
import lancedb
db = lancedb.connect("./lancedb")
tbl = db.open_table("nemo-retriever")
print(tbl.to_pandas().head())
Or query via the Retriever Python client (nemo_retriever/README.md):
from nemo_retriever.retriever import Retriever
retriever = Retriever(
vdb_kwargs={"uri": "lancedb", "table_name": "nemo-retriever"},
embed_kwargs={
"model_name": "nvidia/llama-nemotron-embed-1b-v2",
"embed_model_name": "nvidia/llama-nemotron-embed-1b-v2",
},
top_k=5,
)
hits = retriever.query(
"Given their activities, which animal is responsible for the typos?"
)
Larger datasets
- If --run-mode is omitted, it defaults to inprocess (single-process pandas; no Ray startup).
- Ray Data scale-out: retriever ingest ./data/pdf_corpus --run-mode batch.
- Tune throughput with --pdf-extract-workers, --pdf-extract-batch-size, --page-elements-workers, --page-elements-batch-size, --ocr-workers, --ocr-batch-size, --embed-workers, and --embed-batch-size.
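The tuning flags listed above follow a regular --<stage>-workers / --<stage>-batch-size pattern, so a batch run can generate them from one table. This sketch is hypothetical: tuning_flags is not part of the CLI, and the numbers are placeholders rather than tuned recommendations.

```python
# Hypothetical helper: expand per-stage (workers, batch size) pairs into CLI flags.
# The stage names mirror the flags documented above; the values are placeholders.
import shlex

STAGES = ("pdf-extract", "page-elements", "ocr", "embed")


def tuning_flags(settings: dict[str, tuple[int, int]]) -> list[str]:
    flags: list[str] = []
    for stage, (workers, batch_size) in settings.items():
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        flags += [f"--{stage}-workers", str(workers),
                  f"--{stage}-batch-size", str(batch_size)]
    return flags


# Ray Data scale-out run with two stages tuned (placeholder values).
cmd = ["retriever", "ingest", "./data/pdf_corpus", "--run-mode", "batch",
       *tuning_flags({"ocr": (4, 16), "embed": (2, 64)})]
print(shlex.join(cmd))
```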