Quickstart: retriever CLI

Use retriever ingest and retriever query for product-facing workflows. retriever pipeline is development / compatibility only; see Supported vs development / experimental subcommands.

Quick start

For deployment of NeMo Retriever / NIM containers, use nemo_retriever/helm and the NeMo Retriever Library Helm install guides.

Ingest a PDF

retriever ingest ./data/multimodal_test.pdf \
  --method pdfium \
  --extract-text --extract-tables --extract-charts \
  --use-table-structure \
  --embed-model-name nvidia/llama-nemotron-embed-1b-v2

Then query the LanceDB table:

retriever query "What is in this document?" \
  --embed-model-name nvidia/llama-nemotron-embed-1b-v2

Development-only pipeline features such as --save-intermediate, runtime summaries, and post-ingest evaluation remain available only through retriever pipeline run; the public path is limited to retriever ingest and retriever query.

Route stages to self-hosted or hosted NIM endpoints by passing only the URLs you want to override:

export NVIDIA_API_KEY=nvapi-...

retriever ingest ./data/multimodal_test.pdf \
  --page-elements-invoke-url https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-page-elements-v3 \
  --ocr-invoke-url https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-ocr-v1 \
  --table-structure-invoke-url https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-table-structure-v1 \
  --embed-invoke-url https://integrate.api.nvidia.com/v1/embeddings \
  --embed-model-name nvidia/llama-nemotron-embed-1b-v2

retriever query "What is in this document?" \
  --embed-invoke-url https://integrate.api.nvidia.com/v1/embeddings \
  --embed-model-name nvidia/llama-nemotron-embed-1b-v2 \
  --reranker-invoke-url https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-nemotron-rerank-vl-1b-v2/reranking

Query result controls

retriever query returns compact JSON hits with source, page_number, and text. By default it retrieves and returns --top-k rows. Use these controls when you need a wider candidate pool or a narrower result shape:

# Retrieve 30 candidates, then return the best 10.
retriever query "where is the warranty limitation discussed?" \
  --candidate-k 30

# Keep only the first hit from each document page.
retriever query "which pages discuss operating costs?" \
  --top-k 5 \
  --candidate-k 30 \
  --page-dedup

# Search a wider pool, then keep only table rows.
retriever query "annual revenue by region" \
  --top-k 5 \
  --candidate-k 40 \
  --content-types table

--top-k is the final number of hits returned. --candidate-k is the wider candidate pool retrieved before page deduplication, content-type filtering, and final truncation. It must be greater than or equal to --top-k, and should usually be larger when page deduplication or content-type filtering might otherwise remove too many of the top retrieved rows.

Page deduplication and content-type filtering are applied after vector retrieval, preserving the retriever's ranking order and truncating the final output to --top-k.

--content-types accepts comma-separated content types such as text, table, chart, image, and infographic. images is accepted as an alias for captioned image rows emitted by ingest. Hits with missing or unknown content types are excluded while --content-types is active.

When querying a table ingested with an explicit embedding model, pass the same --embed-model-name to retriever query.
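The post-retrieval behavior described above can be sketched in Python. This is an illustration of the documented semantics only, not the retriever implementation: the function name postprocess, the content_type field name, the mapping of images to image, and the order of operations (content-type filter first, then page deduplication) are all assumptions.

```python
# Assumption: "images" maps to the "image" content type. The text only
# says it is accepted as an alias for captioned image rows.
ALIASES = {"images": "image"}


def postprocess(candidates, top_k, candidate_k, content_types=None, page_dedup=False):
    """Filter ranked candidates, optionally dedupe by page, truncate to top_k.

    `candidates` is a list of dicts already sorted by retrieval rank.
    The "content_type" key is a hypothetical field name for this sketch.
    """
    if candidate_k < top_k:
        raise ValueError("candidate_k must be >= top_k")

    wanted = None
    if content_types:
        names = (t.strip() for t in content_types.split(","))
        wanted = {ALIASES.get(t, t) for t in names if t}

    seen_pages = set()
    kept = []
    for hit in candidates[:candidate_k]:  # ranking order is preserved
        # With a filter active, missing or unrequested types are dropped.
        if wanted is not None and hit.get("content_type") not in wanted:
            continue
        if page_dedup:
            key = (hit.get("source"), hit.get("page_number"))
            if key in seen_pages:
                continue  # keep only the first hit from each page
            seen_pages.add(key)
        kept.append(hit)
        if len(kept) == top_k:
            break
    return kept
```

With a small pool, filtering or deduplication can leave fewer than top_k hits, which is why a larger --candidate-k is usually worthwhile when either control is active.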

NVIDIA_API_KEY is required only when the --*-invoke-url options point at hosted build.nvidia.com endpoints. NGC_API_KEY is used separately when pulling or running self-hosted NIM containers.
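A small preflight check can catch a missing key before a long ingest starts. The sketch below is not part of the retriever CLI; the helper names are hypothetical, and it assumes hosted endpoints can be recognized by an api.nvidia.com hostname, as in the example URLs above.

```python
import os
from urllib.parse import urlparse

# Assumption: hosted endpoints use hostnames under api.nvidia.com, as in the
# example invoke URLs. Adjust this if your hosted URLs differ.
HOSTED_SUFFIX = "api.nvidia.com"


def needs_api_key(invoke_urls):
    """Return True if any URL looks like a hosted endpoint."""
    for url in invoke_urls:
        host = urlparse(url).hostname or ""
        if host == HOSTED_SUFFIX or host.endswith("." + HOSTED_SUFFIX):
            return True
    return False


def check_api_key(invoke_urls, env=None):
    """Raise if a hosted endpoint is targeted without NVIDIA_API_KEY set."""
    env = os.environ if env is None else env
    if needs_api_key(invoke_urls) and not env.get("NVIDIA_API_KEY"):
        raise RuntimeError("NVIDIA_API_KEY must be set for hosted endpoints")
```

Self-hosted URLs such as http://localhost:8000/v1/embeddings pass the check without any key set.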

What you get

  • Extracted text, tables, and charts as rows in LanceDB at ./lancedb (default table name nemo-retriever).
  • Compact JSON retrieval hits from retriever query, including source, page_number, and text fields.
  • Extracted image assets when retriever ingest is run with --store-images-uri.
  • Pipeline-only development artifacts such as extraction Parquet, runtime summaries, and evaluation reports remain available through retriever pipeline run.
  • Progress and stage logs on stderr.

Inspect the results

ls ./lancedb

import lancedb

db = lancedb.connect("./lancedb")
tbl = db.open_table("nemo-retriever")
print(tbl.to_pandas().head())

Or query via the Retriever Python client (nemo_retriever/README.md):

from nemo_retriever.retriever import Retriever

retriever = Retriever(
    vdb_kwargs={"uri": "lancedb", "table_name": "nemo-retriever"},
    embed_kwargs={
        "model_name": "nvidia/llama-nemotron-embed-1b-v2",
        "embed_model_name": "nvidia/llama-nemotron-embed-1b-v2",
    },
    top_k=5,
)
hits = retriever.query(
    "Given their activities, which animal is responsible for the typos?"
)

Larger datasets

  • If --run-mode is omitted, it defaults to inprocess (single-process pandas; no Ray startup).
  • Ray Data scale-out: retriever ingest ./data/pdf_corpus --run-mode batch.
  • Tune throughput with --pdf-extract-workers, --pdf-extract-batch-size, --page-elements-workers, --page-elements-batch-size, --ocr-workers, --ocr-batch-size, --embed-workers, and --embed-batch-size.
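When experimenting with several tuning combinations, it can help to assemble the command in a script. The following is a minimal sketch using only the flags listed above; the helper name build_ingest_command is hypothetical, and the numeric values and the assumption that each flag takes a single integer are illustrative, not recommendations.

```python
import shlex

# Tuning flags named in this guide, written as Python keyword names.
TUNING_FLAGS = {
    "pdf_extract_workers", "pdf_extract_batch_size",
    "page_elements_workers", "page_elements_batch_size",
    "ocr_workers", "ocr_batch_size",
    "embed_workers", "embed_batch_size",
}


def build_ingest_command(input_path, run_mode="batch", **tuning):
    """Build an argv list for `retriever ingest` with optional tuning flags."""
    unknown = set(tuning) - TUNING_FLAGS
    if unknown:
        raise ValueError(f"unknown tuning flags: {sorted(unknown)}")
    cmd = ["retriever", "ingest", input_path, "--run-mode", run_mode]
    for name, value in tuning.items():
        # ocr_workers=4 becomes "--ocr-workers 4"
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd


# Illustrative values only.
cmd = build_ingest_command("./data/pdf_corpus", ocr_workers=4, embed_batch_size=64)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the retriever CLI is installed.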