Vector databases
Use this documentation to learn how NeMo Retriever Library stores extracted embeddings and uploads data to vector databases.
Overview
NeMo Retriever Library extracts text representations from various forms of content and ingests them into a vector database. LanceDB is the vector database backend for storing and retrieving the extracted embeddings.
The data upload task (vdb_upload) pulls extraction results to the Python client,
and then pushes them to LanceDB (embedded, in-process).
The vector database stores only the extracted text representations of ingested data. It does not store the embeddings for images.
Storing Extracted Images
To persist extracted images, tables, and chart renderings to disk or object storage, use the store task in addition to vdb_upload. The store task supports any fsspec-compatible backend (local filesystem, S3, GCS, and other object stores). For details, refer to Store Extracted Images.
NeMo Retriever Library supports uploading data by using the Ingestor.vdb_upload API. Currently, data upload is not supported through the CLI.
Why LanceDB?
LanceDB is optimized for low-latency retrieval in this stack:
- Lance columnar format: Data is stored in Lance files, an Arrow/Parquet-style analytics layout optimized for fast local scans and indexed retrieval. This reduces serialization overhead compared with a separate database server.
- IVF_HNSW_SQ index: Vectors are scalar-quantized (SQ) within an IVF-HNSW index, which compresses them for faster search with lower memory bandwidth cost.
- Embedded runtime: LanceDB runs in-process, so you do not run extra vector-database containers for the default path. This leaves fewer moving parts to start, configure, and maintain.
This combination of file format, index strategy, and in-process runtime supports the latency characteristics described in benchmarks.
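To make the scalar quantization idea concrete, the following is a minimal sketch of 8-bit quantization for a single vector. It is not LanceDB's implementation: the helper names sq_encode and sq_decode are hypothetical, and the per-vector min/max range is an assumption made for brevity (real indexes typically derive the range from the dataset).

```python
from typing import List, Tuple


def sq_encode(vector: List[float]) -> Tuple[List[int], float, float]:
    """Map each float to an integer code in 0..255 (hypothetical helper)."""
    lo, hi = min(vector), max(vector)
    # Fall back to a scale of 1.0 when all values are equal (hi == lo).
    scale = (hi - lo) / 255 or 1.0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def sq_decode(codes: List[int], lo: float, scale: float) -> List[float]:
    """Reconstruct approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


vector = [0.12, -0.53, 0.98, 0.0]
codes, lo, scale = sq_encode(vector)
approx = sq_decode(codes, lo, scale)

# The reconstruction error per component is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(vector, approx))
```

Each component shrinks from a 32-bit float to a single byte, which is where the lower memory bandwidth cost comes from; the price is a small, bounded reconstruction error.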
Upload to LanceDB
Uploads to LanceDB use the LanceDB operator class from the client library, which you configure through the Python API.
Programmatic API (Python)
Pass vdb_op="lancedb" to vdb_upload, or construct a LanceDB instance and pass it as vdb_op:
from nemo_retriever.vdb.lancedb import LanceDB

vdb = LanceDB(
    uri="./lancedb_data",         # Path to LanceDB database directory
    table_name="nemo-retriever",  # Table name
    index_type="IVF_HNSW_SQ",     # Index type (default)
    hybrid=False,                 # True = also build FTS for hybrid (see Hybrid search below)
)

# Ingest
vdb.run(results)

# Dense-only retrieve when hybrid=False (default)
docs = vdb.retrieval(queries, top_k=10)
With hybrid=False, vdb.retrieval() runs dense vector search. With hybrid=True, vdb.run(results) also builds the BM25/FTS index during ingest. However, LanceDB.retrieval() does not implement hybrid queries and raises NotImplementedError if the operator was created with hybrid=True. For hybrid (dense + BM25 + RRF) queries, import lancedb_hybrid_retrieval() from the same LanceDB helper module that you use with Ingestor for vdb_op="lancedb", and call it instead. For the current import path, see Hybrid search (LanceDB) and the Python API.
When using the Ingestor with vdb_upload, pass vdb_op="lancedb" or a LanceDB instance so that uploads target LanceDB. If you omit vdb_op, the ingestion Python client still defaults the argument to the string "milvus" for backward compatibility, and that value does not select the LanceDB operator. Always pass vdb_op="lancedb" when you intend to use LanceDB.
Semantic and hybrid retrieval
Semantic retrieval uses dense embeddings to find content that is similar in meaning to a query. Hybrid retrieval combines dense vectors with sparse or lexical signals (for example, BM25-style full-text) and fuses ranked lists for better recall on keyword-heavy queries.
In NeMo Retriever Library, the default vector path is LanceDB. Use these resources together with the sections on this page:
- Hybrid search (LanceDB) for LanceDB hybrid mode (dense vectors, BM25, and RRF) and query APIs
- Concepts for broader pipeline and search patterns
- Environment variables for hybrid-related flags where documented
- Custom metadata and filtering for query-time filtering
Evaluation — For evaluation and metrics, refer to Evaluate on your data.
Hybrid search (LanceDB)
LanceDB supports hybrid retrieval, combining dense vector similarity with BM25 full-text search. Results are fused using Reciprocal Rank Fusion (RRF) reranking.
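As an illustration of how Reciprocal Rank Fusion combines two ranked lists, here is a minimal standalone sketch. It is not the code path LanceDB runs; the function name, the document IDs, and the constant k=60 (a commonly used value) are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]  # ranked by vector similarity
bm25_hits = ["doc_c", "doc_a", "doc_d"]   # ranked by full-text score
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Because RRF uses only ranks, it does not need the dense and BM25 scores to be on a comparable scale. Documents that appear near the top of both lists (doc_a and doc_c here) rise above documents that appear in only one.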
Hybrid search improves Recall@5 by approximately 0.5 to 3.5 percentage points over vector-only retrieval with negligible latency impact:
| Dataset | Vector-Only Recall@5 | Hybrid Recall@5 | Delta (percentage points) |
|---|---|---|---|
| bo767 (76K rows) | 84.5% | 85.0% | +0.5 |
| bo767 (reranked) | 90.7% | 91.8% | +1.1 |
| earnings (19K rows) | 61.5% | 65.0% | +3.5 |
| earnings (reranked) | 74.5% | 76.4% | +1.9 |
Hybrid search latency is typically 28–57 ms/query (vs. 31–37 ms/query for vector-only). The one-time FTS index build adds approximately 6.5 seconds for a 76K-row dataset.
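For readers who want to reproduce this kind of comparison on their own data, the sketch below shows one common definition of Recall@k (the fraction of queries whose expected document appears in the top k results) and how a delta in percentage points follows from it. The exact metric definition used for the benchmark numbers is not stated on this page, so treat this as an assumption; the function name and toy data are hypothetical.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[Sequence[str]], expected: Sequence[str], k: int = 5) -> float:
    """Fraction of queries whose expected document ID is within the top-k results."""
    hits = sum(1 for docs, target in zip(retrieved, expected) if target in docs[:k])
    return hits / len(expected)


# Toy data: one expected document per query, four queries.
expected = ["d1", "d2", "d3", "d4"]
vector_only = [["d1", "x"], ["y", "z"], ["d3"], ["q"]]
hybrid = [["d1"], ["d2", "y"], ["d3"], ["q"]]

vector_recall = recall_at_k(vector_only, expected)
hybrid_recall = recall_at_k(hybrid, expected)

# Difference expressed in percentage points rather than as a relative change.
delta_pp = (hybrid_recall - vector_recall) * 100
```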
Enable hybrid ingest by setting hybrid=True when creating the LanceDB operator so vdb.run(results) builds the BM25-friendly FTS index alongside vectors.
Hybrid queries use lancedb_hybrid_retrieval, not LanceDB.retrieval()
LanceDB.retrieval() supports only dense vector search. If the operator was created with hybrid=True, calling vdb.retrieval(...) raises NotImplementedError (“hybrid retrieval with precomputed vectors is not implemented yet”). For hybrid (dense + BM25 + RRF) queries, use lancedb_hybrid_retrieval() from the same module, and pass the same database path and table name that the LanceDB instance uses (the operator's uri corresponds to table_path here):
from nemo_retriever.vdb.lancedb import LanceDB
# Also import lancedb_hybrid_retrieval from the same LanceDB helper module you use with Ingestor
# (see nemo-retriever-api-reference.md).

vdb = LanceDB(uri="./lancedb_data", table_name="nemo-retriever", hybrid=True)
vdb.run(results)

docs = lancedb_hybrid_retrieval(
    queries,
    table_path="./lancedb_data",
    table_name="nemo-retriever",
    top_k=10,
)
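Because the two query paths are separate, application code may want a single entry point that picks the right one. The sketch below is one possible shape, not a library API: retrieve is a hypothetical helper, the hybrid function is injected as a parameter so that no import path is assumed, and the stub classes exist only so the example runs without the library. The keyword names table_path, table_name, and top_k follow the call shown above.

```python
from typing import Any, Callable, List, Sequence


def retrieve(
    vdb: Any,
    queries: Sequence[str],
    *,
    hybrid: bool,
    hybrid_fn: Callable[..., List[Any]],
    table_path: str,
    table_name: str,
    top_k: int = 10,
) -> List[Any]:
    """Route to the hybrid helper or to dense vdb.retrieval(), never both."""
    if hybrid:
        # vdb.retrieval() would raise NotImplementedError for a hybrid operator.
        return hybrid_fn(queries, table_path=table_path, table_name=table_name, top_k=top_k)
    return vdb.retrieval(queries, top_k=top_k)


class _StubVDB:
    """Stand-in for a LanceDB operator created with hybrid=False."""

    def retrieval(self, queries: Sequence[str], top_k: int = 10) -> List[Any]:
        return [("dense", q, top_k) for q in queries]


def _stub_hybrid(queries: Sequence[str], table_path: str, table_name: str, top_k: int) -> List[Any]:
    """Stand-in for lancedb_hybrid_retrieval."""
    return [("hybrid", q, top_k) for q in queries]


common = dict(hybrid_fn=_stub_hybrid, table_path="./lancedb_data", table_name="nemo-retriever")
dense_docs = retrieve(_StubVDB(), ["what is RRF?"], hybrid=False, top_k=3, **common)
hybrid_docs = retrieve(_StubVDB(), ["what is RRF?"], hybrid=True, top_k=3, **common)
```

In real use, hybrid_fn would be lancedb_hybrid_retrieval and vdb would be the LanceDB operator.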
LanceDB deployment characteristics
| Aspect | LanceDB |
|---|---|
| Runtime model | Embedded (in-process) |
| External services | None for the vector store itself |
| Helm / extra stack | Not required for LanceDB (default path) |
| Index type | IVF_HNSW_SQ (default) |
| Hybrid search | BM25 FTS + vector (RRF) when enabled |
| Persistence | Lance files on disk under your configured URI |
Upload to a Custom Data Store
You can ingest to other data stores by using the Ingestor.vdb_upload method;
however, you must configure other data stores and connections yourself.
NeMo Retriever Library does not provide connections to other data stores.
Vector database partners
NeMo Retriever Library integrates with vector databases used for RAG collections. The sections above focus on LanceDB as used in the library. This section summarizes other client VDB implementations and how they plug into NeMo Retriever Library graph operators. For chunking behavior, see Chunking.
Backends with VDB implementations (retriever adapters)
NeMo Retriever graph operators IngestVdbOperator and RetrieveVdbOperator wrap concrete classes that implement the VDB interface (run for ingest, retrieval for search). The client library includes implementations for the following external vector databases. Pass an instance as the operator's vdb argument, or select it through vdb_op where supported:
| Backend | Project | Implementation |
|---|---|---|
| LanceDB | LanceDB · documentation | lancedb.py — pass vdb_op="lancedb" (recommended). |
| OpenSearch | OpenSearch · Vector search | Reference OpenSearch operator in the repository’s client tree; wire your own OpenSearch instance as vdb and see Build a Custom Vector Database Operator. |
On the ingestion Python client's Ingestor.vdb_upload, omitting vdb_op does not select LanceDB; see Upload to LanceDB.
For LanceDB, pass vdb_op="lancedb" (or a LanceDB instance). For other VDB subclasses, construct the client class and pass it as the graph operator’s vdb argument.
RAG Blueprint and partner vector stores
Some deployments use a different vector store than the default LanceDB path on this page—for example the NVIDIA RAG Blueprint (Docker Compose or Helm) or a partner package that subclasses the same VDB interface. Use the following public references when you wire those stacks to ingestion and retrieval:
| Vector store | Where to configure or implement |
|---|---|
| Elasticsearch | Configure Elasticsearch as Your Vector Database for NVIDIA RAG Blueprint — compose profiles, environment variables, and Helm notes for the RAG Blueprint. |
| Pinecone | Customize your vector database (Pinecone + NVIDIA RAG) in the pinecone-io/nvidia-pinecone-rag repository. |
| Teradata | TeradataVDB (NVIDIA NIM Ingest integration) — teradatagenai.vector_store.teradataVDB.TeradataVDB implements the NeMo Retriever ingestion VDB abstract class for Teradata Vector Store. |
Testing and release cadence for these integrations follow the owning project (RAG Blueprint, Pinecone sample repo, or Teradata Generative AI package), not the first-party LanceDB operator validated for NeMo Retriever Library on this page.
More information (embeddings & custom VDB)
- Multimodal embeddings (VLM)
- NeMo Retriever Text Embedding NIM
- NVIDIA NIM catalog for embedding and retrieval-related NIMs
Important
NVIDIA documents and validates the first-party LanceDB operator for this library. If you integrate a different vector store, you are responsible for testing and maintaining that integration.
To implement a custom operator, follow the VDB abstract interface described in Build a Custom Vector Database Operator.
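As a rough picture of what a custom operator involves, the following sketch implements the two methods this page names (run for ingest, retrieval for search) as an in-memory store with cosine similarity. It is an illustration under loud assumptions: it does not subclass the real VDB abstract class (which may require additional methods), the record shape with "text" and "vector" keys is hypothetical, and queries are passed as precomputed vectors for simplicity. Follow Build a Custom Vector Database Operator for the actual interface.

```python
import math
from typing import Dict, List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVDB:
    """Toy store exposing run() and retrieval(); not the library's base class."""

    def __init__(self) -> None:
        self._rows: List[Dict] = []

    def run(self, records: Sequence[Dict]) -> None:
        """Ingest records shaped like {"text": str, "vector": [float, ...]} (assumed shape)."""
        self._rows.extend(records)

    def retrieval(self, queries: Sequence[Sequence[float]], top_k: int = 10) -> List[List[str]]:
        """Return the top_k most similar texts for each query vector."""
        results = []
        for query in queries:
            ranked = sorted(self._rows, key=lambda row: _cosine(query, row["vector"]), reverse=True)
            results.append([row["text"] for row in ranked[:top_k]])
        return results


store = InMemoryVDB()
store.run([
    {"text": "alpha", "vector": [1.0, 0.0]},
    {"text": "beta", "vector": [0.0, 1.0]},
    {"text": "gamma", "vector": [0.7, 0.7]},
])
hits = store.retrieval([[1.0, 0.1]], top_k=2)
```

A real operator would replace the list with calls to the target database and would need to match the record and query formats that the ingestion pipeline produces.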