Pre-Requisites & Support Matrix

Before you begin using NeMo Retriever Library, confirm your software stack, deployment hardware, and any advanced features you plan to use (audio and video, Nemotron Parse, VLM image captioning, reranking) against the guidance on this page.

Software Requirements

  • Linux operating systems (Ubuntu 22.04 or later recommended)
  • CUDA Toolkit (NVIDIA Driver >= 535, CUDA >= 12.2)
  • Python 3.12 — required to install and run the NeMo Retriever Library Python API, CLI, and related packages from PyPI (for example pip or uv). Older Python versions will fail dependency resolution without a clear error.
  • UV Python package and environment manager (optional; recommended for creating isolated environments)

Note

When you use UV, create the environment with Python 3.12 — for example, uv venv --python 3.12. This matches the requires-python metadata in the library packages.
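A script can guard against the wrong interpreter before it imports the library. The following is a minimal sketch, assuming that only the 3.12 series satisfies the requirement (this page does not say whether newer versions are accepted). The helper name python_version_ok is hypothetical and is not part of the NeMo Retriever Library API.

```python
import sys

# Assumption: the packages' requires-python metadata matches exactly the 3.12 series.
REQUIRED_PYTHON = (3, 12)


def python_version_ok(version_info=sys.version_info, required=REQUIRED_PYTHON) -> bool:
    """Return True when the major.minor version equals the required one."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if python_version_ok():
        print(f"Python {found}: OK")
    else:
        print(f"Python {found}: expected 3.12, for example via 'uv venv --python 3.12'")
```

Running the check first gives a clear message instead of an opaque dependency-resolution failure.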

Hardware Requirements

The full ingestion pipeline is designed to consume significant CPU and memory resources to achieve maximal parallelism. Resource usage scales up to the limits of your deployed system.

For per-feature GPU memory, disk, and co-residency rules, refer to Model Hardware Requirements below.

  • System Memory: At least 256 GB RAM
  • CPU Cores: At least 32 CPU cores
  • GPU: NVIDIA GPU with at least 24 GB VRAM (for example, A100, H100, L40S, or equivalent)

Note

Using less powerful systems or lower resource limits is still viable, but performance will suffer.
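The minimums above can be checked on a Linux host before deployment. The following is a rough sketch, not a supported tool: HostSpecs, shortfalls, and the 10% TOLERANCE are hypothetical names and values chosen for illustration. The tolerance exists because the operating system and nvidia-smi usually report slightly less usable memory than the nominal size, and the sketch compares GiB readings against the GB figures on this page without converting.

```python
import os
import shutil
import subprocess
from dataclasses import dataclass, field

# Minimums taken from the Hardware Requirements list on this page.
MIN_RAM_GB = 256
MIN_CPU_CORES = 32
MIN_GPU_VRAM_GB = 24
# Assumption: accept readings within 10% of nominal, since reported capacity
# is typically a little below the marketed size.
TOLERANCE = 0.9


@dataclass
class HostSpecs:
    ram_gb: float
    cpu_cores: int
    gpu_vram_gb: list = field(default_factory=list)


def shortfalls(specs: HostSpecs) -> list:
    """Return one human-readable message per unmet minimum."""
    problems = []
    if specs.ram_gb < MIN_RAM_GB * TOLERANCE:
        problems.append(f"RAM {specs.ram_gb:.0f} GB is below {MIN_RAM_GB} GB")
    if specs.cpu_cores < MIN_CPU_CORES:
        problems.append(f"{specs.cpu_cores} CPU cores is below {MIN_CPU_CORES}")
    if not any(v >= MIN_GPU_VRAM_GB * TOLERANCE for v in specs.gpu_vram_gb):
        problems.append(f"no GPU with at least {MIN_GPU_VRAM_GB} GB VRAM found")
    return problems


def parse_nvidia_smi_memory(output: str) -> list:
    """Parse per-GPU MiB values, one per line, into GiB; skip non-numeric lines."""
    values = []
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line) / 1024)
    return values


def detect_specs() -> HostSpecs:
    """Read RAM, core count, and GPU memory from the local Linux host."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    gpus = []
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=False,
        )
        gpus = parse_nvidia_smi_memory(result.stdout)
    return HostSpecs(ram_bytes / 1024**3, os.cpu_count() or 0, gpus)
```

Calling shortfalls(detect_specs()) on the target host returns an empty list when all three minimums are met. A non-empty list is a warning rather than a hard failure, because smaller systems still work at reduced performance.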

Resource Consumption Notes

  • The pipeline performs runtime allocation of parallel resources based on system configuration
  • Memory usage can reach up to the full system capacity for large document processing
  • CPU utilization scales with the number of concurrent processing tasks
  • GPU is required for inference using HuggingFace models or NIMs
  • GPU is NOT required for build.nvidia.com hosted inference

Scaling Considerations

For production deployments processing large volumes of documents, consider:

  • Higher memory configurations for processing large PDF files or image collections
  • Additional CPU cores for improved parallel processing
  • Multiple GPUs for distributed processing workloads

Environment Requirements

Ensure your deployment environment meets these specifications before running the full pipeline. Resource-constrained environments may experience performance degradation.

Core and Advanced Pipeline Features

The core extraction pipeline features of NeMo Retriever Library run on a single GPU of A10G class or better.

Default Helm NIMs

The production Helm chart enables these NIM microservices by default (for example via nimOperator.*.enabled=true):

| Helm flag | NIM | Role |
|---|---|---|
| page_elements | nemotron-page-elements-v3 | Page layout and element detection |
| table_structure | nemotron-table-structure-v1 | Table structure extraction |
| ocr | nemotron-ocr-v2 | Image OCR |
| vlm_embed | llama-nemotron-embed-vl-1b-v2 | Multimodal (VL) embedding |

Default VL embedder container and model for release deployments:

  • Image: nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:1.12.0
  • Model ID: nvidia/llama-nemotron-embed-vl-1b-v2
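To make the nimOperator.*.enabled pattern concrete, the sketch below expands it into explicit Helm --set overrides for the four default NIMs. It rests on an assumption: that the wildcard stands for the Helm flag names listed above, giving keys such as nimOperator.ocr.enabled. Confirm the exact key paths in the chart's values file before relying on them. The chart and release names are not given on this page, so only the override arguments are built; helm_set_args is a hypothetical helper.

```python
# Default NIMs from this page, keyed by Helm flag name.
DEFAULT_NIMS = {
    "page_elements": "nemotron-page-elements-v3",
    "table_structure": "nemotron-table-structure-v1",
    "ocr": "nemotron-ocr-v2",
    "vlm_embed": "llama-nemotron-embed-vl-1b-v2",
}


def helm_set_args(flags, enabled=True):
    """Build '--set nimOperator.<flag>.enabled=<bool>' argument pairs.

    Assumption: the key layout follows the nimOperator.*.enabled pattern
    with the Helm flag name substituted for the wildcard.
    """
    value = "true" if enabled else "false"
    args = []
    for flag in flags:
        args += ["--set", f"nimOperator.{flag}.enabled={value}"]
    return args


if __name__ == "__main__":
    # Print the overrides that would keep every default NIM enabled.
    print(" ".join(helm_set_args(DEFAULT_NIMS)))
```

The same helper with enabled=False produces the overrides for switching off a NIM that a given workload does not need.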

Optional Helm NIMs (disabled by default)

Enable these only when your workload needs them — the same pattern as the VL reranker (not deployed unless you turn on the reranker flags):

Advanced features (for example, audio and video, Nemotron Parse, VLM image captioning, reranking) require additional GPU support and disk space. This includes the following:

For published NIM model IDs and deployment-specific constraints, use the product support matrices linked under Related Topics below.

Model Hardware Requirements

NeMo Retriever Library supports the following GPU hardware, subject to the constraints listed in the table.

  • HF model weights — approximate Hugging Face checkpoint footprint (files such as model*.safetensors, weights.pth, or other published weight bundles in the model repository). Values are rounded from the current public file listing and can change when the repository is updated.
  • NIM disk space — approximate container and on-disk model cache for self-hosted NIM microservices (not the same as HF download size). For Nemotron 3 Nano Omni captioning, see the NVIDIA NIM for Vision Language Models support matrix.

Model repositories and NIM references are linked in Core and Advanced Pipeline Features above.

| Feature | HF Model Weights | GPU Option | RTX Pro 6000 | B200 | H200 NVL | H100 | A100 80GB | A100 40GB | A10G | L40S | RTX PRO 4500 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | GPU Memory | 96 GB | 180 GB | 141 GB | 80 GB | 80 GB | 40 GB | 24 GB | 48 GB | 32 GB GDDR7 (GB203) |
| Core Features | ~4.8 GiB combined: embed VL 1b ~3.1 GiB; page-elements ~0.41 GiB; table-structure ~0.81 GiB; OCR ~0.51 GiB | Total GPUs | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Core Features | | Total Disk Space | ~150 GB | ~150 GB | ~150 GB | ~150 GB | ~150 GB | ~150 GB | ~150 GB | ~150 GB | ~150 GB |
| Audio (parakeet-1-1b-ctc-en-us) | ~4.0 GiB (model.safetensors; the repo also ships parakeet-ctc-1.1b.nemo of similar size, so use one format to avoid roughly doubling disk use) | Additional Dedicated GPUs | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| Audio (parakeet-1-1b-ctc-en-us) | | Additional Disk Space | ~37 GB | ~37 GB | ~37 GB | ~37 GB | ~37 GB | ~37 GB | ~37 GB | ~37 GB | ~37 GB¹ |
| nemotron-parse | ~3.5 GiB | Additional Dedicated GPUs | Not supported | Not supported | Not supported | 1 | 1 | 1 | 1 | 1 | Not supported² |
| nemotron-parse | | Additional Disk Space | Not supported | Not supported | Not supported | ~16 GB | ~16 GB | ~16 GB | ~16 GB | ~16 GB | Not supported² |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | ~62 GiB (BF16); ~33 GiB (FP8); ~21 GiB (NVFP4) | Additional Dedicated GPUs | 1 | 1 | 1 | 1 | 1 | Not supported | Not supported | 2 | Not supported³ |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | | Additional Disk Space (HF) | ~21–62 GB | ~21–62 GB | ~21–62 GB | ~21–62 GB | ~21–62 GB | Not supported | Not supported | ~21–62 GB | Not supported³ |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | | Additional Disk Space (NIM) | ~80 GB | ~80 GB | ~80 GB | ~80 GB | ~80 GB | Not supported | Not supported | ~80 GB | Not supported³ |
| Reranker | ~3.1 GiB (llama-nemotron-rerank-vl-1b-v2) | With Core Pipeline | Yes | Yes | Yes | Yes | Yes | No* | No* | No* | No* |
| Reranker | | Standalone (recall only) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

¹ Audio runs, but the engine must be built at runtime because no predefined model profile is available.

² Nemotron Parse fails to start on 32GB.

³ Opt-in Omni captioning uses the nemotron-3-nano-omni-30b-a3b-reasoning NIM (nvcr.io/nim/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:latest). BF16 requires at least 80 GB total GPU memory; see the VLM NIM support matrix. L40S requires two GPUs. A100 40GB, A10G, and RTX PRO 4500 are below the minimum.

* GPUs with less than 80GB VRAM cannot run the reranker concurrently with the core pipeline. To perform recall testing with the reranker on these GPUs, shut down the core pipeline NIM microservices and run only the embedder, reranker, and your vector database.
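For capacity planning, the support matrix can be encoded as a small lookup. The sketch below transcribes the GPU Memory, nemotron-parse, and Omni caption rows from the table and applies the 80 GB co-residency rule for the reranker. It assumes that "Additional Dedicated GPUs" are added on top of the single core-pipeline GPU. The function names are hypothetical, and audio is left out because its GPU row lists only eight values for nine GPU columns.

```python
# GPU memory in GB, from the "GPU Memory" row of the table.
GPU_MEMORY_GB = {
    "RTX Pro 6000": 96, "B200": 180, "H200 NVL": 141, "H100": 80,
    "A100 80GB": 80, "A100 40GB": 40, "A10G": 24, "L40S": 48,
    "RTX PRO 4500 Blackwell": 32,
}

# Additional dedicated GPUs per feature; None means "Not supported".
NEMOTRON_PARSE_GPUS = {
    "RTX Pro 6000": None, "B200": None, "H200 NVL": None, "H100": 1,
    "A100 80GB": 1, "A100 40GB": 1, "A10G": 1, "L40S": 1,
    "RTX PRO 4500 Blackwell": None,
}
OMNI_CAPTION_GPUS = {
    "RTX Pro 6000": 1, "B200": 1, "H200 NVL": 1, "H100": 1,
    "A100 80GB": 1, "A100 40GB": None, "A10G": None, "L40S": 2,
    "RTX PRO 4500 Blackwell": None,
}

# From the reranker footnote: co-residency needs at least 80 GB of VRAM.
RERANKER_CORESIDENT_MIN_GB = 80


def reranker_fits_with_core(gpu: str) -> bool:
    """True when the reranker can run alongside the core pipeline."""
    return GPU_MEMORY_GB[gpu] >= RERANKER_CORESIDENT_MIN_GB


def total_gpus(gpu: str, nemotron_parse: bool = False,
               omni_caption: bool = False) -> int:
    """Core pipeline GPU plus additional dedicated GPUs for chosen features.

    Assumption: additional dedicated GPUs are additive to the one core GPU.
    Raises ValueError when a requested feature is not supported on the GPU.
    """
    total = 1  # core features run on a single GPU
    requested = (
        (nemotron_parse, NEMOTRON_PARSE_GPUS, "nemotron-parse"),
        (omni_caption, OMNI_CAPTION_GPUS, "Omni caption"),
    )
    for wanted, table, name in requested:
        if not wanted:
            continue
        extra = table[gpu]
        if extra is None:
            raise ValueError(f"{name} is not supported on {gpu}")
        total += extra
    return total
```

For example, total_gpus("L40S", omni_caption=True) counts one core GPU plus the two L40S GPUs that Omni captioning requires. Nemotron Parse support does not follow GPU memory size, which is why the sketch uses explicit per-GPU tables rather than a single memory threshold.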