Skip to content

Environment Variables for NeMo Retriever Extraction

The following are the environment variables that you can use to configure NeMo Retriever extraction. You can specify these in your .env file or directly in your environment.

Note

NeMo Retriever extraction is also known as NVIDIA Ingest and nv-ingest.

General Environment Variables

Name Example Description
DOWNLOAD_LLAMA_TOKENIZER True
If True, the llama-3.2 tokenizer will be pre-dowloaded at build time. If not set to True, the (e5-large-unsupervised)[https://huggingface.co/intfloat/e5-large-unsupervised] tokenizer will be pre-downloaded. Note: setting this to True requires a HuggingFace access token with access to the gated Llama-3.2 models. See below for more info.
HF_ACCESS_TOKEN - The HuggingFace access token used to pre-downlaod the Llama-3.2 tokenizer from HuggingFace (see above for more info). Llama 3.2 is a gated model, so you must request access to the Llama-3.2 models and then set this variable to a token that can access gated repositories on your behalf in order to use DOWNLOAD_LLAMA_TOKENIZER=True.
INGEST_LOG_LEVEL - DEBUG
- INFO
- WARNING
- ERROR
- CRITICAL
The log level for the ingest service, which controls the verbosity of the logging output.
MESSAGE_CLIENT_HOST - redis
- localhost
- 192.168.1.10
Specifies the hostname or IP address of the message broker used for communication between services.
MESSAGE_CLIENT_PORT - 7670
- 6379
Specifies the port number on which the message broker is listening.
MINIO_BUCKET nv-ingest
Name of MinIO bucket, used to store image, table, and chart extractions.
NGC_API_KEY nvapi-*************
An authorized NGC API key, used to interact with hosted NIMs. To create an NGC key, go to https://org.ngc.nvidia.com/setup/api-keys.
NIM_NGC_API_KEY The key that NIM microservices inside docker containers use to access NGC resources. This is necessary only in some cases when it is different from NGC_API_KEY. If this is not specified, NGC_API_KEY is used to access NGC resources.
OTEL_EXPORTER_OTLP_ENDPOINT http://otel-collector:4317
The endpoint for the OpenTelemetry exporter, used for sending telemetry data.
REDIS_INGEST_TASK_QUEUE ingest_task_queue
The name of the task queue in Redis where tasks are stored and processed.

Library Mode Environment Variables

These environment variables apply specifically when running NV-Ingest in library mode.

Name Example Description
NVIDIA_API_KEY nvapi-*************
API key for NVIDIA-hosted NIM services.