Skip to content

Configure Ray Logging

NeMo Retriever extraction uses Ray for logging. You can use environment variables for fine-grained control over Ray's logging behavior. In addition, NeMo Retriever extraction provides preset configurations that you can use to quickly update Ray logging behavior.

Important

You must set environment variables before you initialize the pipeline, and you must restart the pipeline if you change variable values.

Quick Start - Use Preset Configurations

To get started quickly, use one of the NeMo Retriever extraction package-level preset variables. Run the code below that corresponds to your use case; production, development, or debugging. The log levels are explained following.

Tip

After you set a preset configuration, you can also override individual variables.

# Production deployment - minimal logging, maximum performance
export INGEST_RAY_LOG_LEVEL=PRODUCTION

# Development work (default) - balanced logging
export INGEST_RAY_LOG_LEVEL=DEVELOPMENT  

# Debugging issues - maximum logging and visibility
export INGEST_RAY_LOG_LEVEL=DEBUG

PRODUCTION Log Level

The PRODUCTION log level is optimized for production deployments with minimal logging overhead.

  • Storage Limit – 10GB total (1GB × 10 files)
  • Performance Impact – ~5% CPU reduction, ~200MB memory savings in large clusters

This log level uses the following settings:

  • Log Level – ERROR only
  • Log to Driver – Disabled (worker logs stay in worker files)
  • Import Warnings – Disabled
  • Usage Stats – Disabled
  • Storage – 10GB total (1GB × 10 files)
  • Deduplication – Enabled
  • Encoding – TEXT

DEVELOPMENT Log Level

The DEVELOPMENT log level is a balanced configuration for development work, and is the default log level.

  • Storage Limit – 20GB total (1GB × 20 files)
  • Performance Impact – Balanced performance and visibility

This log level uses the following settings:

  • Log Level – INFO
  • Log to Driver – Enabled
  • Import Warnings – Enabled
  • Usage Stats – Enabled
  • Storage – 20GB total (1GB × 20 files)
  • Deduplication – Enabled
  • Encoding – TEXT

DEBUG Log Level

The DEBUG log level provides maximum visibility for troubleshooting issues.

  • Storage Limit – 20GB total (512MB × 40 files)
  • Performance Impact – ~10% CPU overhead for detailed logging, higher memory usage

This log level uses the following settings:

  • Log Level – DEBUG
  • Log to Driver – Enabled
  • Import Warnings – Enabled
  • Usage Stats – Enabled
  • Storage – 20GB total (512MB × 40 files)
  • Deduplication – Disabled (see all duplicate messages)
  • Encoding – JSON with function names and line numbers

Configuration Reference

The following are the environment variables that you can set to control Ray logging behavior. If you specify an invalid value, the variable reverts to the default value with a warning message.

Variable Type Description Valid Values Default
INGEST_RAY_LOG_LEVEL NeMo Retriever extraction preset Set multiple Ray logging variables to optimize for specific use cases. PRODUCTION, DEVELOPMENT, DEBUG DEVELOPMENT
RAY_DEDUP_LOGS Log flow control Specify whether to log multiple instances of repeated events or to combine into a single entry. 1 to combine repeated messages (for example, [repeated 5x]). 0, 1 1
RAY_DISABLE_IMPORT_WARNING Ray internal logging 1 to suppresses Ray X.Y.Z started message and other warnings during initialization. 0, 1 0
RAY_LOG_TO_DRIVER Log flow control trueto log worker messages in the main process. false to log worker messages in worker log files. true, false true
RAY_LOGGING_ADDITIONAL_ATTRS Core logging control Add Python logger fields like function names, line numbers to each log entry. Comma-separated list (empty)
RAY_LOGGING_ENCODING Core logging control Specify the format for log messages. TEXT, JSON TEXT
RAY_LOGGING_LEVEL Core logging control Specify what events to log. DEBUG to log all Ray internals. WARNING to log only significant events. DEBUG, INFO, WARNING, ERROR, CRITICAL INFO
RAY_LOGGING_ROTATE_BACKUP_COUNT File rotation Specify the number of old log files retained. Total storage = (count + 1) × file size. Integer 19
RAY_LOGGING_ROTATE_BYTES File rotation Specify the log file size before Ray creates a new log file. Use this to prevent unbounded disk usage. Bytes 1073741824 (1GB)
RAY_USAGE_STATS_ENABLED Ray internal logging 1 to enable telemetry collection and related log messages. 0 to disable. 0, 1 1

Configuration Examples

Use a Preset With A Manual Override

The following example uses the DEVELOPMENT preset and then overrides the RAY_LOGGING_LEVEL behavior.

export INGEST_RAY_LOG_LEVEL=DEVELOPMENT  # Use the DEVELOPMENT preset
export RAY_LOGGING_LEVEL=WARNING         # Override just the log level

Log Verbosity Control

By default, Ray generates significant logging output. The following example configures Ray to reduce log volume.

export RAY_DISABLE_IMPORT_WARNING=1  # Suppress Ray initialization warnings
export RAY_LOGGING_LEVEL=WARNING     # Suppress informational messages, show only warnings and errors
export RAY_LOG_TO_DRIVER=false       # Prevent worker logs from appearing in driver process output

Minimal Logging (Legacy)

The following example minimizes logging. Only critical errors are logged. Worker logs are isolated. This reduces log volume by approximately 95%.

Tip

You can achieve the same effect by setting the INGEST_RAY_LOG_LEVEL to PRODUCTION.

export RAY_LOGGING_LEVEL=ERROR
export RAY_LOG_TO_DRIVER=false
export RAY_DISABLE_IMPORT_WARNING=1
export RAY_DEDUP_LOGS=1

Structured Logging for Analysis

The following example results in machine-parseable JSON with metadata for log aggregation systems.

export INGEST_RAY_LOG_LEVEL=DEVELOPMENT
export RAY_LOGGING_ENCODING=JSON
export RAY_LOGGING_ADDITIONAL_ATTRS=name,funcName,lineno,thread,process

Set Custom Storage Limits

The following example automatically cleans up files when logs exceed 5GB. The oldest files are removed first.

# 5GB total log storage (500MB × 10 files)
export RAY_LOGGING_ROTATE_BYTES=524288000
export RAY_LOGGING_ROTATE_BACKUP_COUNT=9

Log Output Examples

INFO level (Default)

2024-01-15 10:30:15,123 INFO worker.py:1234 -- Task task_id=abc123 started
2024-01-15 10:30:15,124 INFO worker.py:1235 -- Processing batch size=100
2024-01-15 10:30:15,125 INFO worker.py:1236 -- Task task_id=abc123 completed

WARNING level

2024-01-15 10:30:20,456 WARNING worker.py:1240 -- Task retry attempt 2/3
2024-01-15 10:30:25,789 ERROR worker.py:1245 -- Task failed: Connection timeout

JSON encoding (DEBUG preset)

{
    "asctime": "2024-01-15 10:30:15,123", 
    "levelname": "INFO", 
    "filename": "worker.py", 
    "lineno": 1234, 
    "message": "Task started", 
    "name": "ray.worker", 
    "funcName": "execute_task", 
    "job_id": "01000000", 
    "worker_id": "abc123", 
    "task_id": "def456"}