RAG Evaluation Application
About Evaluating RAGs
A RAG pipeline has two components: a retriever and a generator. To quantify the performance of a RAG pipeline, you have to evaluate these components separately as well as together.
This RAG evaluation application measures RAG performance using RAGAS metrics and a Likert score. The RAGAS metrics are faithfulness, context relevancy, answer similarity, answer relevancy, context precision, and context recall. The Likert score is a value from 1 to 5 based on the helpfulness, relevancy, accuracy, and level of detail of the generated answer.
Comparing the metrics for different RAG pipelines can provide insights and help you choose better parameters for the pipeline. You can evaluate the pipelines on a standard or a synthetically generated question-and-answer dataset.
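A minimal sketch of such a comparison, assuming each pipeline's evaluation run has produced aggregated scores like the result.json file described under Results and Conclusion. The function name `compare_metrics` and the two score dictionaries are hypothetical; the numbers are invented for illustration and are not measured results.

```python
def compare_metrics(baseline: dict, candidate: dict) -> dict:
    """Return candidate minus baseline for every metric present in both runs."""
    shared = sorted(set(baseline) & set(candidate))
    return {name: candidate[name] - baseline[name] for name in shared}


# Hypothetical aggregated scores for two pipeline configurations.
# These values are made up for illustration only.
pipeline_a = {"faithfulness": 0.25, "answer_relevancy": 0.69, "context_recall": 0.50}
pipeline_b = {"faithfulness": 0.40, "answer_relevancy": 0.66, "context_recall": 0.75}

deltas = compare_metrics(pipeline_a, pipeline_b)
for name, delta in deltas.items():
    if delta > 0:
        direction = "improved"
    elif delta < 0:
        direction = "regressed"
    else:
        direction = "unchanged"
    print(f"{name:20s} {delta:+.3f} ({direction})")
```

In practice, the two dictionaries would be loaded with `json.load` from the result.json file of each run.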
Prerequisites
Clone the Generative AI examples Git repository using Git LFS:
$ sudo apt -y install git-lfs
$ git clone git@github.com:NVIDIA/GenerativeAIExamples.git
$ cd GenerativeAIExamples/
$ git lfs pull
A host with an NVIDIA A100, H100, or L40S GPU.
Verify NVIDIA GPU driver version 535 or later is installed and that the GPU is in compute mode:
$ nvidia-smi -q -d compute
Example Output
==============NVSMI LOG==============
Timestamp                                 : Sun Nov 26 21:17:25 2023
Driver Version                            : 535.129.03
CUDA Version                              : 12.2
Attached GPUs                             : 1
GPU 00000000:CA:00.0
    Compute Mode                          : Default
If the driver is not installed or is older than version 535, refer to the NVIDIA Driver Installation Quickstart Guide.
Install Docker Engine and Docker Compose. Refer to the instructions for Ubuntu.
Install the NVIDIA Container Toolkit.
Refer to the installation documentation.
When you configure the runtime, set the NVIDIA runtime as the default:
$ sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
If you did not set the runtime as the default, you can reconfigure the runtime by running the preceding command.
Verify the NVIDIA container toolkit is installed and configured as the default container runtime:
$ cat /etc/docker/daemon.json
Example Output
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
Run the nvidia-smi command in a container to verify the configuration:
$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi -L
Example Output
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-d8ce95c1-12f7-3174-6395-e573163a2ace)
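The driver and container runtime checks above can also be scripted. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the parsing assumes the output formats shown in the example outputs of this section.

```python
import json
import re
from typing import Optional

MIN_DRIVER_MAJOR = 535  # minimum driver version stated in the prerequisites


def driver_major_version(nvidia_smi_output: str) -> Optional[int]:
    """Extract the major driver version from `nvidia-smi -q` style output."""
    match = re.search(r"Driver Version\s*:\s*(\d+)", nvidia_smi_output)
    return int(match.group(1)) if match else None


def driver_is_supported(nvidia_smi_output: str) -> bool:
    """Return True when a driver version is reported and it is 535 or later."""
    major = driver_major_version(nvidia_smi_output)
    return major is not None and major >= MIN_DRIVER_MAJOR


def nvidia_is_default_runtime(daemon_json_text: str) -> bool:
    """Check that daemon.json registers the nvidia runtime and makes it the default."""
    config = json.loads(daemon_json_text)
    return (
        config.get("default-runtime") == "nvidia"
        and "nvidia" in config.get("runtimes", {})
    )


# Sample inputs copied from the example outputs in this section.
sample_smi = """\
Driver Version                            : 535.129.03
CUDA Version                              : 12.2
"""
sample_daemon = (
    '{"default-runtime": "nvidia", "runtimes": '
    '{"nvidia": {"args": [], "path": "nvidia-container-runtime"}}}'
)

print("driver supported:", driver_is_supported(sample_smi))
print("nvidia is default runtime:", nvidia_is_default_runtime(sample_daemon))
```

On a real host, the inputs could come from `subprocess.run(["nvidia-smi", "-q", "-d", "compute"], capture_output=True, text=True).stdout` and from the contents of /etc/docker/daemon.json.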
Generating Data with the Synthetic Data Generator
To generate a synthetic Q&A pair dataset from custom documents, perform the following steps:
In the Generative AI Examples repository, edit the deploy/compose/eval-app-compose.env file and specify the input and output paths:
Update DATASET_DIRECTORY with the path to a directory with the documents to ingest. Copy PDF files to analyze into the specified directory. You can use the notebooks/dataset.zip file in the repository for sample PDF files.
Update RESULT_DIRECTORY with the path for the output Q&A pair dataset.
Set your NVIDIA API key in an environment variable:
$ export NVIDIA_API_KEY='nvapi-*'
From the root of the repository, build and run the synthetic data generator:
$ docker compose \
    --env-file deploy/compose/eval-app-compose.env \
    -f deploy/compose/docker-compose-evaluation-application.yaml \
    build synthetic_data_generator
$ docker compose \
    --env-file deploy/compose/eval-app-compose.env \
    -f deploy/compose/docker-compose-evaluation-application.yaml \
    up synthetic_data_generator
Example Output
[+] Running 1/0
 ✔ Container data-generator Created
Attaching to data-generator
data-generator | INFO:data_generator:1/1
data-generator | INFO:pikepdf._core:pikepdf C++ to Python logger bridge initialized
data-generator | INFO:matplotlib.font_manager:generated new fontManager
data-generator | [nltk_data] Downloading package punkt to /root/nltk_data...
data-generator | [nltk_data] Unzipping tokenizers/punkt.zip.
data-generator | [nltk_data] Downloading package averaged_perceptron_tagger to
data-generator | [nltk_data] /root/nltk_data...
data-generator | [nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
data-generator | INFO:__main__:\DATA GENERATED
data-generator |
data-generator exited with code 0
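One way to catch a misconfigured path before running the generator is to check the two variables set in deploy/compose/eval-app-compose.env. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the file uses plain KEY=VALUE lines (optionally prefixed with `export`), which is not confirmed here.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_problems(env: dict) -> list:
    """Return human-readable problems with the dataset and result paths."""
    problems = []
    dataset_dir = env.get("DATASET_DIRECTORY", "")
    result_dir = env.get("RESULT_DIRECTORY", "")
    if not dataset_dir:
        problems.append("DATASET_DIRECTORY is not set")
    elif not Path(dataset_dir).is_dir():
        problems.append(f"DATASET_DIRECTORY does not exist: {dataset_dir}")
    elif not any(Path(dataset_dir).glob("*.pdf")):
        # Only looks at the top level of the directory for lowercase .pdf names.
        problems.append(f"no PDF files found in {dataset_dir}")
    if not result_dir:
        problems.append("RESULT_DIRECTORY is not set")
    return problems


# Run from the root of the repository; does nothing if the file is absent.
env_file = Path("deploy/compose/eval-app-compose.env")
if env_file.is_file():
    for problem in find_problems(parse_env_text(env_file.read_text())):
        print("WARNING:", problem)
```

An empty list from `find_problems` only means the paths look plausible; it does not validate the PDF contents.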
Generating Answers and Evaluating a RAG Pipeline
Start an instance of the Chain Server.
You can run an example, such as Using the NVIDIA API Catalog, to start a Chain Server.
From the root of the repository, build and run the RAG evaluator:
$ docker compose \
    --env-file deploy/compose/eval-app-compose.env \
    -f deploy/compose/docker-compose-evaluation-application.yaml \
    build rag_evaluator
$ docker compose \
    --env-file deploy/compose/eval-app-compose.env \
    -f deploy/compose/docker-compose-evaluation-application.yaml \
    run rag_evaluator
Example Output
INFO:llm_answer_generator:1/1
INFO:llm_answer_generator:1/6
INFO:llm_answer_generator:data: {"id":"e7262f2b-0753-4b6c-813d-a38cd4a5954c","choices":[{"index":0,"message":{"role":"assistant","content":""},"finish_reason":""}]}
...
Evaluating: 94%|███████████████████████████████████████████████████████████████████ | 34/36 [00:18<00:00, 2.10it/s]
WARNING:ragas.metrics._context_recall:Invalid JSON response. Expected dictionary with key 'Attributed'
Evaluating: 100%|███████████████████████████████████████████████████████████████████████| 36/36 [00:22<00:00, 1.62it/s]
INFO:evaluator:Results written to /result_dir/result.json and /result_dir/result.parquet
INFO:__main__: RAG EVALUATED WITH RAGAS METRICS
Results and Conclusion
Running the evaluation application on the given qna.json dataset produces the following results.
The RESULT_DIRECTORY path has two newly created files:
A JSON file, result.json, with aggregated PERF metrics like the following example:
{
  "answer_similarity": 0.7944183243305074,
  "faithfulness": 0.25,
  "context_precision": 0.249999999975,
  "context_relevancy": 0.4837612078324153,
  "answer_relevancy": 0.6902010104258721,
  "context_recall": 0.5,
  "ragas_score": 0.4203451750317139
}
A parquet file, result.parquet, with PERF metrics for each Q&A pair like the following example:
{
  "question": "What is the contact email for Jordan Dodge who works in the SHIELD and GeForce NOW division at NVIDIA Corporation?",
  "answer": " jdodge@nvidia.com",
  "contexts": [
    "products and technologies or enhancements to our existing product and technologies ; market acceptance of our products or our partners ’ products ; design, manufacturing or software defects ; changes in consumer preferences or demands ; changes in industry standards and interfaces ; unexpected loss of performance of our products or technologies when integrated into systems ; as well as other factors detailed from time to time in the most recent reports nvidia files with the securities and exchange commission, or sec, including, but not limited to, its annual report on form 10 - k and quarterly reports on form 10 - q. copies of reports filed with the sec are posted on the company ’ s website and are available from nvidia without charge. these forward - looking statements are not guarantees of future performance and speak only as of the date hereof, and, except as required by law, nvidia disclaims any obligation to update these forward - looking statements to reflect future events or circumstances. © 2023 nvidia corporation. all rights reserved. nvidia, the nvidia logo, bluefield and connectx are trademarks and / or registered trademarks of nvidia corporation in the u. s. and other countries. all other trademarks and copyrights are the property of their respective owners. features, pricing, availability and specifications are subject to change without notice. alexa korkos director, product pr ampere computing + 1 - 925 - 286 - 5270 akorkos @ amperecomputing. com jordan dodge shield, geforce now nvidia corp. + 1 - 408 - 506 - 6849 jdodge @ nvidia. com"
  ],
  "ground_truth": "jdodge@nvidia.com",
  "answer_similarity": 1,
  "faithfulness": 0,
  "context_precision": 0.9999999999,
  "context_relevancy": 0.35714285714285715,
  "answer_relevancy": 0.7686588526523409,
  "context_recall": 1,
  "ragas_score": 0
}
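The two result files can be inspected with a short script. The sketch below rests on an assumption inferred from the sample values and not confirmed by this document: that ragas_score is the harmonic mean of faithfulness, context_relevancy, answer_relevancy, and context_recall. The sample aggregate is consistent with that reading, and so is the per-pair example, where a faithfulness of 0 yields a ragas_score of 0. The helper names are hypothetical, and `load_rows` needs pandas with a parquet engine such as pyarrow.

```python
import json
from statistics import harmonic_mean

# Aggregated scores copied from the result.json example above.
aggregate = json.loads("""
{
  "answer_similarity": 0.7944183243305074,
  "faithfulness": 0.25,
  "context_precision": 0.249999999975,
  "context_relevancy": 0.4837612078324153,
  "answer_relevancy": 0.6902010104258721,
  "context_recall": 0.5,
  "ragas_score": 0.4203451750317139
}
""")

# Assumption: these four metrics feed the combined score (inferred from the sample).
COMBINED = ("faithfulness", "context_relevancy", "answer_relevancy", "context_recall")


def combined_score(scores: dict) -> float:
    """Harmonic mean of the assumed component metrics; zero if any component is zero."""
    return harmonic_mean([scores[name] for name in COMBINED])


def weakest_rows(rows: list, metric: str, n: int = 5) -> list:
    """Return the n Q&A rows with the lowest value for the given metric."""
    return sorted(rows, key=lambda row: row[metric])[:n]


def load_rows(parquet_path: str) -> list:
    """Load per-pair rows as dictionaries; imports pandas only when called."""
    import pandas as pd

    return pd.read_parquet(parquet_path).to_dict("records")


print("recomputed:", combined_score(aggregate))
print("reported:  ", aggregate["ragas_score"])
```

For example, `weakest_rows(load_rows("result.parquet"), "faithfulness")` would list the Q&A pairs whose answers are least supported by the retrieved contexts, which is a reasonable place to start when tuning the retriever or the prompt.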