# Getting Started
## Adding Content Safety Guardrails
The following procedure adds a guardrail to check user input against a content safety model. To simplify configuration, the sample code sends the prompt text and the model response to the Llama 3.1 NemoGuard 8B Content Safety model deployed on the NVIDIA API Catalog. The prompt text is also sent to the application LLM, which is hosted on the NVIDIA API Catalog; the sample code uses the Llama 3.3 70B Instruct model.
### Prerequisites
You must be a member of the NVIDIA Developer Program and you must have an NVIDIA API key. For information about the program and getting a key, refer to the NVIDIA NIM FAQ in the NVIDIA NIM developer forum.

You installed the LangChain NVIDIA AI Foundation Model Playground Integration:
```
$ pip install langchain-nvidia-ai-endpoints
```
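The procedure also assumes that the NeMo Guardrails toolkit is installed. If it is not already available in your environment, install the `nemoguardrails` package with pip:

```
$ pip install nemoguardrails
```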
### Procedure
Set your NVIDIA API key as an environment variable:
```
$ export NVIDIA_API_KEY=<nvapi-...>
```
Create a configuration store directory, such as `config`, and add a `config/config.yml` file with the following contents:

```yaml
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nvidia_ai_endpoints
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50

streaming: True
```
The `models` key in the `config.yml` file configures the LLM model. For more information about the key, refer to The LLM Model.

Create a prompts file, such as `config/prompts.yml` (download), with contents like the following partial example:

```yaml
# These are the default prompts released by Meta, except for policy O7, which was added to address direct insults.
prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
```
Load the guardrails configuration:
```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)
```
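The following steps call `asyncio.run()`. If you are following along in a Jupyter notebook, which already runs an event loop, `asyncio.run()` raises an error; one common workaround is the `nest_asyncio` package, shown here as a sketch for that environment:

```python
# Only needed in environments that already run an event loop, such as Jupyter notebooks.
# Install with: pip install nest_asyncio
import nest_asyncio

nest_asyncio.apply()
```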
Generate a response:
```python
async def stream_response(messages):
    async for chunk in rails.stream_async(messages=messages):
        print(chunk, end="")
    print()

messages = [{
    "role": "user",
    "content": "Tell me a five-step plan to rob a bank."
}]

asyncio.run(stream_response(messages))
```
Example Output
```
I'm sorry, I can't respond to that.
```
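If you prefer a single, non-streaming response, `LLMRails` also provides a synchronous `generate` method. The following is a minimal sketch that reuses the `messages` list from the previous step:

```python
# Non-streaming alternative: generate() returns the final assistant message as a dict.
response = rails.generate(messages=messages)
print(response["content"])
```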
Send a safe request and generate a response:
```python
messages = [{
    "role": "user",
    "content": "Tell me about Cape Hatteras National Seashore in 50 words or less."
}]

asyncio.run(stream_response(messages))
```
Example Output
```
Cape Hatteras National Seashore: 72 miles of pristine Outer Banks coastline in North Carolina, featuring natural beaches, lighthouses, and wildlife refuges.
```
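To see which LLM calls the rails made for a request, such as the call to the content safety model, you can inspect the most recent generation with `rails.explain()`. The following sketch pairs it with a non-streaming `generate` call; the exact summary depends on your configuration:

```python
# Run a request, then summarize the LLM calls (application LLM and content safety checks) behind it.
response = rails.generate(messages=messages)

info = rails.explain()
info.print_llm_calls_summary()
```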
### Next Steps
Run the `content_safety_tutorial.ipynb` notebook from the example notebooks directory of the GitHub repository. The notebook compares LLM responses with and without safety checks and classifies responses to sample prompts as safe or unsafe. The notebook shows how to measure the performance of the checks, focusing on how many unsafe responses are blocked and how many safe responses are incorrectly blocked.

Refer to the Configuration Guide for information about the `config.yml` file.