# Getting Started
## Adding Content Safety Guardrails
The following procedure adds a guardrail to check user input against a content safety model. To simplify configuration, the sample code sends the prompt text and the model response to the Llama 3.1 NemoGuard 8B Content Safety model deployed on the NVIDIA API Catalog. The prompt text is also sent to the application LLM, which is hosted on the NVIDIA API Catalog; the sample code uses the Llama 3.3 70B Instruct model.
### Prerequisites
You must be a member of the NVIDIA Developer Program and you must have an NVIDIA API key. For information about the program and getting a key, refer to the NVIDIA NIM FAQ in the NVIDIA NIM developer forum.

You installed the LangChain NVIDIA AI Foundation Model Playground Integration:
```
$ pip install langchain-nvidia-ai-endpoints
```
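The procedure also assumes that the NeMo Guardrails toolkit is installed. If it is not already available in your environment, install the `nemoguardrails` package with pip:

```
$ pip install nemoguardrails
```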
### Procedure
Set your NVIDIA API key as an environment variable:
```
$ export NVIDIA_API_KEY=<nvapi-...>
```
Create a configuration store directory, such as `config`, and add a `config/config.yml` file with the following contents:

```yaml
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nvidia_ai_endpoints
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50

streaming: True
```
The `models` key in the `config.yml` file configures the LLM model. For more information about the key, refer to The LLM Model.

Create a prompts file, such as `config/prompts.yml` (download), with contents like the following partial example:

```yaml
# These are the default prompts released by Meta, except for policy O7, which was added to address direct insults.
prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
```
Load the guardrails configuration:
```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)
```
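The following steps call `asyncio.run()`. If you are following along in a Jupyter notebook, which already runs an event loop, `asyncio.run()` raises an error; one common workaround is the `nest_asyncio` package, shown here as a sketch for that environment:

```python
# Only needed in environments that already run an event loop, such as Jupyter notebooks.
# Install with: pip install nest_asyncio
import nest_asyncio

nest_asyncio.apply()
```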
Generate a response:
```python
async def stream_response(messages):
    async for chunk in rails.stream_async(messages=messages):
        print(chunk, end="")
    print()

messages = [{
    "role": "user",
    "content": "Tell me a five-step plan to rob a bank."
}]

asyncio.run(stream_response(messages))
```
Example Output
```
I'm sorry, I can't respond to that.
```
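If you prefer a single, non-streaming response, `LLMRails` also provides a synchronous `generate` method. The following is a minimal sketch that reuses the `messages` list from the previous step:

```python
# Non-streaming alternative: generate() returns the final assistant message as a dict.
response = rails.generate(messages=messages)
print(response["content"])
```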
Send a safe request and generate a response:
```python
messages = [{
    "role": "user",
    "content": "Tell me about Cape Hatteras National Seashore in 50 words or less."
}]

asyncio.run(stream_response(messages))
```
Example Output
```
Cape Hatteras National Seashore: 72 miles of pristine Outer Banks coastline in North Carolina, featuring natural beaches, lighthouses, and wildlife refuges.
```
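To see which LLM calls the rails made for a request, such as the call to the content safety model, you can inspect the most recent generation with `rails.explain()`. The following sketch pairs it with a non-streaming `generate` call; the exact summary depends on your configuration:

```python
# Run a request, then summarize the LLM calls (application LLM and content safety checks) behind it.
response = rails.generate(messages=messages)

info = rails.explain()
info.print_llm_calls_summary()
```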
### Next Steps
Run the `content_safety_tutorial.ipynb` notebook from the example notebooks directory of the GitHub repository. The notebook compares LLM responses with and without safety checks and classifies responses to sample prompts as safe or unsafe. The notebook shows how to measure the performance of the checks, focusing on how many unsafe responses are blocked and how many safe responses are incorrectly blocked.

Refer to the Configuration Guide for information about the `config.yml` file.