# Llama Guard Integration
NeMo Guardrails provides out-of-the-box support for content moderation using Meta’s Llama Guard model.
In our testing, we observe significantly improved input and output content moderation performance compared to the self-check method. Please see additional documentation for more details on the recommended deployment method and the performance evaluation numbers.
## Usage
To configure your bot to use Llama Guard for input/output checking, follow these steps:

1. Add a model of type `llama_guard` to the `models` section of the `config.yml` file (the example below uses a vLLM setup; a quick way to sanity-check this endpoint is sketched after the steps):
    ```yaml
    models:
      ...
      - type: llama_guard
        engine: vllm_openai
        parameters:
          openai_api_base: "http://localhost:5123/v1"
          model_name: "meta-llama/LlamaGuard-7b"
    ```
2. Include the `llama guard check input` and `llama guard check output` flow names in the `rails` section of the `config.yml` file:
    ```yaml
    rails:
      input:
        flows:
          - llama guard check input
      output:
        flows:
          - llama guard check output
    ```
3. Define the `llama_guard_check_input` and the `llama_guard_check_output` prompts in the `prompts.yml` file:
    ```yaml
    prompts:
      - task: llama_guard_check_input
        content: |
          <s>[INST] Task: ...
          <BEGIN UNSAFE CONTENT CATEGORIES>
          O1: ...
          O2: ...
      - task: llama_guard_check_output
        content: |
          <s>[INST] Task: ...
          <BEGIN UNSAFE CONTENT CATEGORIES>
          O1: ...
          O2: ...
    ```
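Before wiring the endpoint into the rails, it can help to query the vLLM server directly and confirm it behaves as Llama Guard should. The snippet below is a minimal sketch, not part of the NeMo Guardrails configuration: it assumes a vLLM OpenAI-compatible server is already serving `meta-llama/LlamaGuard-7b` at `http://localhost:5123/v1` as configured above, and it uses a shortened stand-in prompt rather than the full `llama_guard_check_input` template elided with `...` above.

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server for meta-llama/LlamaGuard-7b is
# already running at the address configured in config.yml above.
client = OpenAI(base_url="http://localhost:5123/v1", api_key="EMPTY")

# Confirm the server is up and serving the expected model.
print([m.id for m in client.models.list()])

# Shortened, illustrative stand-in for the rendered llama_guard_check_input
# template (the real template carries the full task instructions and the
# unsafe content category list).
prompt = (
    "<s>[INST] Task: Check if there is unsafe content in the following user "
    "message according to our safety policy.\n\n"
    "User: How can I pick a lock?\n\n"
    "Provide your safety assessment for the above conversation. [/INST]"
)

completion = client.completions.create(
    model="meta-llama/LlamaGuard-7b",
    prompt=prompt,
    max_tokens=20,
    temperature=0.0,
)

# Llama Guard answers with "safe", or "unsafe" followed by the violated
# category codes (e.g. "unsafe\nO3"); the llama_guard_check_* actions parse
# this text into the allowed flag and the list of policy violations.
print(completion.choices[0].text)
```

If this direct call fails, the guardrails actions will fail in the same way, so it is a useful first debugging step.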
The rails execute the `llama_guard_check_*` actions, which return `True` if the user input or the bot message should be allowed, and `False` otherwise, along with a list of the unsafe content categories as defined in the Llama Guard prompt.
```colang
define flow llama guard check input
  $llama_guard_response = execute llama_guard_check_input
  $allowed = $llama_guard_response["allowed"]
  $llama_guard_policy_violations = $llama_guard_response["policy_violations"]

  if not $allowed
    bot refuse to respond
    stop

# (similar flow for checking output)
```
A complete example configuration that uses Llama Guard for input and output moderation is provided in this example folder.
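To try a configuration like this end to end, the guardrails runtime can be driven from Python. The snippet below is a minimal sketch, assuming the `config.yml`, `prompts.yml`, and flow definitions above live in a local `config` directory; the directory path and the sample user message are illustrative.

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the guardrails configuration (config.yml, prompts.yml, and the Colang
# flows) from a local directory; "./config" is just an illustrative path.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# The input rail runs Llama Guard on the user message before the main LLM is
# called, and the output rail checks the bot response before it is returned.
response = rails.generate(messages=[
    {"role": "user", "content": "How do I reset my account password?"}
])
print(response["content"])
```

If Llama Guard flags the input or the output, the `bot refuse to respond` message from the flow above is returned instead of the model's answer.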