Garak is a free, open-source tool specifically designed to test the robustness and reliability of Large Language Models (LLMs).
Inspired by utilities like Nmap or Metasploit, Garak identifies potential weak points in LLMs by probing for issues such as hallucinations, data leakage, prompt injections, toxicity, jailbreak effectiveness, and misinformation propagation.
This guide covers everything you need to get started with Garak, from installation to interpreting results and developing custom plugins.
Yes, Garak supports private endpoints for platforms like Hugging Face, Replicate, and OctoAI.
What is Garak?
Garak stands for Generative AI Red-Teaming and Assessment Kit. It systematically identifies the vulnerabilities of LLMs by using a combination of static, dynamic, and adaptive probes. Garak is ideal for:
- Security researchers testing vulnerabilities in LLMs.
- Developers looking to ensure the safety of their AI systems.
- AI ethics professionals assessing the risks of generative systems.
If you’re familiar with penetration testing for software, think of Garak as its counterpart for LLMs.
Key Features
Probing for Weaknesses: Garak tests LLMs for several vulnerabilities, including:
- Hallucination
- Data leakage
- Prompt injection
- Misinformation
- Toxicity generation
- Jailbreaking attempts
- Encoding-based prompt injections
- Cross-site scripting (XSS)
Wide Compatibility: Supports popular platforms like Hugging Face, OpenAI, Replicate, Cohere, and others.
Customizable: Easily integrate with REST endpoints or develop your own probes and plugins.
Logging and Analysis: Detailed logs to trace vulnerabilities and their context.
Supported LLM Platforms
Garak supports models from the following platforms:
- Hugging Face: Local models or API-based models.
- OpenAI: Includes GPT-3.5, GPT-4, and others.
- Replicate: Both public and private models.
- Cohere: For generative text models.
- NVIDIA NIM, OctoAI, Groq, and many more.
It also provides support for custom REST endpoints, making it highly flexible.
Installation Instructions
1. Standard Installation
Install the latest release from PyPI with the following command:
python -m pip install -U garak
2. Development Version
To install the latest version directly from GitHub, use:
python -m pip install -U git+https://github.com/NVIDIA/garak.git@main
3. Cloning from Source
If you want to work with the source code, follow these steps:
conda create --name garak "python>=3.10,<=3.12"
conda activate garak
git clone https://github.com/NVIDIA/garak.git
cd garak
python -m pip install -e .
Note: If you cloned Garak before its move to the NVIDIA GitHub organization, update your GitHub remote URLs:
git remote set-url origin https://github.com/NVIDIA/garak.git
Getting Started
General Syntax
The basic command-line syntax for Garak is:
garak <options>
Running Probes
To list all available probes:
garak --list_probes
To execute all probes on a model:
garak --model_type <model_family> --model_name <model_name>
Example Probes
- Test OpenAI’s GPT-3.5 for encoding-based prompt injection:
export OPENAI_API_KEY="sk-your-key-here"
garak --model_type openai --model_name gpt-3.5-turbo --probes encoding
- Check vulnerability of Hugging Face’s GPT-2 to DAN 11.0 jailbreak attack:
garak --model_type huggingface --model_name gpt2 --probes dan.Dan_11_0
Reading Results
- Pass/Fail Categories: Results are displayed with a diagnostic summary after each probe.
- Failure Rate Analysis: Vulnerabilities are quantified and logged for reference.
- Logs and Reports: Detailed logs are stored in
garak.log
and JSONL files for deeper analysis.
Understanding Generators
A “generator” in Garak defines the type and specific instance of the LLM that will be probed. Examples include:
Hugging Face
- Local Models:
--model_type huggingface --model_name RWKV/rwkv-4-169m-pile
- API-Based Models:
--model_type huggingface.InferenceAPI --model_name mosaicml/mpt-7b-instruct
OpenAI
Set your API key:
export OPENAI_API_KEY="sk-your-key-here"
Run:
garak --model_type openai --model_name gpt-3.5-turbo
REST Endpoints
Connect to any custom REST endpoint:
--model_type rest.RestGenerator --model_name <endpoint_config.yaml>
Intro to Probes
Probes are predefined tests that stimulate specific failure modes in LLMs. Some key probes include:
- Encoding: Tests for vulnerabilities in encoded prompts.
- DAN: Simulates common jailbreak attacks.
- PromptInject: Explores prompt injection weaknesses.
- Misinformation: Encourages the model to create or support misleading content.
- Toxicity: Tests how a model handles sensitive or offensive content.
- RealToxicityPrompts: Uses real-world toxic prompts to test robustness.
To run a specific probe:
garak --probes <probe_name>
Examples:
- Run only the PromptInject probe:
garak --model_type openai --model_name gpt-3.5-turbo --probes promptinject
- Run a submodule probe:
garak --probes lmrc.SlurUsage
Logging and Analysis
Garak generates the following logs:
- Primary Log (
garak.log
): Debugging and runtime logs. - JSONL Report: Structured reports with probe details.
- Hit Log: Highlights vulnerabilities detected during runs.
To analyze data, use:
python3 analyse/analyse_log.py
Developing Custom Plugins
Garak allows users to develop their own custom plugins, such as probes, detectors, or evaluators. Here’s how:
- Inherit from Base Classes: Use existing modules as templates. For example:
from garak.probes.base import TextProbe
- Override Methods: Add only the functionality you need.
- Test Your Plugin:
garak --model_type test.Blank --probes mymodule --detectors always.Pass
Find this News Interesting! Follow us on Google News, LinkedIn, and X to Get Instant Updates!