hub / github.com/NVIDIA/garak

github.com/NVIDIA/garak @v0.15.1

repository ↗ · DeepWiki ↗ · release v0.15.1 ↗ · + Follow

2,760 symbols 10,010 edges 441 files 1,055 documented · 38%

README

garak, LLM vulnerability scanner

Generative AI Red-teaming & Assessment Kit

garak checks if an LLM can be made to fail in a way we don't want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. If you know nmap or msf / Metasploit Framework, garak does somewhat similar things to them, but for LLMs.

garak focuses on ways of making an LLM or dialog system fail. It combines static, dynamic, and adaptive probes to explore this.

garak's a free tool. We love developing it and are always interested in adding functionality to support applications.

Get started

> See our user guide! docs.garak.ai

> Join our Discord!

> Project links & home: garak.ai

> Twitter: @garak_llm

> DEF CON slides!

LLM support

currently supports: * hugging face hub generative models * replicate text models * openai api chat & continuation models * aws bedrock foundation models * litellm * pretty much anything accessible via REST * gguf models like llama.cpp version >= 1046 * .. and many more LLMs!

Install:

garak is a command-line tool. It's developed in Linux and OSX.

Standard install with `pip`

Just grab it from PyPI and you should be good to go:

python -m pip install -U garak

Install development version with `pip`

The standard pip version of garak is updated periodically. To get a fresher version from GitHub, try:

python -m pip install -U git+https://github.com/NVIDIA/garak.git@main

Clone from source

garak has its own dependencies. You can to install garak in its own Conda environment:

conda create --name garak "python>=3.10,<=3.12"
conda activate garak
gh repo clone NVIDIA/garak
cd garak
python -m pip install -e .

OK, if that went fine, you're probably good to go!

Note: if you cloned before the move to the NVIDIA GitHub organisation, but you're reading this at the github.com/NVIDIA URI, please update your remotes as follows:

git remote set-url origin https://github.com/NVIDIA/garak.git

Getting started

The general syntax is:

garak <options>

garak needs to know what model to scan, and by default, it'll try all the probes it knows on that model, using the vulnerability detectors recommended by each probe. You can see a list of probes using:

garak --list_probes

To specify a generator, use the --target_type and, optionally, the --target_name options. Model type specifies a model family/interface; model name specifies the exact model to be used. The "Intro to generators" section below describes some of the generators supported. A straightforward generator family is Hugging Face models; to load one of these, set --target_type to huggingface and --target_name to the model's name on Hub (e.g. "RWKV/rwkv-4-169m-pile"). Some generators might need an API key to be set as an environment variable, and they'll let you know if they need that.

garak runs all the probes by default, but you can be specific about that too. --probes promptinject will use only the PromptInject framework's methods, for example. You can also specify one specific plugin instead of a plugin family by adding the plugin name after a .; for example, --probes lmrc.SlurUsage will use an implementation of checking for models generating slurs based on the Language Model Risk Cards framework.

For help and inspiration, find us on Twitter or discord!

Examples

Probe a commercial model for encoding-based prompt injection (OSX/*nix) (replace example value with a real OpenAI API key)

export OPENAI_API_KEY="sk-123XXXXXXXXXXXX"
python3 -m garak --target_type openai --target_name gpt-5-nano --probes encoding

See if the Hugging Face version of GPT2 is vulnerable to DAN 11.0

python3 -m garak --target_type huggingface --target_name gpt2 --probes dan.Dan_11_0

Reading the results

For each probe loaded, garak will print a progress bar as it generates. Once generation is complete, a row evaluating that probe's results on each detector is given. If any of the prompt attempts yielded an undesirable behavior, the response will be marked as FAIL, and the failure rate given.

Here are the results with the encoding module on a GPT-3 variant: alt text

And the same results for ChatGPT: alt text

We can see that the more recent model is much more susceptible to encoding-based injection attacks, where text-babbage-001 was only found to be vulnerable to quoted-printable and MIME encoding injections. The figures at the end of each row, e.g. 840/840, indicate the number of text generations total and then how many of these seemed to behave OK. The figure can be quite high because more than one generation is made per prompt - by default, 10.

Errors go in garak.log; the run is logged in detail in a .jsonl file specified at analysis start & end. There's a basic analysis script in analyse/analyse_log.py which will output the probes and prompts that led to the most hits.

Send PRs & open issues. Happy hunting!

Intro to generators

Hugging Face

Using the Pipeline API: * --target_type huggingface (for transformers models to run locally) * --target_name - use the model name from Hub. Only generative models will work. If it fails and shouldn't, please open an issue and paste in the command you tried + the exception!

Using the Inference API: * --target_type huggingface.InferenceAPI (for API-based model access) * --target_name - the model name from Hub, e.g. "mosaicml/mpt-7b-instruct"

Using private endpoints: * --target_type huggingface.InferenceEndpoint (for private endpoints) * --target_name - the endpoint URL, e.g. https://xxx.us-east-1.aws.endpoints.huggingface.cloud

(optional) set the HF_INFERENCE_TOKEN environment variable to a Hugging Face API token with the "read" role; see https://huggingface.co/settings/tokens when logged in

OpenAI

--target_type openai
--target_name - the OpenAI model you'd like to use. gpt-5-nano is fast and fine for testing.
set the OPENAI_API_KEY environment variable to your OpenAI API key (e.g. "sk-19763ASDF87q6657"); see https://platform.openai.com/account/api-keys when logged in

Recognised model types are whitelisted, because the plugin needs to know which sub-API to use. Completion or ChatCompletion models are OK. If you'd like to use a model not supported, you should get an informative error message, and please send a PR / open an issue.

Replicate

set the REPLICATE_API_TOKEN environment variable to your Replicate API token, e.g. "r8-123XXXXXXXXXXXX"; see https://replicate.com/account/api-tokens when logged in

Public Replicate models: * --target_type replicate * --target_name - the Replicate model name and hash, e.g. "stability-ai/stablelm-tuned-alpha-7b:c49dae36"

Private Replicate endpoints: * --target_type replicate.InferenceEndpoint (for private endpoints) * --target_name - username/model-name slug from the deployed endpoint, e.g. elim/elims-llama2-7b

Cohere

--target_type cohere
--target_name (optional, command by default) - The specific Cohere model you'd like to test
set the COHERE_API_KEY environment variable to your Cohere API key, e.g. "aBcDeFgHiJ123456789"; see https://dashboard.cohere.ai/api-keys when logged in

Groq

--target_type groq
--target_name - The name of the model to access via the Groq API
set the GROQ_API_KEY environment variable to your Groq API key, see https://console.groq.com/docs/quickstart for details on creating an API key

ggml

--target_type ggml
--target_name - The path to the ggml model you'd like to load, e.g. /home/leon/llama.cpp/models/7B/ggml-model-q4_0.bin
set the GGML_MAIN_PATH environment variable to the path to your ggml main executable

REST

rest.RestGenerator is highly flexible and can connect to any REST endpoint that returns plaintext or JSON. It does need some brief config, which will typically result a short YAML file describing your endpoint. See https://reference.garak.ai/en/latest/garak.generators.rest.html for examples.

NIM

Use models from https://build.nvidia.com/ or other NIM endpoints. * set the NIM_API_KEY environment variable to your authentication API token, or specify it in the config YAML

For chat models: * --target_type nim * --target_name - the NIM model name, e.g. meta/llama-3.1-8b-instruct

For completion models: * --target_type nim.NVOpenAICompletion * --target_name - the NIM model name, e.g. bigcode/starcoder2-15b

AWS Bedrock

--target_type bedrock
--target_name - the Bedrock model ID or alias, e.g. anthropic.claude-3-sonnet-20240229-v1:0 or claude-3-sonnet
set the BEDROCK_API_KEY environment variable to your AWS Bedrock API key; see https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-use.html for setup instructions
(optional) set the BEDROCK_REGION environment variable to specify the AWS region (defaults to us-east-1)

Supported model families include Anthropic Claude, Meta Llama, Amazon Titan, AI21 Labs, Cohere, and Mistral AI models. The generator uses the Converse API for unified access across all model types.

Example usage:

export BEDROCK_API_KEY="your-api-key"
export BEDROCK_REGION="us-east-1"
garak --target_type bedrock --target_name claude-3-sonnet --probes dan

Test

--target_type test
(alternatively) --target_name test.Blank For testing. This always generates the empty string, using the test.Blank generator. Will be marked as failing for any tests that require an output, e.g. those that make contentious claims and expect the model to refute them in order to pass.
--target_type test.Repeat For testing. This generator repeats back the prompt it received.

Intro to probes

Probe	Description
blank	A simple probe that always sends an empty prompt.
atkgen	Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 fine-tuned on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now).
badchars	Implements imperceptible Unicode perturbations (invisible characters, homoglyphs, reorderings, deletions) inspired by the Bad Characters paper.
av_spam_scanning	Probes that attempt to make the model output malicious content signatures
continuation	Probes that test if the model will continue a probably undesirable word
dan	Various DAN and DAN-like attacks
donotanswer	Prompts to which responsible language models should not answer.
encoding	Prompt injection through text encoding
gcg	Disrupt a system prompt by appending an adversarial suffix.
glitch	Probe model for glitch tokens that provoke unusual behavior.

Extension points exported contracts — how you extend this code

Props (Interface)

Props for ErrorBoundary component

garak-report/src/components/ErrorBoundary.tsx

State (Interface)

Internal state for error tracking

garak-report/src/components/ErrorBoundary.tsx

DefconBadgeProps (Interface)

Props for DefconBadge component

garak-report/src/components/DefconBadge.tsx

ReportFilterBarProps (Interface)

Props for ReportFilterBar component

garak-report/src/components/ReportFilterBar.tsx

SummaryStatsCardProps (Interface)

Props for SummaryStatsCard component

garak-report/src/components/SummaryStatsCard.tsx

Core symbols most depended-on inside this repo

detect

called by 197

garak/detectors/shields.py

generate

called by 81

garak/generators/base.py

garak/langproviders/local.py

_get_first_valid

called by 24

garak/resources/promptinject/prompting.py

_load_config

called by 23

garak/configurable.py

Shape

Function 1,185

Method 984

Class 491

Interface 72

Route 28

Languages

Python94%

TypeScript6%

Modules by API surface

tests/probes/test_agent_breaker.py64 symbols

tests/test_config.py61 symbols

garak/probes/encoding.py55 symbols

tests/probes/test_probes_goat.py40 symbols

garak/detectors/unsafe_content.py39 symbols

tests/detectors/test_detectors_packagehallucination.py37 symbols

tests/generators/test_llm.py36 symbols

garak/probes/latentinjection.py35 symbols

tests/evaluators/test_evaluators.py34 symbols

tests/detectors/test_detectors_agent_breaker.py34 symbols

garak/resources/fixer/20250224_lightweight_probe_defaults.py34 symbols

garak/probes/base.py33 symbols

Dependencies from manifests, versioned

@eslint/js9.25.0 · 1×

@kui/react./src/assets/kui-fou · 1×

@tailwindcss/vite4.1.7 · 1×

@testing-library/dom10.4.0 · 1×

@testing-library/jest-dom6.6.3 · 1×

@testing-library/react16.3.0 · 1×

@types/react19.1.2 · 1×

@types/react-dom19.1.2 · 1×

@vitejs/plugin-react4.4.1 · 1×

@vitest/coverage-v83.2.4 · 1×

autoprefixer10.4.21 · 1×

echarts5.6.0 · 1×

Datastores touched

(mongodb)Database · 1 repos

mydatabaseDatabase · 1 repos

For agents

$ claude mcp add garak \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact