agentbreaker
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
No AI report is available for this listing yet.
AI security testing engine for surfacing prompt leaks, bypasses, and unsafe agent behavior.
AgentBreaker
AgentBreaker is not another tool shipping prompt fuzzing and calling it red teaming.
It probes the system first, identifies capabilities such as multi-turn behavior, tool use, and multimodal handling, then generates targeted payloads shaped to the surface it found. The result is a red-team engine that is built to get you outcomes, not just logs.
Why It Exists
Most AI security testing is still too manual, too noisy, or too one-off.
AgentBreaker gives you:
- repeatable campaign runs
- structured evidence instead of one-off screenshots
- judge, planner, and generator assisted workflows
- an operator control plane for launches and review
This repo now starts with a clean slate. It does not ship a bundled public seed corpus.
What matters is what AgentBreaker has already been able to surface across real systems.
What It Can Surface
AgentBreaker is built to help teams uncover issues such as:
- system prompt leakage and hidden instruction disclosure
- jailbreak and policy bypass paths
- unsafe tool behavior and action chaining
- sensitive data exposure and retrieval abuse
- browser and API workflow weaknesses around agent execution
- weak refusal patterns that collapse under pressure
Public Showcase
The public results corpus already demonstrates outcomes such as:
resistance-level-1: completion-style prompt extraction that disclosed a protected flagpromptairlines: structured JSON export that disclosed protected runtime valuespromptairlines: authority-override framing that yielded restricted coupon datapromptairlines: multimodal injection flows that exfiltrated protected artifacts from uploaded contentgpt-5.2: successful runs across jailbreak, prompt injection, tool misuse, and data exposure patternsgpt-5.4: successful runs across prompt injection, guardrail bypass, tool misuse, and prompt extraction
How It Flows
flowchart LR
A["Configure system"] --> B["Launch campaign"]
B --> C["Generate probes"]
C --> D["Execute and score"]
D --> E["Store evidence and results"]
E --> F["Review in control plane"]
Quick Start
git clone https://github.com/kagexai/agentbreaker.git
cd agentbreaker
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
agentbreaker validate --check-env
agentbreaker run <system-id> --loop
Open the control plane:
agentbreaker serve --port 1337
Then visit http://127.0.0.1:1337.
Operator Paths
Run a campaign:
agentbreaker run <system-id> --loop
Validate config before a run:
agentbreaker validate --check-env
Inspect configured systems:
agentbreaker targets
Start the review surface:
agentbreaker serve --port 1337
Core Files
agentbreaker/cli.py- main CLI entrypointagentbreaker/campaign.py- campaign loop and strategy selectionagentbreaker/attack.py- payload constructionagentbreaker/target.py- execution harness and scoringagentbreaker/control_plane.py- operator backendfrontend/- control plane frontendtaxonomy/agentbreaker_taxonomy.yaml- strategy librarytarget_config.yaml- system and model configuration
Safety
AgentBreaker is for authorized testing only. Do not run it against systems you do not own or do not have explicit permission to assess.
License
MIT. See LICENSE.
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found