pentest-agents

Security Audit
Fail
Health Warn
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 16 GitHub stars
Code Fail
  • rm -rf — Recursive force deletion command in .claude/settings.json
Permissions Pass
  • Permissions — No dangerous permissions requested
Purpose
This tool is an autonomous bug bounty and penetration testing framework designed for AI coding assistants. It orchestrates dozens of specialist agents to hunt for vulnerabilities and interfaces directly with platforms like HackerOne and Bugcrowd.

Security Assessment
Overall Risk: High. As a penetration testing framework, this tool by design requires access to highly sensitive data and environments. It executes shell commands, makes external network requests to target URLs, and connects to bug bounty platform APIs using provided authentication tokens. An automated scan flagged a `rm -rf` (recursive force deletion) command in `.claude/settings.json`, which poses a notable risk to the local host system if triggered improperly. No hardcoded secrets were found, but setup requires exposing sensitive API tokens (e.g., HackerOne credentials) directly via local environment variables.

Quality Assessment
The project appears active and regularly maintained, with recent repository pushes and a large codebase (39,000+ lines across 152 files). Community trust is hard to gauge: the tool is niche and has only 16 GitHub stars. A bigger concern for enterprise or open-source use is the complete lack of a license file, which means all rights are reserved by default, leaving usage, modification, and distribution rights legally unclear.

Verdict
Use with caution — thoroughly inspect the configuration to mitigate the local deletion risk, and understand that running autonomous testing agents against targets without explicit permission may violate laws or platform terms of service.
SUMMARY

Autonomous bug-bounty framework for Claude Code — 47 specialist agents, exploit-chain builder, writeup search, and live HackerOne/Bugcrowd integration.

README.md

Pentest Agent Suite for Claude Code

Autonomous bug-bounty framework for Claude Code and 6 other AI coding tools — 47 agents, 26 commands, 15 CLI tools, 2 MCP servers.



152 files · 39k+ lines · 47 agents · 26 commands · 15 CLI tools · 6 skills · 2 MCP servers (16 bug-bounty platforms + BYO writeup search) · 545 payload lines

A complete bug bounty framework. Battle-tested hunting methodology with concrete payloads, 7-Question Gate validation, autonomous hunt loops, A→B exploit chain building, persistent brain with endpoint tracking, optional semantic writeup search (bring your own index), automatic cost tracking via CC hooks, live platform integration, and a cross-IDE installer that emits the native format for Claude Code, Codex, Gemini, Cursor, Windsurf, and VS Code Copilot.

Quick Start

pip install mcp
export HACKERONE_USERNAME=you HACKERONE_TOKEN=your_token
python3 tools/scaffold.py hackerone tesla --type web-app
cd ~/bounties/hackerone-tesla && claude
/model opus             # Opus 4.6 [1M] — subagents inherit via model: "inherit"
/sync hackerone tesla
/brain init && /status
/hunt tesla.com

Install (Claude Code + 6 other AI coding tools)

pentest-agents ships a cross-IDE installer that emits each target's native
format — agents, skills, commands, rules, and MCP configuration — so the same
framework works everywhere.

# From a clone:
python3 -m tools.installer install --targets all --scope project

PyPI distribution is WIP. uv build produces a working wheel, but the
installed CLI currently resolves source files relative to a repo clone layout
(.claude/agents, .claude/skills, skills/, rules/, rules/payloads.md,
mcp-*-server/). Running via pipx install / uvx pentest-agents will
execute but install an empty manifest. Until this is fixed, run the installer
from a clone.

| Target | Agents | Commands | Rules | MCP | Scopes |
| --- | --- | --- | --- | --- | --- |
| Claude Code | native .claude/agents/*.md | .claude/skills/<name>/SKILL.md | CLAUDE.md | .mcp.json / ~/.claude.json | global + project |
| OpenAI Codex | native .codex/agents/*.toml | .codex/commands/*.md | AGENTS.md | [mcp_servers.*] in config.toml | global + project |
| Google Gemini | native .gemini/agents/*.md | TOML in .gemini/commands/ | GEMINI.md | mcpServers in settings.json | global + project |
| Cursor | → Skills | → Skills | .cursor/rules/*.mdc + AGENTS.md | .cursor/mcp.json | global + project |
| Windsurf | → Skills | Workflows | .windsurf/rules/*.md (≤12K / file) | ~/.codeium/windsurf/mcp_config.json | global + project |
| VS Code Copilot | .github/agents/*.agent.md | .github/prompts/*.prompt.md | .github/copilot-instructions.md + .github/instructions/* | .vscode/mcp.json | project + global-MCP |
| OpenClaw | → Skills | → Skills | ~/.openclaw/workspace/AGENTS.md or <proj>/AGENTS.md | mcp.servers in ~/.openclaw/openclaw.json | global + project (skills/rules only; MCP is user-level) |

Cursor, Windsurf, and OpenClaw have no native subagent concept, so Claude-format
agents are rendered as Skills for those three (the closest analogue). Every
target's rule digest is a single canonical AGENTS.md-compatible file when
supported.

OpenClaw specifics (verified against docs.openclaw.ai, April 2026):
skills install into ~/.openclaw/skills/<name>/SKILL.md (global) or
<project>/.agents/skills/<name>/SKILL.md (project — AgentSkills convention).
MCP is always wired into the user-level ~/.openclaw/openclaw.json under
mcp.servers.*; project-scope installs emit a warning reminding you to run
--scope global once if you need the MCP servers.

Management:

pentest-agents list                      # detect which targets are installed
pentest-agents install --targets claude_code,codex --scope global
pentest-agents install --dry-run         # preview every file + JSON merge
pentest-agents verify                    # check manifest vs. disk (drift)
pentest-agents uninstall                 # reverse, restore .pa-backup files

Every install records a manifest (.pentest-agents/manifest.json for project
scope, ~/.config/pentest-agents/manifest.json for global). Uninstall only
removes files we wrote and surgically strips only the MCP/JSON keys we merged —
your other settings are never touched. Conflicting writes back up the original
as <path>.pa-backup and are restored on uninstall.
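The "surgically strips only the keys we merged" behavior can be sketched as follows — a minimal illustration, assuming the manifest records the JSON key paths an install merged in; the function and manifest shape here are hypothetical, not the installer's actual code:

```python
import json

def strip_merged_keys(settings: dict, merged_paths: list[list[str]]) -> dict:
    """Remove only the keys a previous install merged in, e.g.
    ["mcpServers", "bounty-platforms"], leaving user settings intact.
    merged_paths is illustrative; the real manifest format may differ."""
    out = json.loads(json.dumps(settings))  # deep copy via round-trip
    for path in merged_paths:
        node = out
        for key in path[:-1]:
            node = node.get(key)
            if not isinstance(node, dict):
                break
        else:
            node.pop(path[-1], None)  # delete only the leaf we added
    return out

settings = {
    "mcpServers": {
        "bounty-platforms": {"command": "python3"},  # ours
        "my-own-server": {"command": "node"},        # the user's
    },
    "theme": "dark",
}
cleaned = strip_merged_keys(settings, [["mcpServers", "bounty-platforms"]])
```

After the call, `cleaned` still contains `my-own-server` and `theme`; only the merged key is gone.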

Workflow

New program:   /new → /sync → /brain init → /analyze → /surface → /hunt
Returning:     /resume <target> → /hunt or /autopilot
After finding: /validate → /chain → /report → /dupcheck → /submit → /learn
Batch triage:  /triage (7-Question Gate on all findings)

MCP Servers (2)

bounty-platforms (16 platforms)

HackerOne (full API), Bugcrowd, Intigriti, Immunefi (public), YesWeHack + 11 stubs.
7 MCP tools: list_platforms, get_program_scope, get_program_policy, search_hacktivity, sync_program, draft_report, submit_report.

writeup-search (BYO index)

Searchable knowledge base agents query during hunting and validation.
4 MCP tools:

  • search_writeups — semantic search (FAISS) or keyword search for prior art
  • get_writeup — full writeup content by ID
  • search_techniques — exploitation techniques by vuln class
  • search_payloads — curated payloads from rules/payloads.md

The writeup index is not bundled. Bulk-redistributing scraped hacktivity violates most platform ToS, so this repo ships the server only. The search_payloads + search_techniques fallback works out of the box; the semantic/keyword layers activate once you point the server at your own index.

Three search modes (auto-detected, graceful fallback):

| Mode | Requires | Searches |
| --- | --- | --- |
| FAISS (semantic) | faiss-cpu, sentence-transformers, your metadata.db + index.faiss | Your writeup corpus via vector embeddings |
| SQLite (keyword) | Your metadata.db only | Your writeup corpus via LIKE over the text column |
| Local (default) | Nothing — zero deps | rules/payloads.md + skills/ shipped in this repo |

Point the server at your index by dropping metadata.db (+ optionally index.faiss) into ~/.local/share/pentest-writeups/, or set WRITEUP_DB_DIR=/path/to/dir.

Expected schema (metadata.db): a SQLite file with at least one table containing columns id, title, url, and one text column (content / text / body / writeup). Row order in the table must match vector order in index.faiss when using semantic mode.
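A minimal compatible database can be built with sqlite3, shown in-memory here (write metadata.db to disk in practice). The table name writeups is an assumption, since only the columns are specified:

```python
import sqlite3

# Minimal schema matching the description above: id, title, url, plus
# one text column named content/text/body/writeup.
con = sqlite3.connect(":memory:")  # use a metadata.db file in practice
con.execute("""
    CREATE TABLE writeups (
        id      INTEGER PRIMARY KEY,  -- row order must match index.faiss
        title   TEXT NOT NULL,
        url     TEXT NOT NULL,
        content TEXT NOT NULL         -- any of content/text/body/writeup
    )""")
con.execute(
    "INSERT INTO writeups (title, url, content) VALUES (?, ?, ?)",
    ("SSRF via PDF renderer", "https://example.com/w/1",
     "SSRF: server-side request forgery through HTML-to-PDF conversion"))
con.commit()

# Keyword mode then boils down to a LIKE scan over the text column:
rows = con.execute(
    "SELECT id, title FROM writeups WHERE content LIKE ?",
    ("%SSRF%",)).fetchall()
```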

Build your own index — rag-builder/

The repo now ships a local RAG/FAISS builder under rag-builder/ that turns a list of GitHub / GitLab repositories into a metadata.db + index.faiss pair the writeup-search MCP server consumes. Destructive operations (clone, embed, write) are always gated behind --execute — running the CLI without it prints the plan and changes nothing, so you can never wipe an existing index by accident.

cd rag-builder

# 1. Inspect the plan — no network, no writes.
python3 build.py status
python3 build.py ingest                    # dry-run (the default)

# 2. Opt-in pre-flight: probe every URL with `git ls-remote` (network).
python3 build.py ingest --check-remotes    # ~5s for 141 repos at 16 workers

# 3. Actually clone + index every repo from repos.yaml into ./data/.
python3 build.py ingest --execute
python3 build.py ingest --execute --check-remotes   # skip unreachable first

# 4. Point the MCP server at the output.
export WRITEUP_DB_DIR="$PWD/data"
python3 ../mcp-writeup-server/server.py --test

rag-builder/repos.yaml ships with a 146-entry seed covering CTF archives, bug-bounty reports, payload collections, and research aggregators — edit it freely. repos-skipped.yaml is loaded automatically as an exclusion list (override with --skip-list or --no-skip-list). config.yaml controls the embedding model (all-MiniLM-L6-v2 by default), the host allowlist, the clone size cap, and the file-size ceiling. See rag-builder/README.md for the full reference.
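The dry-run-by-default gate can be sketched with argparse — the --execute flag and repo count match the README, everything else (the plan steps, return values) is illustrative:

```python
import argparse

def main(argv=None) -> str:
    """Sketch of rag-builder's safety gate: without --execute the CLI
    only prints its plan and changes nothing."""
    parser = argparse.ArgumentParser(prog="build.py")
    parser.add_argument("command", choices=["status", "ingest"])
    parser.add_argument("--execute", action="store_true",
                        help="actually clone, embed, and write ./data/")
    args = parser.parse_args(argv)
    plan = ["clone 146 repos", "embed chunks",
            "write metadata.db + index.faiss"]
    if args.command == "ingest" and not args.execute:
        for step in plan:
            print(f"[dry-run] would {step}")
        return "dry-run"
    return "execute" if args.command == "ingest" else "status"

mode = main(["ingest"])  # no --execute: prints the plan only
```

Because destructive work only happens on the "execute" path, re-running the bare command can never clobber an existing index.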

CC Hooks (automatic cost tracking)

Configured in settings.json, fires automatically:

  • SubagentStop → cost_hook.py logs agent name + session to cost-tracking.json
  • Stop → logs session end
  • SessionStart → welcome message

Statusline shows live cost from session token data: $0.57
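A minimal hook in the spirit of cost_hook.py might look like this, assuming the hook receives its event as JSON (field names here are illustrative, not the actual payload):

```python
"""Sketch of a SubagentStop cost hook: append one record per agent
completion to cost-tracking.json. Not the real cost_hook.py."""
import json
import time
from pathlib import Path

def log_event(event: dict, log_path: Path) -> dict:
    record = {
        "ts": time.time(),
        "agent": event.get("agent_name", "unknown"),    # assumed field
        "session": event.get("session_id", "unknown"),  # assumed field
    }
    records = json.loads(log_path.read_text()) if log_path.exists() else []
    records.append(record)
    log_path.write_text(json.dumps(records, indent=2))
    return record

# The real hook would take the event from stdin; here we log a sample:
record = log_event({"agent_name": "xss-hunter", "session_id": "abc123"},
                   Path("cost-tracking.json"))
```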

Commands (26)

Hunting & Analysis

| Command | Description |
| --- | --- |
| /hunt <target> [--vuln-class] | Active hunting — searches writeup DB for techniques first, then tests with concrete payloads |
| /autopilot <target> | Autonomous loop with --paranoid/--normal/--yolo checkpoints |
| /surface <target> | P1/P2/Kill ranked attack surface |
| /chain | Build A→B→C exploit chains (12 patterns, 6 high-value templates) |
| /analyze <target> | AI analysis: crown jewels, attack paths, blind spots |
| /mindmap <target> | Attack surface tree with brain status |
| /sast <repo> | Source-code vulnerability hunting (entry → flow → gap → exploit pipeline) |

Validation & Reporting

| Command | Description |
| --- | --- |
| /validate <finding> | 7-Question Gate → PASS/KILL/DOWNGRADE/CHAIN REQUIRED |
| /triage | Batch-validate ALL findings, kill weak ones |
| /quality <draft> | Score report 1-10 (blocks below 7) |
| /report [format] | Reports (hard gate: requires /validate PASS) |
| /dupcheck <desc> | Hacktivity + writeup DB for duplicates |
| /submit <finding> | Submit (hard gate: /validate PASS + /quality ≥ 7) |

Session & Memory

| Command | Description |
| --- | --- |
| /resume <target> | Resume — untested endpoints + suggestions |
| /remember | Log finding/pattern for cross-target learning |
| /learn <id> <status> | Record response — auto-boosts paid techniques |
| /brain | Subcommands: init, brief, status, endpoint, endpoints, record, exhausted |

Infrastructure

| Command | Description |
| --- | --- |
| /new, /sync, /status | Setup + dashboard |
| /pipeline, /quickscan, /fullscan | Scanning pipelines |
| /correlate | Chain discovery across findings |
| /evidence, /cost, /monitor | Evidence, cost, monitoring |

Agents (50)

H1 Weakness Specialists (17)

xss-hunter (#60/#61/#62), sqli-hunter (#67), csrf-hunter (#57), ssrf-hunter (#75), ssti-hunter (#74), idor-hunter (#55), auth-tester (#27), info-disclosure (#18), open-redirect (#38), rce-hunter (#70), xxe-hunter (#63), file-upload (#39), cors-hunter (#58), subdomain-takeover (#145), business-logic (#28), race-condition (#29), privilege-escalation (#26)

Hunting & Analysis (3)

  • validator — 7-Question Gate + never-submit list (PASS/KILL/DOWNGRADE/CHAIN)
  • chain-builder — A→B chain table, searches writeup DB for proven chains
  • recon-ranker — P1/P2/Kill surface ranking

Infrastructure / Recon (10)

recon, vuln-scanner, config-auditor, cloud-recon, js-analyzer, waf-profiler, graphql-audit, nuclei-writer, browser-agent (Burp MCP), browser-stealth-agent (Camoufox)

Meta / Validation (9)

brain, correlator, quality-check, monitor, poc-builder, report-writer, scope-check, browser-verifier (client-side PoC proof), dast-devils-advocate (adversarial downgrade)

SAST Pipeline (8)

sast-file-ranker, sast-entry-mapper, sast-danger-mapper, sast-flow-tracer, sast-gap-analyzer, sast-devils-advocate, sast-hunter, sast-exploit-builder

Specialized (1)

web3-auditor — Solidity grep arsenal, Foundry PoC, DeFi patterns

CLI Tools (15)

| Tool | Purpose |
| --- | --- |
| brain.py | Brain with endpoint tracking + circuit breaker |
| intel_engine.py | Hacktivity patterns + tech→vuln mapping |
| journal.py | JSONL session journal for /resume |
| target_selector.py | Program ROI ranking |
| cost_hook.py | CC hook: auto-logs agent completions via SubagentStop |
| statusline.py | Dashboard (--compact/--watch/--json) |
| scope_check.py | Scope validation with --list |
| dedup_findings.py | Dedup + hacktivity cross-reference |
| global_brain.py | Cross-engagement knowledge (incremental hash-based sync) |
| response_tracker.py | Response learning + auto-boost paid techniques |
| scaffold.py | Workspace scaffolding with update mode |
| capture.py | Screenshots + video (WSL2) |
| cost.py | Token cost tracking + ROI |
| camofox_ctl.sh | Camoufox (stealth Firefox) lifecycle — Cloudflare/Akamai bypass |
| pentest-statusline.sh | CC statusline: findings, brain, context, cost |
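The incremental hash-based sync noted for global_brain.py can be sketched like this (illustrative, not the tool's actual implementation): hash each file, compare against the last run's state, and process only what changed.

```python
import hashlib
import json
from pathlib import Path

def sync_changed(files: list[Path], state_file: Path) -> list[Path]:
    """Return only files whose sha256 changed since the last run,
    updating the stored state as a side effect (sketch)."""
    old = json.loads(state_file.read_text()) if state_file.exists() else {}
    new, changed = {}, []
    for f in files:
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        new[str(f)] = digest
        if old.get(str(f)) != digest:
            changed.append(f)  # new or modified since last sync
    state_file.write_text(json.dumps(new))
    return changed
```

On a second run with no edits the function returns an empty list, which is what makes the sync cheap to run after every session.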

Payload Database (rules/payloads.md — 545 lines)

XSS (basic + WAF bypass + context-specific + impact proof), SSRF (internal targets + IP bypass), SQLi (detection + error-based), IDOR (ID manipulation + method variation + version downgrade), OAuth (redirect_uri bypass), File Upload (extension + content-type + magic bytes), Race Conditions, SSTI (Jinja2, Twig, EJS, Velocity with filter bypass), Deserialization (pickle, PHP, Java ysoserial, Node.js), JWT (alg:none, RS256→HS256 confusion, weak secret), LFI (PHP wrappers, log poisoning→RCE, bypass filters), Prototype Pollution (detection + RCE escalation), NoSQL Injection (auth bypass + data extraction), DeFi (reentrancy, flash loan, oracle manipulation)

Key Features

  • Writeup search MCP: Agents query prior art during hunting — bring your own FAISS/SQLite writeup index, or fall back to the shipped payload/technique library
  • CC hooks: SubagentStop/Stop auto-log costs, statusline shows live $X.XX from token data
  • 7-Question Gate: Every finding validated — first NO = KILL
  • Circuit breaker: 5× consecutive 403/429 → auto-backoff 60s
  • Endpoint tracking: Brain records every endpoint tested per target
  • Hard validation gates: /report and /submit refuse without /validate PASS
  • Never-submit filter: Pipeline auto-kills informational findings
  • Incremental sync: Global brain hash-based, skips unchanged files
  • Feedback loop: /learn auto-boosts paid techniques globally
  • Session journal: JSONL log for /resume continuity
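The circuit-breaker rule above (5× consecutive 403/429 → 60 s backoff) can be sketched as a small state machine; the threshold and backoff mirror the README, the class itself is illustrative:

```python
import time

class CircuitBreaker:
    """Trip after N consecutive 403/429 responses; stay open for
    `backoff` seconds, then allow traffic again (sketch)."""
    def __init__(self, threshold: int = 5, backoff: float = 60.0):
        self.threshold, self.backoff = threshold, backoff
        self.consecutive = 0
        self.open_until = 0.0

    def record(self, status: int) -> None:
        if status in (403, 429):
            self.consecutive += 1
            if self.consecutive >= self.threshold:
                self.open_until = time.monotonic() + self.backoff
                self.consecutive = 0  # reset after tripping
        else:
            self.consecutive = 0      # any success resets the streak

    def allow(self) -> bool:
        return time.monotonic() >= self.open_until

cb = CircuitBreaker()
for _ in range(5):
    cb.record(429)
# cb.allow() is now False for roughly 60 seconds
```

Note that a single non-blocked response resets the streak, so only truly consecutive 403/429s trip the breaker.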

Requirements

  • Python 3.10+, pip install mcp
  • Optional: pip install faiss-cpu sentence-transformers (for writeup semantic search)
  • Security tools: nmap, httpx, subfinder, nuclei, ffuf, katana, sqlmap
  • GraphQL hunter tools: graphql-path-enum — cargo install --git https://gitlab.com/dee-see/graphql-path-enum (auto-installed by setup-mcp.sh if cargo is present)
  • Evidence: grim/scrot, wf-recorder/ffmpeg
  • jq (for statusline)

License

For authorized security testing only. Follow responsible disclosure.
