pentest-agents

Security Audit
Fail
Health Warn
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 16 GitHub stars
Code Fail
  • rm -rf — Recursive force deletion command in .claude/settings.json
Permissions Pass
  • Permissions — No dangerous permissions requested
Purpose
This tool is an autonomous bug bounty and penetration testing framework designed for AI coding assistants. It orchestrates dozens of specialist agents to hunt for vulnerabilities and interfaces directly with platforms like HackerOne and Bugcrowd.

Security Assessment
Overall Risk: High. As a penetration testing framework, this tool by design requires access to highly sensitive data and environments. It executes shell commands, makes external network requests to target URLs, and connects to bug bounty platform APIs using provided authentication tokens. An automated scan flagged a `rm -rf` (recursive force deletion) command in `.claude/settings.json`, which poses a notable risk to the local host system if triggered improperly. No hardcoded secrets were found, but setup requires exposing sensitive API tokens (e.g., HackerOne credentials) directly via local environment variables.

Quality Assessment
The project appears active and regularly maintained, with recent repository pushes and a large codebase (39,000+ lines across 152 files). Community trust is hard to gauge: the tool is niche and has only 16 GitHub stars. A bigger concern for enterprise or open-source use is the complete lack of a license file, which means all rights are reserved by default, leaving usage, modification, and distribution rights legally unclear.

Verdict
Use with caution — thoroughly inspect the configuration to mitigate the local deletion risk, and understand that running autonomous testing agents against targets without explicit permission may violate laws or platform terms of service.
SUMMARY

Autonomous bug-bounty framework for Claude Code — 47 specialist agents, exploit-chain builder, writeup search, and live HackerOne/Bugcrowd integration.

README.md

Pentest Agent Suite for Claude Code

Autonomous bug-bounty framework for Claude Code and 6 other AI coding tools — 47 agents, 26 commands, 15 CLI tools, 2 MCP servers.



152 files · 39k+ lines · 47 agents · 26 commands · 15 CLI tools · 6 skills · 2 MCP servers (16 bug-bounty platforms + BYO writeup search) · 545 payload lines

A complete bug bounty framework. Battle-tested hunting methodology with concrete payloads, 7-Question Gate validation, autonomous hunt loops, A→B exploit chain building, persistent brain with endpoint tracking, optional semantic writeup search (bring your own index), automatic cost tracking via CC hooks, live platform integration, and a cross-IDE installer that emits the native format for Claude Code, Codex, Gemini, Cursor, Windsurf, and VS Code Copilot.

Quick Start

pip install mcp
export HACKERONE_USERNAME=you HACKERONE_TOKEN=your_token
python3 tools/scaffold.py hackerone tesla --type web-app
cd ~/bounties/hackerone-tesla && claude
/model opus             # Opus 4.6 [1M] — subagents inherit via model: "inherit"
/sync hackerone tesla
/brain init && /status
/hunt tesla.com

Install (Claude Code + 6 other AI coding tools)

pentest-agents ships a cross-IDE installer that emits each target's native
format — agents, skills, commands, rules, and MCP configuration — so the same
framework works everywhere.

# From a clone:
python3 -m tools.installer install --targets all --scope project

PyPI distribution is WIP. uv build produces a working wheel, but the
installed CLI currently resolves source files relative to a repo clone layout
(.claude/agents, .claude/skills, skills/, rules/, rules/payloads.md,
mcp-*-server/). Running via pipx install / uvx pentest-agents will
execute but install an empty manifest. Until this is fixed, run the installer
from a clone.

| Target | Agents | Commands | Rules | MCP | Scopes |
| --- | --- | --- | --- | --- | --- |
| Claude Code | native .claude/agents/*.md | .claude/skills/<name>/SKILL.md | CLAUDE.md | .mcp.json / ~/.claude.json | global + project |
| OpenAI Codex | native .codex/agents/*.toml | .codex/commands/*.md | AGENTS.md | [mcp_servers.*] in config.toml | global + project |
| Google Gemini | native .gemini/agents/*.md | TOML in .gemini/commands/ | GEMINI.md | mcpServers in settings.json | global + project |
| Cursor | → Skills | → Skills | .cursor/rules/*.mdc + AGENTS.md | .cursor/mcp.json | global + project |
| Windsurf | → Skills | Workflows | .windsurf/rules/*.md (≤12K / file) | ~/.codeium/windsurf/mcp_config.json | global + project |
| VS Code Copilot | .github/agents/*.agent.md | .github/prompts/*.prompt.md | .github/copilot-instructions.md + .github/instructions/* | .vscode/mcp.json | project + global-MCP |
| OpenClaw | → Skills | → Skills | ~/.openclaw/workspace/AGENTS.md or <proj>/AGENTS.md | mcp.servers in ~/.openclaw/openclaw.json | global + project (skills/rules only; MCP is user-level) |

Cursor, Windsurf, and OpenClaw have no native subagent concept, so Claude-format
agents are rendered as Skills for those three (the closest analogue). Every
target's rule digest is a single canonical AGENTS.md-compatible file when
supported.

OpenClaw specifics (verified against docs.openclaw.ai, April 2026):
skills install into ~/.openclaw/skills/<name>/SKILL.md (global) or
<project>/.agents/skills/<name>/SKILL.md (project — AgentSkills convention).
MCP is always wired into the user-level ~/.openclaw/openclaw.json under
mcp.servers.*; project-scope installs emit a warning reminding you to run
--scope global once if you need the MCP servers.

Management:

pentest-agents list                      # detect which targets are installed
pentest-agents install --targets claude_code,codex --scope global
pentest-agents install --dry-run         # preview every file + JSON merge
pentest-agents verify                    # check manifest vs. disk (drift)
pentest-agents uninstall                 # reverse, restore .pa-backup files

Every install records a manifest (.pentest-agents/manifest.json for project
scope, ~/.config/pentest-agents/manifest.json for global). Uninstall only
removes files we wrote and surgically strips only the MCP/JSON keys we merged —
your other settings are never touched. Conflicting writes back up the original
as <path>.pa-backup and are restored on uninstall.
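The "surgically strips only the keys we merged" behavior can be sketched as follows — a minimal illustration, assuming the manifest records the JSON key paths an install merged in; the function and manifest shape here are hypothetical, not the installer's actual code:

```python
import json

def strip_merged_keys(settings: dict, merged_paths: list[list[str]]) -> dict:
    """Remove only the keys a previous install merged in, e.g.
    ["mcpServers", "bounty-platforms"], leaving user settings intact.
    merged_paths is illustrative; the real manifest format may differ."""
    out = json.loads(json.dumps(settings))  # deep copy via round-trip
    for path in merged_paths:
        node = out
        for key in path[:-1]:
            node = node.get(key)
            if not isinstance(node, dict):
                break
        else:
            node.pop(path[-1], None)  # delete only the leaf we added
    return out

settings = {
    "mcpServers": {
        "bounty-platforms": {"command": "python3"},  # ours
        "my-own-server": {"command": "node"},        # the user's
    },
    "theme": "dark",
}
cleaned = strip_merged_keys(settings, [["mcpServers", "bounty-platforms"]])
```

After the call, `cleaned` still contains `my-own-server` and `theme`; only the merged key is gone.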

Workflow

New program:   /new → /sync → /brain init → /analyze → /surface → /hunt
Returning:     /resume <target> → /hunt or /autopilot
After finding: /validate → /chain → /report → /dupcheck → /submit → /learn
Batch triage:  /triage (7-Question Gate on all findings)

MCP Servers (2)

bounty-platforms (16 platforms)

HackerOne (full API), Bugcrowd, Intigriti, Immunefi (public), YesWeHack + 11 stubs.
7 MCP tools: list_platforms, get_program_scope, get_program_policy, search_hacktivity, sync_program, draft_report, submit_report.

writeup-search (BYO index)

Searchable knowledge base agents query during hunting and validation.
4 MCP tools:

  • search_writeups — semantic search (FAISS) or keyword search for prior art
  • get_writeup — full writeup content by ID
  • search_techniques — exploitation techniques by vuln class
  • search_payloads — curated payloads from rules/payloads.md

The writeup index is not bundled. Bulk-redistributing scraped hacktivity violates most platform ToS, so this repo ships the server only. The search_payloads + search_techniques fallback works out of the box; the semantic/keyword layers activate once you point the server at your own index.

Three search modes (auto-detected, graceful fallback):

| Mode | Requires | Searches |
| --- | --- | --- |
| FAISS (semantic) | faiss-cpu, sentence-transformers, your metadata.db + index.faiss | Your writeup corpus via vector embeddings |
| SQLite (keyword) | Your metadata.db only | Your writeup corpus via LIKE over the text column |
| Local (default) | Nothing — zero deps | rules/payloads.md + skills/ shipped in this repo |

Point the server at your index by dropping metadata.db (+ optionally index.faiss) into ~/.local/share/pentest-writeups/, or set WRITEUP_DB_DIR=/path/to/dir.

Expected schema (metadata.db): a SQLite file with at least one table containing columns id, title, url, and one text column (content / text / body / writeup). Row order in the table must match vector order in index.faiss when using semantic mode.
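A minimal compatible database can be built with sqlite3, shown in-memory here (write metadata.db to disk in practice). The table name writeups is an assumption, since only the columns are specified:

```python
import sqlite3

# Minimal schema matching the description above: id, title, url, plus
# one text column named content/text/body/writeup.
con = sqlite3.connect(":memory:")  # use a metadata.db file in practice
con.execute("""
    CREATE TABLE writeups (
        id      INTEGER PRIMARY KEY,  -- row order must match index.faiss
        title   TEXT NOT NULL,
        url     TEXT NOT NULL,
        content TEXT NOT NULL         -- any of content/text/body/writeup
    )""")
con.execute(
    "INSERT INTO writeups (title, url, content) VALUES (?, ?, ?)",
    ("SSRF via PDF renderer", "https://example.com/w/1",
     "SSRF: server-side request forgery through HTML-to-PDF conversion"))
con.commit()

# Keyword mode then boils down to a LIKE scan over the text column:
rows = con.execute(
    "SELECT id, title FROM writeups WHERE content LIKE ?",
    ("%SSRF%",)).fetchall()
```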

Build your own index — rag-builder/

The repo now ships a local RAG/FAISS builder under rag-builder/ that turns a list of GitHub / GitLab repositories into a metadata.db + index.faiss pair the writeup-search MCP server consumes. Destructive operations (clone, embed, write) are always gated behind --execute — running the CLI without it prints the plan and changes nothing, so you can never wipe an existing index by accident.

cd rag-builder

# 1. Inspect the plan — no network, no writes.
python3 build.py status
python3 build.py ingest                    # dry-run (the default)

# 2. Opt-in pre-flight: probe every URL with `git ls-remote` (network).
python3 build.py ingest --check-remotes    # ~5s for 141 repos at 16 workers

# 3. Actually clone + index every repo from repos.yaml into ./data/.
python3 build.py ingest --execute
python3 build.py ingest --execute --check-remotes   # skip unreachable first

# 4. Point the MCP server at the output.
export WRITEUP_DB_DIR="$PWD/data"
python3 ../mcp-writeup-server/server.py --test

rag-builder/repos.yaml ships with a 146-entry seed covering CTF archives, bug-bounty reports, payload collections, and research aggregators — edit it freely. repos-skipped.yaml is loaded automatically as an exclusion list (override with --skip-list or --no-skip-list). config.yaml controls the embedding model (all-MiniLM-L6-v2 by default), the host allowlist, the clone size cap, and the file-size ceiling. See rag-builder/README.md for the full reference.
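The dry-run-by-default gate can be sketched with argparse — the --execute flag and repo count match the README, everything else (the plan steps, return values) is illustrative:

```python
import argparse

def main(argv=None) -> str:
    """Sketch of rag-builder's safety gate: without --execute the CLI
    only prints its plan and changes nothing."""
    parser = argparse.ArgumentParser(prog="build.py")
    parser.add_argument("command", choices=["status", "ingest"])
    parser.add_argument("--execute", action="store_true",
                        help="actually clone, embed, and write ./data/")
    args = parser.parse_args(argv)
    plan = ["clone 146 repos", "embed chunks",
            "write metadata.db + index.faiss"]
    if args.command == "ingest" and not args.execute:
        for step in plan:
            print(f"[dry-run] would {step}")
        return "dry-run"
    return "execute" if args.command == "ingest" else "status"

mode = main(["ingest"])  # no --execute: prints the plan only
```

Because destructive work only happens on the "execute" path, re-running the bare command can never clobber an existing index.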

CC Hooks (automatic cost tracking)

Configured in settings.json, fires automatically:

  • SubagentStop → cost_hook.py logs agent name + session to cost-tracking.json
  • Stop → logs session end
  • SessionStart → welcome message

Statusline shows live cost from session token data: $0.57
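A minimal hook in the spirit of cost_hook.py might look like this, assuming the hook receives its event as JSON (field names here are illustrative, not the actual payload):

```python
"""Sketch of a SubagentStop cost hook: append one record per agent
completion to cost-tracking.json. Not the real cost_hook.py."""
import json
import time
from pathlib import Path

def log_event(event: dict, log_path: Path) -> dict:
    record = {
        "ts": time.time(),
        "agent": event.get("agent_name", "unknown"),    # assumed field
        "session": event.get("session_id", "unknown"),  # assumed field
    }
    records = json.loads(log_path.read_text()) if log_path.exists() else []
    records.append(record)
    log_path.write_text(json.dumps(records, indent=2))
    return record

# The real hook would take the event from stdin; here we log a sample:
record = log_event({"agent_name": "xss-hunter", "session_id": "abc123"},
                   Path("cost-tracking.json"))
```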

Commands (26)

Hunting & Analysis

| Command | Description |
| --- | --- |
| /hunt <target> [--vuln-class] | Active hunting — searches writeup DB for techniques first, then tests with concrete payloads |
| /autopilot <target> | Autonomous loop with --paranoid/--normal/--yolo checkpoints |
| /surface <target> | P1/P2/Kill ranked attack surface |
| /chain | Build A→B→C exploit chains (12 patterns, 6 high-value templates) |
| /analyze <target> | AI analysis: crown jewels, attack paths, blind spots |
| /mindmap <target> | Attack surface tree with brain status |
| /sast <repo> | Source-code vulnerability hunting (entry → flow → gap → exploit pipeline) |

Validation & Reporting

| Command | Description |
| --- | --- |
| /validate <finding> | 7-Question Gate → PASS/KILL/DOWNGRADE/CHAIN REQUIRED |
| /triage | Batch-validate ALL findings, kill weak ones |
| /quality <draft> | Score report 1-10 (blocks below 7) |
| /report [format] | Reports (hard gate: requires /validate PASS) |
| /dupcheck <desc> | Hacktivity + writeup DB for duplicates |
| /submit <finding> | Submit (hard gate: /validate PASS + /quality ≥ 7) |

Session & Memory

| Command | Description |
| --- | --- |
| /resume <target> | Resume — untested endpoints + suggestions |
| /remember | Log finding/pattern for cross-target learning |
| /learn <id> <status> | Record response — auto-boosts paid techniques |
| /brain | Subcommands: init, brief, status, endpoint, endpoints, record, exhausted |

Infrastructure

| Command | Description |
| --- | --- |
| /new, /sync, /status | Setup + dashboard |
| /pipeline, /quickscan, /fullscan | Scanning pipelines |
| /correlate | Chain discovery across findings |
| /evidence, /cost, /monitor | Evidence, cost, monitoring |

Agents (50)

H1 Weakness Specialists (17)

xss-hunter (#60/#61/#62), sqli-hunter (#67), csrf-hunter (#57), ssrf-hunter (#75), ssti-hunter (#74), idor-hunter (#55), auth-tester (#27), info-disclosure (#18), open-redirect (#38), rce-hunter (#70), xxe-hunter (#63), file-upload (#39), cors-hunter (#58), subdomain-takeover (#145), business-logic (#28), race-condition (#29), privilege-escalation (#26)

Hunting & Analysis (3)

  • validator — 7-Question Gate + never-submit list (PASS/KILL/DOWNGRADE/CHAIN)
  • chain-builder — A→B chain table, searches writeup DB for proven chains
  • recon-ranker — P1/P2/Kill surface ranking

Infrastructure / Recon (10)

recon, vuln-scanner, config-auditor, cloud-recon, js-analyzer, waf-profiler, graphql-audit, nuclei-writer, browser-agent (Burp MCP), browser-stealth-agent (Camoufox)

Meta / Validation (9)

brain, correlator, quality-check, monitor, poc-builder, report-writer, scope-check, browser-verifier (client-side PoC proof), dast-devils-advocate (adversarial downgrade)

SAST Pipeline (8)

sast-file-ranker, sast-entry-mapper, sast-danger-mapper, sast-flow-tracer, sast-gap-analyzer, sast-devils-advocate, sast-hunter, sast-exploit-builder

Specialized (1)

web3-auditor — Solidity grep arsenal, Foundry PoC, DeFi patterns

CLI Tools (15)

| Tool | Purpose |
| --- | --- |
| brain.py | Brain with endpoint tracking + circuit breaker |
| intel_engine.py | Hacktivity patterns + tech→vuln mapping |
| journal.py | JSONL session journal for /resume |
| target_selector.py | Program ROI ranking |
| cost_hook.py | CC hook: auto-logs agent completions via SubagentStop |
| statusline.py | Dashboard (--compact/--watch/--json) |
| scope_check.py | Scope validation with --list |
| dedup_findings.py | Dedup + hacktivity cross-reference |
| global_brain.py | Cross-engagement knowledge (incremental hash-based sync) |
| response_tracker.py | Response learning + auto-boost paid techniques |
| scaffold.py | Workspace scaffolding with update mode |
| capture.py | Screenshots + video (WSL2) |
| cost.py | Token cost tracking + ROI |
| camofox_ctl.sh | Camoufox (stealth Firefox) lifecycle — Cloudflare/Akamai bypass |
| pentest-statusline.sh | CC statusline: findings, brain, context, cost |
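The incremental hash-based sync noted for global_brain.py can be sketched like this (illustrative, not the tool's actual implementation): hash each file, compare against the last run's state, and process only what changed.

```python
import hashlib
import json
from pathlib import Path

def sync_changed(files: list[Path], state_file: Path) -> list[Path]:
    """Return only files whose sha256 changed since the last run,
    updating the stored state as a side effect (sketch)."""
    old = json.loads(state_file.read_text()) if state_file.exists() else {}
    new, changed = {}, []
    for f in files:
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        new[str(f)] = digest
        if old.get(str(f)) != digest:
            changed.append(f)  # new or modified since last sync
    state_file.write_text(json.dumps(new))
    return changed
```

On a second run with no edits the function returns an empty list, which is what makes the sync cheap to run after every session.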

Payload Database (rules/payloads.md — 545 lines)

XSS (basic + WAF bypass + context-specific + impact proof), SSRF (internal targets + IP bypass), SQLi (detection + error-based), IDOR (ID manipulation + method variation + version downgrade), OAuth (redirect_uri bypass), File Upload (extension + content-type + magic bytes), Race Conditions, SSTI (Jinja2, Twig, EJS, Velocity with filter bypass), Deserialization (pickle, PHP, Java ysoserial, Node.js), JWT (alg:none, RS256→HS256 confusion, weak secret), LFI (PHP wrappers, log poisoning→RCE, bypass filters), Prototype Pollution (detection + RCE escalation), NoSQL Injection (auth bypass + data extraction), DeFi (reentrancy, flash loan, oracle manipulation)

Key Features

  • Writeup search MCP: Agents query prior art during hunting — bring your own FAISS/SQLite writeup index, or fall back to the shipped payload/technique library
  • CC hooks: SubagentStop/Stop auto-log costs, statusline shows live $X.XX from token data
  • 7-Question Gate: Every finding validated — first NO = KILL
  • Circuit breaker: 5× consecutive 403/429 → auto-backoff 60s
  • Endpoint tracking: Brain records every endpoint tested per target
  • Hard validation gates: /report and /submit refuse without /validate PASS
  • Never-submit filter: Pipeline auto-kills informational findings
  • Incremental sync: Global brain hash-based, skips unchanged files
  • Feedback loop: /learn auto-boosts paid techniques globally
  • Session journal: JSONL log for /resume continuity
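The circuit-breaker rule above (5× consecutive 403/429 → 60 s backoff) can be sketched as a small state machine; the threshold and backoff mirror the README, the class itself is illustrative:

```python
import time

class CircuitBreaker:
    """Trip after N consecutive 403/429 responses; stay open for
    `backoff` seconds, then allow traffic again (sketch)."""
    def __init__(self, threshold: int = 5, backoff: float = 60.0):
        self.threshold, self.backoff = threshold, backoff
        self.consecutive = 0
        self.open_until = 0.0

    def record(self, status: int) -> None:
        if status in (403, 429):
            self.consecutive += 1
            if self.consecutive >= self.threshold:
                self.open_until = time.monotonic() + self.backoff
                self.consecutive = 0  # reset after tripping
        else:
            self.consecutive = 0      # any success resets the streak

    def allow(self) -> bool:
        return time.monotonic() >= self.open_until

cb = CircuitBreaker()
for _ in range(5):
    cb.record(429)
# cb.allow() is now False for roughly 60 seconds
```

Note that a single non-blocked response resets the streak, so only truly consecutive 403/429s trip the breaker.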

Requirements

  • Python 3.10+, pip install mcp
  • Optional: pip install faiss-cpu sentence-transformers (for writeup semantic search)
  • Security tools: nmap, httpx, subfinder, nuclei, ffuf, katana, sqlmap
  • GraphQL hunter tools: graphql-path-enum — cargo install --git https://gitlab.com/dee-see/graphql-path-enum (auto-installed by setup-mcp.sh if cargo is present)
  • Evidence: grim/scrot, wf-recorder/ffmpeg
  • jq (for statusline)

License

For authorized security testing only. Follow responsible disclosure.
