pentest-agents
Health Warn
- No license — Repository has no license file
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 16 GitHub stars
Code Fail
- rm -rf — Recursive force deletion command in .claude/settings.json
Permissions Pass
- Permissions — No dangerous permissions requested
This tool is an autonomous bug bounty and penetration testing framework designed for AI coding assistants. It orchestrates dozens of specialist agents to hunt for vulnerabilities and interfaces directly with platforms like HackerOne and Bugcrowd.
Security Assessment
Overall Risk: High. As a penetration testing framework by design, this tool requires access to highly sensitive data and environments. It executes shell commands, makes external network requests to target URLs, and connects to bug bounty platform APIs using provided authentication tokens. A significant automated finding flags a `rm -rf` (recursive force deletion) command within the configuration files, which poses a notable risk to the local host system if triggered improperly. No hardcoded secrets were found, but the setup requires users to expose sensitive API tokens (e.g., HackerOne credentials) via local environment variables.
Quality Assessment
The project appears active and regularly maintained, with recent repository pushes and a large codebase (39,000+ lines across 152 files). However, community trust is low: the repository has only 16 GitHub stars, likely reflecting its niche audience. A major concern for enterprise or open-source use is the complete absence of a license file; by default, all rights to use, modify, and distribute the software are reserved, leaving its legal status ambiguous.
Verdict
Use with caution — thoroughly inspect the configuration to mitigate the local deletion risk, and understand that running autonomous testing agents against targets without explicit permission may violate laws or platform terms of service.
Autonomous bug-bounty framework for Claude Code — 40 specialist agents, exploit-chain builder, writeup search, and live HackerOne/Bugcrowd integration.
Pentest Agent Suite for Claude Code
Autonomous bug-bounty framework for Claude Code and 6 other AI coding tools — 47 agents, 26 commands, 15 CLI tools, 2 MCP servers.
152 files · 39k+ lines · 47 agents · 26 commands · 15 CLI tools · 6 skills · 2 MCP servers (16 bug-bounty platforms + BYO writeup search) · 545 payload lines
A complete bug bounty framework. Battle-tested hunting methodology with concrete payloads, 7-Question Gate validation, autonomous hunt loops, A→B exploit chain building, persistent brain with endpoint tracking, optional semantic writeup search (bring your own index), automatic cost tracking via CC hooks, live platform integration, and a cross-IDE installer that emits the native format for Claude Code, Codex, Gemini, Cursor, Windsurf, and VS Code Copilot.
Quick Start
```
pip install mcp
export HACKERONE_USERNAME=you HACKERONE_TOKEN=your_token
python3 tools/scaffold.py hackerone tesla --type web-app
cd ~/bounties/hackerone-tesla && claude

/model opus             # Opus 4.6 [1M] — subagents inherit via model: "inherit"
/sync hackerone tesla
/brain init && /status
/hunt tesla.com
```
Install (Claude Code + 6 other AI coding tools)
pentest-agents ships a cross-IDE installer that emits each target's native
format — agents, skills, commands, rules, and MCP configuration — so the same
framework works everywhere.
```
# From a clone:
python3 -m tools.installer install --targets all --scope project
```
PyPI distribution is WIP. `uv build` produces a working wheel, but the installed CLI currently resolves source files relative to a repo-clone layout (`.claude/agents`, `.claude/skills`, `skills/`, `rules/`, `rules/payloads.md`, `mcp-*-server/`). Running via `pipx install` / `uvx pentest-agents` will execute but install an empty manifest. Until this is fixed, run the installer from a clone.
| Target | Agents | Commands | Rules | MCP | Scopes |
|---|---|---|---|---|---|
| Claude Code | native `.claude/agents/*.md` | `.claude/skills/<name>/SKILL.md` | `CLAUDE.md` | `.mcp.json` / `~/.claude.json` | global + project |
| OpenAI Codex | native `.codex/agents/*.toml` | `.codex/commands/*.md` | `AGENTS.md` | `[mcp_servers.*]` in `config.toml` | global + project |
| Google Gemini | native `.gemini/agents/*.md` | TOML in `.gemini/commands/` | `GEMINI.md` | `mcpServers` in `settings.json` | global + project |
| Cursor | → Skills | → Skills | `.cursor/rules/*.mdc` + `AGENTS.md` | `.cursor/mcp.json` | global + project |
| Windsurf | → Skills | Workflows | `.windsurf/rules/*.md` (≤12K / file) | `~/.codeium/windsurf/mcp_config.json` | global + project |
| VS Code Copilot | `.github/agents/*.agent.md` | `.github/prompts/*.prompt.md` | `.github/copilot-instructions.md` + `.github/instructions/*` | `.vscode/mcp.json` | project + global-MCP |
| OpenClaw | → Skills | → Skills | `~/.openclaw/workspace/AGENTS.md` or `<proj>/AGENTS.md` | `mcp.servers` in `~/.openclaw/openclaw.json` | global + project (skills/rules only; MCP is user-level) |
Cursor, Windsurf, and OpenClaw have no native subagent concept, so Claude-format
agents are rendered as Skills for those three (the closest analogue). Every
target's rule digest is a single canonical AGENTS.md-compatible file when
supported.
OpenClaw specifics (verified against docs.openclaw.ai, April 2026):
- Skills install into `~/.openclaw/skills/<name>/SKILL.md` (global) or `<project>/.agents/skills/<name>/SKILL.md` (project — AgentSkills convention).
- MCP is always wired into the user-level `~/.openclaw/openclaw.json` under `mcp.servers.*`; project-scope installs emit a warning reminding you to run `--scope global` once if you need the MCP servers.
Management:

```
pentest-agents list                                    # detect which targets are installed
pentest-agents install --targets claude_code,codex --scope global
pentest-agents install --dry-run                       # preview every file + JSON merge
pentest-agents verify                                  # check manifest vs. disk (drift)
pentest-agents uninstall                               # reverse, restore .pa-backup files
```
Every install records a manifest (`.pentest-agents/manifest.json` for project scope, `~/.config/pentest-agents/manifest.json` for global). Uninstall only removes files we wrote and surgically strips only the MCP/JSON keys we merged — your other settings are never touched. Conflicting writes back up the original as `<path>.pa-backup` and are restored on uninstall.
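As an illustration of the surgical-strip behavior, a minimal sketch (not the actual installer code; the dotted key path in the usage note is hypothetical) could look like:

```python
import json
from pathlib import Path

def strip_merged_keys(config_path, merged_keys):
    """Delete only the dotted JSON keys recorded in the manifest,
    leaving every other user setting untouched."""
    cfg = json.loads(Path(config_path).read_text())
    for dotted in merged_keys:
        *parents, leaf = dotted.split(".")
        node = cfg
        for part in parents:
            node = node.get(part, {})   # missing parent -> harmless no-op
        node.pop(leaf, None)            # remove only what we wrote
    Path(config_path).write_text(json.dumps(cfg, indent=2))
```

Calling `strip_merged_keys("mcp.json", ["mcpServers.bounty-platforms"])` would remove that one server entry while leaving any other `mcpServers` and unrelated settings in place.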
Workflow
New program: /new → /sync → /brain init → /analyze → /surface → /hunt
Returning: /resume <target> → /hunt or /autopilot
After finding: /validate → /chain → /report → /dupcheck → /submit → /learn
Batch triage: /triage (7-Question Gate on all findings)
MCP Servers (2)
bounty-platforms (16 platforms)
HackerOne (full API), Bugcrowd, Intigriti, Immunefi (public), YesWeHack + 11 stubs.
7 MCP tools: `list_platforms`, `get_program_scope`, `get_program_policy`, `search_hacktivity`, `sync_program`, `draft_report`, `submit_report`.
writeup-search (BYO index)
Searchable knowledge base agents query during hunting and validation.
4 MCP tools:
- `search_writeups` — semantic search (FAISS) or keyword search for prior art
- `get_writeup` — full writeup content by ID
- `search_techniques` — exploitation techniques by vuln class
- `search_payloads` — curated payloads from `rules/payloads.md`
The writeup index is not bundled. Bulk-redistributing scraped hacktivity violates most platform ToS, so this repo ships the server only. The `search_payloads` + `search_techniques` fallback works out of the box; the semantic/keyword layers activate once you point the server at your own index.
Three search modes (auto-detected, graceful fallback):
| Mode | Requires | Searches |
|---|---|---|
| FAISS (semantic) | `faiss-cpu`, `sentence-transformers`, your `metadata.db` + `index.faiss` | Your writeup corpus via vector embeddings |
| SQLite (keyword) | Your `metadata.db` only | Your writeup corpus via `LIKE` over the text column |
| Local (default) | Nothing — zero deps | `rules/payloads.md` + `skills/` shipped in this repo |
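The auto-detection order could be sketched as follows (a minimal illustration, not the shipped server code; file names follow the table above):

```python
from pathlib import Path

def detect_mode(db_dir):
    """Pick the richest search mode the environment supports,
    falling back gracefully: faiss -> sqlite -> local."""
    db_dir = Path(db_dir)
    has_db = (db_dir / "metadata.db").exists()
    if has_db and (db_dir / "index.faiss").exists():
        try:
            import faiss, sentence_transformers  # noqa: F401
            return "faiss"
        except ImportError:
            pass  # deps missing -> fall through to keyword search
    if has_db:
        return "sqlite"
    return "local"  # payloads.md + skills/ shipped in the repo

print(detect_mode("/nonexistent"))  # no index anywhere → "local"
```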
Point the server at your index by dropping `metadata.db` (+ optionally `index.faiss`) into `~/.local/share/pentest-writeups/`, or set `WRITEUP_DB_DIR=/path/to/dir`.
Expected schema (`metadata.db`): a SQLite file with at least one table containing columns `id`, `title`, `url`, and one text column (`content` / `text` / `body` / `writeup`). Row order in the table must match vector order in `index.faiss` when using semantic mode.
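A minimal compatible `metadata.db` for keyword mode can be built with nothing but the standard library. This is a hedged sketch: the table name `writeups` and the sample row are assumptions; only the column names come from the schema above.

```python
import sqlite3

def build_index(db_path, writeups):
    """Create a minimal keyword-searchable index: columns id, title,
    url, and a text column named "content" (table name is our choice)."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS writeups ("
        "id INTEGER PRIMARY KEY, title TEXT, url TEXT, content TEXT)"
    )
    con.executemany(
        "INSERT INTO writeups (id, title, url, content) VALUES (?, ?, ?, ?)",
        writeups,
    )
    con.commit()
    con.close()

# Hypothetical sample row for illustration only.
build_index("metadata.db", [
    (1, "IDOR in /api/v2/users", "https://example.com/report/1",
     "Changing the numeric user id in the PATCH body updated another account."),
])
```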
Build your own index — rag-builder/
The repo now ships a local RAG/FAISS builder under `rag-builder/` that turns a list of GitHub / GitLab repositories into a `metadata.db` + `index.faiss` pair the writeup-search MCP server consumes. Destructive operations (clone, embed, write) are always gated behind `--execute` — running the CLI without it prints the plan and changes nothing, so you can never wipe an existing index by accident.
```
cd rag-builder

# 1. Inspect the plan — no network, no writes.
python3 build.py status
python3 build.py ingest                       # dry-run (the default)

# 2. Opt-in pre-flight: probe every URL with `git ls-remote` (network).
python3 build.py ingest --check-remotes       # ~5s for 141 repos at 16 workers

# 3. Actually clone + index every repo from repos.yaml into ./data/.
python3 build.py ingest --execute
python3 build.py ingest --execute --check-remotes  # skip unreachable first

# 4. Point the MCP server at the output.
export WRITEUP_DB_DIR="$PWD/data"
python3 ../mcp-writeup-server/server.py --test
```
`rag-builder/repos.yaml` ships with a 146-entry seed covering CTF archives, bug-bounty reports, payload collections, and research aggregators — edit freely. `repos-skipped.yaml` is loaded automatically as an exclusion list (override with `--skip-list` or `--no-skip-list`). `config.yaml` controls the embedding model (`all-MiniLM-L6-v2` by default), host allowlist, clone size cap, and file-size ceiling. See `rag-builder/README.md` for the full reference.
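The dry-run-by-default gate is worth sketching, since it is the safety property the builder relies on. A minimal illustration (not the real `build.py`; flag names match the commands above, everything else is assumed):

```python
import argparse

def main(argv=None):
    """Dry-run by default: destructive work runs only past the
    --execute gate; otherwise we print the plan and exit."""
    p = argparse.ArgumentParser(prog="build.py")
    p.add_argument("command", choices=["status", "ingest"])
    p.add_argument("--execute", action="store_true",
                   help="actually clone and index; default prints the plan")
    args = p.parse_args(argv)
    if args.command == "ingest" and not args.execute:
        print("DRY RUN: would clone repos from repos.yaml into ./data/ "
              "(re-run with --execute)")
        return 0
    # ... cloning, embedding, and writing happen only past this gate ...
    return 0

main(["ingest"])  # prints the plan, touches nothing
```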
CC Hooks (automatic cost tracking)
Configured in `settings.json`; fires automatically:
- SubagentStop → `cost_hook.py` logs agent name + session to `cost-tracking.json`
- Stop → logs session end
- SessionStart → welcome message

Statusline shows live cost from session token data: `$0.57`
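For orientation, a SubagentStop hook is just a script that receives a JSON payload and appends a record. The sketch below is hypothetical — the field names (`agent_name`, `session_id`) and log layout are assumptions, not the shipped `cost_hook.py`:

```python
import json
import time

def log_subagent_stop(payload, log_path="cost-tracking.json"):
    """Append one JSON line per completed subagent run."""
    entry = {
        "ts": time.time(),
        "agent": payload.get("agent_name", "unknown"),   # assumed field
        "session": payload.get("session_id"),            # assumed field
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

A real hook would read the payload from stdin (`json.load(sys.stdin)`) before calling a function like this.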
Commands (26)
Hunting & Analysis
| Command | Description |
|---|---|
| `/hunt <target> [--vuln-class]` | Active hunting — searches writeup DB for techniques first, then tests with concrete payloads |
| `/autopilot <target>` | Autonomous loop with `--paranoid`/`--normal`/`--yolo` checkpoints |
| `/surface <target>` | P1/P2/Kill ranked attack surface |
| `/chain` | Build A→B→C exploit chains (12 patterns, 6 high-value templates) |
| `/analyze <target>` | AI analysis: crown jewels, attack paths, blind spots |
| `/mindmap <target>` | Attack surface tree with brain status |
| `/sast <repo>` | Source-code vulnerability hunting (entry → flow → gap → exploit pipeline) |
Validation & Reporting
| Command | Description |
|---|---|
| `/validate <finding>` | 7-Question Gate → PASS/KILL/DOWNGRADE/CHAIN REQUIRED |
| `/triage` | Batch-validate ALL findings, kill weak ones |
| `/quality <draft>` | Score report 1-10 (blocks below 7) |
| `/report [format]` | Reports (hard gate: requires `/validate` PASS) |
| `/dupcheck <desc>` | Hacktivity + writeup DB for duplicates |
| `/submit <finding>` | Submit (hard gate: `/validate` PASS + `/quality` ≥ 7) |
Session & Memory
| Command | Description |
|---|---|
| `/resume <target>` | Resume — untested endpoints + suggestions |
| `/remember` | Log finding/pattern for cross-target learning |
| `/learn <id> <status>` | Record response — auto-boosts paid techniques |
| `/brain` | init, brief, status, endpoint, endpoints, record, exhausted |
Infrastructure
| Command | Description |
|---|---|
| `/new`, `/sync`, `/status` | Setup + dashboard |
| `/pipeline`, `/quickscan`, `/fullscan` | Scanning pipelines |
| `/correlate` | Chain discovery across findings |
| `/evidence`, `/cost`, `/monitor` | Evidence, cost, monitoring |
Agents (50)
H1 Weakness Specialists (17)
xss-hunter (#60/#61/#62), sqli-hunter (#67), csrf-hunter (#57), ssrf-hunter (#75), ssti-hunter (#74), idor-hunter (#55), auth-tester (#27), info-disclosure (#18), open-redirect (#38), rce-hunter (#70), xxe-hunter (#63), file-upload (#39), cors-hunter (#58), subdomain-takeover (#145), business-logic (#28), race-condition (#29), privilege-escalation (#26)
Hunting & Analysis (3)
- validator — 7-Question Gate + never-submit list (PASS/KILL/DOWNGRADE/CHAIN)
- chain-builder — A→B chain table, searches writeup DB for proven chains
- recon-ranker — P1/P2/Kill surface ranking
Infrastructure / Recon (10)
recon, vuln-scanner, config-auditor, cloud-recon, js-analyzer, waf-profiler, graphql-audit, nuclei-writer, browser-agent (Burp MCP), browser-stealth-agent (Camoufox)
Meta / Validation (9)
brain, correlator, quality-check, monitor, poc-builder, report-writer, scope-check, browser-verifier (client-side PoC proof), dast-devils-advocate (adversarial downgrade)
SAST Pipeline (8)
sast-file-ranker, sast-entry-mapper, sast-danger-mapper, sast-flow-tracer, sast-gap-analyzer, sast-devils-advocate, sast-hunter, sast-exploit-builder
Specialized (1)
web3-auditor — Solidity grep arsenal, Foundry PoC, DeFi patterns
CLI Tools (15)
| Tool | Purpose |
|---|---|
| brain.py | Brain with endpoint tracking + circuit breaker |
| intel_engine.py | Hacktivity patterns + tech→vuln mapping |
| journal.py | JSONL session journal for /resume |
| target_selector.py | Program ROI ranking |
| cost_hook.py | CC hook: auto-logs agent completions via SubagentStop |
| statusline.py | Dashboard (--compact/--watch/--json) |
| scope_check.py | Scope validation with --list |
| dedup_findings.py | Dedup + hacktivity cross-reference |
| global_brain.py | Cross-engagement knowledge (incremental hash-based sync) |
| response_tracker.py | Response learning + auto-boost paid techniques |
| scaffold.py | Workspace scaffolding with update mode |
| capture.py | Screenshots + video (WSL2) |
| cost.py | Token cost tracking + ROI |
| camofox_ctl.sh | Camoufox (stealth Firefox) lifecycle — Cloudflare/Akamai bypass |
| pentest-statusline.sh | CC statusline: findings, brain, context, cost |
Payload Database (rules/payloads.md — 545 lines)
XSS (basic + WAF bypass + context-specific + impact proof), SSRF (internal targets + IP bypass), SQLi (detection + error-based), IDOR (ID manipulation + method variation + version downgrade), OAuth (redirect_uri bypass), File Upload (extension + content-type + magic bytes), Race Conditions, SSTI (Jinja2, Twig, EJS, Velocity with filter bypass), Deserialization (pickle, PHP, Java ysoserial, Node.js), JWT (alg:none, RS256→HS256 confusion, weak secret), LFI (PHP wrappers, log poisoning→RCE, bypass filters), Prototype Pollution (detection + RCE escalation), NoSQL Injection (auth bypass + data extraction), DeFi (reentrancy, flash loan, oracle manipulation)
Key Features
- Writeup search MCP: Agents query prior art during hunting — bring your own FAISS/SQLite writeup index, or fall back to the shipped payload/technique library
- CC hooks: SubagentStop/Stop auto-log costs, statusline shows live
$X.XXfrom token data - 7-Question Gate: Every finding validated — first NO = KILL
- Circuit breaker: 5× consecutive 403/429 → auto-backoff 60s
- Endpoint tracking: Brain records every endpoint tested per target
- Hard validation gates: /report and /submit refuse without /validate PASS
- Never-submit filter: Pipeline auto-kills informational findings
- Incremental sync: Global brain hash-based, skips unchanged files
- Feedback loop: /learn auto-boosts paid techniques globally
- Session journal: JSONL log for /resume continuity
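As an illustration, the circuit-breaker rule above (5× consecutive 403/429 → 60s backoff) can be sketched in a few lines. This is a hedged example of the pattern, not the shipped `brain.py` implementation:

```python
import time

class CircuitBreaker:
    """Trip after `threshold` consecutive rate-limit/forbidden responses,
    then refuse requests for `backoff` seconds."""
    def __init__(self, threshold=5, backoff=60):
        self.threshold, self.backoff = threshold, backoff
        self.consecutive = 0
        self.open_until = 0.0

    def record(self, status):
        if status in (403, 429):
            self.consecutive += 1
            if self.consecutive >= self.threshold:
                self.open_until = time.time() + self.backoff  # trip
        else:
            self.consecutive = 0  # any success resets the streak

    def allow(self):
        return time.time() >= self.open_until

cb = CircuitBreaker()
for _ in range(5):
    cb.record(429)
print(cb.allow())  # False — breaker is open for the next 60s
```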
Requirements
- Python 3.10+, `pip install mcp`
- Optional: `pip install faiss-cpu sentence-transformers` (for writeup semantic search)
- Security tools: nmap, httpx, subfinder, nuclei, ffuf, katana, sqlmap
- GraphQL hunter tools: `graphql-path-enum` — `cargo install --git https://gitlab.com/dee-see/graphql-path-enum` (auto-installed by `setup-mcp.sh` if `cargo` is present)
- Evidence: grim/scrot, wf-recorder/ffmpeg
- jq (for statusline)
License
For authorized security testing only. Follow responsible disclosure.