pentest-ai

agent
SUMMARY

Turn Claude Code into your offensive security research assistant. Specialized AI subagents for authorized penetration testing plan engagements, analyze recon, research exploits, build detections, audit STIGs, and write reports.

README.md
pentest-ai

pentest-ai

Turn Claude Code into your offensive security research assistant.

17 specialized AI subagents for every phase of authorized penetration testing, from scoping to reporting. MITRE ATT&CK mapped. Dual offensive/defensive perspective in every response.

License: MIT
Claude Code
MITRE ATT&CK
Agents
Platform: Linux
Platform: macOS
Platform: Windows WSL
GitHub stars
GitHub forks
Last Commit
GitHub issues

Getting Started | Agents | Examples | Documentation | Landing Page


Table of Contents


What Is This?

pentest-ai is a collection of Claude Code subagents: specialized AI assistants that activate automatically based on what you're working on. Ask Claude to plan a pentest, and the engagement planner agent takes over. Paste Nmap output, and the recon advisor analyzes it. Each agent carries deep domain knowledge in offensive security methodology, MITRE ATT&CK mappings, and industry-standard frameworks.

You don't need to be an expert to use these agents. They communicate at whatever level you need, from explaining what Kerberoasting is to providing exact Impacket command syntax for a senior operator.

How It Works

  1. Install the agent files into your Claude Code agents directory
  2. Open Claude Code and describe your task naturally
  3. Claude automatically routes to the right specialist agent

No configuration, no commands to memorize. Just describe what you need.

Quick Demo: Advisory Mode (Tier 1)

$ claude

You: Analyze this Nmap scan and prioritize targets for our internal pentest

▸ Routing to recon-advisor agent...

## Prioritized Target Summary

### Critical Priority (Exploit Immediately)
| Host       | Port | Service | Finding                  |
|------------|------|---------|--------------------------|
| 10.10.1.5  | 445  | SMB     | SMBv1 enabled, MS17-010  |
| 10.10.1.20 | 3389 | RDP     | BlueKeep (CVE-2019-0708) |

### Recommended Next Steps
1. nmap -sV --script smb-vuln* 10.10.1.5
2. crackmapexec smb 10.10.1.0/24
3. bloodhound-python -d corp.local

Quick Demo: Execution Mode (Tier 2)

$ claude

You: My authorized scope is 10.10.1.0/24. Internal network pentest.
     Run a service scan on the subnet.

▸ Routing to recon-advisor agent...

Scope recorded: 10.10.1.0/24 (internal)

I'll start with host discovery, then do targeted service detection
on live hosts.

[MODERATE] Host discovery scan:
  nmap -sn 10.10.1.0/24 -oN nmap_10.10.1.0_discovery_20260330_140000.txt

▸ Allow Bash command? [y/n]

Found 47 live hosts. Running service detection on the top ports...

[MODERATE] Service scan on live hosts:
  nmap -sT -sV -sC --top-ports 1000 --min-rate 100 --max-rate 1000
    --host-timeout 300s -iL live_hosts.txt
    -oN nmap_10.10.1.0_services_20260330_140215.txt

▸ Allow Bash command? [y/n]

## Prioritized Target Summary
| Priority | Host       | Port | Service        | Finding               |
|----------|------------|------|----------------|-----------------------|
| Critical | 10.10.1.5  | 445  | SMB 1.0        | MS17-010 likely       |
| Critical | 10.10.1.20 | 3389 | RDP 10.0.17763 | BlueKeep candidate    |
| High     | 10.10.1.12 | 8080 | Jenkins 2.289  | Pre-auth RCE range    |

Evidence saved. Want me to run SMB vulnerability scripts on 10.10.1.5 next?

Agents

Offensive Operations

Agent What It Does Example Prompt
Engagement Planner Plans penetration tests with phased methodology, MITRE ATT&CK technique mapping, time estimates, and rules of engagement templates "Plan an internal network pentest for a 500-endpoint Active Directory environment with a 2-week window"
Recon Advisor Parses output from Nmap, Nessus, BloodHound, and 20+ tools. Prioritizes targets, maps CVEs, and recommends specific next commands. Tier 2: can execute recon tools directly with your approval. "Scan 10.10.1.0/24 and tell me what to hit first"
OSINT Collector Open source intelligence gathering: domain recon, email harvesting, social media profiling, breach data analysis, and infrastructure mapping "Build an OSINT profile on this target domain before our external engagement"
Exploit Guide Detailed exploitation methodology covering AD attacks, web apps, cloud, and post-exploitation. Every technique includes the defensive perspective "Walk me through AS-REP Roasting and how defenders detect it"
Privilege Escalation Systematic Linux and Windows privilege escalation methodology. SUID abuse, token impersonation, service exploitation, kernel exploits, and container escape "Here's my linpeas output, what's the fastest path to root?"
Cloud Security AWS, Azure, and GCP penetration testing methodology. IAM privilege escalation, container escape, serverless exploitation, and cloud-native attack paths "I have read-only AWS access with this IAM policy. Find privilege escalation paths"
API Security REST, GraphQL, and WebSocket security testing. OWASP API Top 10, JWT attacks, OAuth exploitation, BOLA/BFLA testing, and API discovery "Test this API for BOLA. Here's the Swagger doc and a valid JWT"
Mobile Pentester Android and iOS application security testing. APK/IPA analysis, Frida hooking, SSL pinning bypass, OWASP MASTG/MASVS methodology "Decompile this APK and check for hardcoded secrets and certificate pinning"
Wireless Pentester WiFi and Bluetooth penetration testing. WPA/WPA2/WPA3 attacks, evil twin, rogue AP, enterprise wireless, and Bluetooth security "Capture a WPA2 handshake and set up an evil twin for this corporate network"
Social Engineer Phishing campaigns, pretexting, vishing, physical social engineering, and security awareness assessments for authorized red team engagements "Design a phishing campaign for this engagement using GoPhish"

Defense & Analysis

Agent What It Does Example Prompt
Detection Engineer Produces deployment-ready detection rules in Sigma, Splunk SPL, Elastic KQL, and Sentinel KQL with false positive tuning guidance "Create a detection rule for DCSync with Sigma and Splunk SPL"
Threat Modeler STRIDE/DREAD threat modeling, attack tree construction, data flow analysis, and architecture-specific threat enumeration "Build a STRIDE threat model for our microservices API gateway"
Forensics Analyst Digital forensics and incident response. Evidence acquisition, memory forensics, disk analysis, timeline construction, and chain of custody "Walk me through a Volatility 3 workflow for this memory dump"
Malware Analyst Binary analysis, reverse engineering, sandbox methodology, YARA rule writing, and IOC extraction "Analyze this suspicious PE file. Start with static analysis then walk me through Ghidra"
STIG Analyst DISA STIG compliance analysis with GPO remediation paths, risk scores, verification commands, and keep-open justification templates "Analyze V-220768, what breaks if I apply it, and write a keep-open justification"

Reporting & Learning

Agent What It Does Example Prompt
Report Generator Transforms raw findings into professional pentest reports with executive summaries, CVSS scoring, evidence formatting, and remediation roadmaps "Compile these 12 findings into a professional report with an executive summary"
CTF Solver Methodical challenge-solving partner for HackTheBox, TryHackMe, and competitive CTFs. Covers web exploitation, binary exploitation, reverse engineering, cryptography, forensics, and OSINT "I'm stuck on this HackTheBox machine. I have a low-priv shell. Help me enumerate for privesc"

Agent Capabilities at a Glance

OFFENSIVE OPERATIONS
engagement-planner ── PTES, OWASP, NIST 800-115, MITRE ATT&CK
                      Rules of engagement templates
                      Phased methodology with time estimates

recon-advisor ─────── Nmap, Nessus, BloodHound, masscan, Shodan + 20 more
                      CVE mapping and attack surface prioritization
                      Specific follow-up commands for each finding

osint-collector ───── Subfinder, Amass, theHarvester, Sherlock, Shodan
                      Domain, email, identity, and organization intelligence
                      Passive vs active classification with OPSEC notes

exploit-guide ─────── Active Directory (Kerberoasting, DCSync, delegation attacks)
                      Web apps (OWASP Top 10, API security, deserialization)
                      Cloud (AWS, Azure, GCP privilege escalation)
                      MANDATORY defensive perspective for every technique

privesc-advisor ───── Linux (SUID, capabilities, cron, kernel exploits)
                      Windows (tokens, services, UAC bypass, DLL hijacking)
                      GTFOBins and LOLBAS reference for every binary

cloud-security ────── AWS (Pacu, ScoutSuite), Azure (ROADtools, AzureHound), GCP
                      IAM privilege escalation and role chaining
                      Container escape and Kubernetes attacks

api-security ──────── OWASP API Top 10 (2023) full coverage
                      JWT, OAuth 2.0, GraphQL, WebSocket testing
                      BOLA/BFLA methodology with HTTP request examples

mobile-pentester ──── Android (jadx, Frida, Drozer, apktool)
                      iOS (class-dump, Objection, Cycript, lldb)
                      OWASP MASTG/MASVS compliance mapping

wireless-pentester ── WPA/WPA2/WPA3, WPS, PMKID, KRACK
                      Evil twin, rogue AP, enterprise 802.1X attacks
                      Bluetooth Classic and BLE security testing

social-engineer ───── GoPhish, King Phisher, Evilginx2
                      Phishing, vishing, SMiShing, physical SE
                      Pretexting frameworks and campaign metrics

DEFENSE & ANALYSIS
detection-engineer ── Sigma, Splunk SPL, Elastic KQL, Sentinel KQL, YARA
                      False positive analysis and tuning guidance
                      Threat hunting hypotheses and queries

threat-modeler ────── STRIDE and DREAD analysis frameworks
                      Attack tree construction and data flow diagrams
                      Architecture-specific threat enumeration

forensics-analyst ─── Volatility 2/3, Autopsy, Sleuth Kit, Plaso
                      Memory, disk, network, and cloud forensics
                      Timeline analysis and chain of custody

malware-analyst ───── IDA Pro, Ghidra, x64dbg, Radare2
                      Static/dynamic analysis and sandbox methodology
                      YARA rule writing and IOC extraction

stig-analyst ──────── Windows, Linux, AD, Network, VMware, Application STIGs
                      GPO remediation with exact registry paths
                      Keep-open justification templates for auditors

REPORTING & LEARNING
report-generator ──── PTES/OWASP/SANS report format
                      Executive summaries for non-technical leadership
                      CVSS v3.1 scoring and CWE mapping
                      Remediation roadmaps with priority timelines

ctf-solver ────────── HackTheBox, TryHackMe, PicoCTF, OverTheWire
                      Web, Pwn, Rev, Crypto, Forensics, OSINT
                      Methodology-first guidance with learning focus

Workflow

Chain agents together for a complete engagement workflow:

graph LR
    A[OSINT] -->|osint-collector| B[Scope & Plan]
    B -->|engagement-planner| C[Reconnaissance]
    C -->|recon-advisor| D[Attack Vectors]
    D -->|exploit-guide| E[Exploitation]
    E -->|privesc-advisor| F[Escalation]
    F -->|detection-engineer| G[Detection Rules]
    F -->|report-generator| H[Final Report]
    G --> H

    style A fill:#1a1a2e,stroke:#e94560,color:#fff
    style B fill:#1a1a2e,stroke:#e94560,color:#fff
    style C fill:#1a1a2e,stroke:#e94560,color:#fff
    style D fill:#1a1a2e,stroke:#e94560,color:#fff
    style E fill:#1a1a2e,stroke:#e94560,color:#fff
    style F fill:#1a1a2e,stroke:#e94560,color:#fff
    style G fill:#1a1a2e,stroke:#e94560,color:#fff
    style H fill:#1a1a2e,stroke:#e94560,color:#fff

Architecture

graph TD
    User[You] --> Claude[Claude Code]
    Claude -->|"Gather OSINT"| OC[OSINT Collector]
    Claude -->|"Plan a pentest"| EP[Engagement Planner]
    Claude -->|"Analyze this scan"| RA[Recon Advisor]
    Claude -->|"How do I exploit X?"| EG[Exploit Guide]
    Claude -->|"Escalate privileges"| PE[Privesc Advisor]
    Claude -->|"Test this cloud env"| CL[Cloud Security]
    Claude -->|"Test this API"| AP[API Security]
    Claude -->|"Test this app"| MP[Mobile Pentester]
    Claude -->|"Test WiFi security"| WP[Wireless Pentester]
    Claude -->|"Plan phishing campaign"| SE[Social Engineer]
    Claude -->|"Model threats"| TM[Threat Modeler]
    Claude -->|"Build a detection rule"| DE[Detection Engineer]
    Claude -->|"Analyze this binary"| MA[Malware Analyst]
    Claude -->|"Forensic analysis"| FA[Forensics Analyst]
    Claude -->|"Check this STIG"| SA[STIG Analyst]
    Claude -->|"Write the report"| RG[Report Generator]
    Claude -->|"Solve this CTF"| CS[CTF Solver]

    OC --> KB[Knowledge Base]
    EP --> KB
    RA --> KB
    EG --> KB
    PE --> KB
    CL --> KB
    AP --> KB
    MP --> KB
    WP --> KB
    SE --> KB
    TM --> KB
    DE --> KB
    MA --> KB
    FA --> KB
    SA --> KB
    RG --> KB
    CS --> KB

    style User fill:#e94560,stroke:#e94560,color:#fff
    style Claude fill:#0f3460,stroke:#e94560,color:#fff
    style KB fill:#533483,stroke:#e94560,color:#fff
    style OC fill:#1a1a2e,stroke:#e94560,color:#fff
    style EP fill:#1a1a2e,stroke:#e94560,color:#fff
    style RA fill:#1a1a2e,stroke:#e94560,color:#fff
    style EG fill:#1a1a2e,stroke:#e94560,color:#fff
    style PE fill:#1a1a2e,stroke:#e94560,color:#fff
    style CL fill:#1a1a2e,stroke:#e94560,color:#fff
    style AP fill:#1a1a2e,stroke:#e94560,color:#fff
    style MP fill:#1a1a2e,stroke:#e94560,color:#fff
    style WP fill:#1a1a2e,stroke:#e94560,color:#fff
    style SE fill:#1a1a2e,stroke:#e94560,color:#fff
    style TM fill:#1a1a2e,stroke:#e94560,color:#fff
    style DE fill:#1a1a2e,stroke:#e94560,color:#fff
    style MA fill:#1a1a2e,stroke:#e94560,color:#fff
    style FA fill:#1a1a2e,stroke:#e94560,color:#fff
    style SA fill:#1a1a2e,stroke:#e94560,color:#fff
    style RG fill:#1a1a2e,stroke:#e94560,color:#fff
    style CS fill:#1a1a2e,stroke:#e94560,color:#fff

pentest-ai vs. Manual Research

Task Without pentest-ai With pentest-ai
Plan an engagement Hours reviewing PTES/NIST docs, building spreadsheets manually Structured plan with MITRE mappings in minutes
Gather OSINT Manually run dozens of tools, cross-reference results by hand Automated methodology with passive/active classification
Analyze Nmap output Manually grep through results, cross-reference CVEs one by one Prioritized attack vectors with specific follow-up commands
Research an AD attack Read 10+ blog posts, piece together methodology from multiple sources Complete methodology with exact commands, OPSEC notes, and detection perspective
Model threats Weeks of STRIDE/DREAD analysis with spreadsheets Structured threat model with attack trees and risk matrices
Write detection rules Translate ATT&CK techniques into Sigma/SPL manually, test for false positives Deployment-ready rules in multiple formats with tuning guidance
Analyze malware Set up isolated lab, manually triage with multiple tools Guided static/dynamic analysis workflow with IOC extraction
STIG compliance Search DISA PDFs, manually map controls, write justifications from scratch Full analysis with GPO paths, verification commands, and keep-open templates
Write the report Days formatting findings, writing executive summaries, calculating CVSS Professional report structure with consistent formatting in minutes

How pentest-ai Is Different

There are other AI security tools out there (HexStrike AI, CAI, and various commercial platforms). Here's where pentest-ai sits:

Methodology first, execution when you want it. Most AI pentesting frameworks wrap 150+ tools behind an AI execution layer and call it a day. pentest-ai starts with methodology: what to do, why it works, how defenders catch it. Select agents (Tier 2) can also execute recon and enumeration commands directly, with your approval on every command. You stay in control and you learn the techniques.

Zero infrastructure. No Python environments, no Docker containers, no API keys beyond your Claude subscription. Copy some markdown files and you're working. Other frameworks require dedicated infrastructure, dependency management, and setup time before you can start.

Built for Claude Code natively. These agents use Claude Code's built-in subagent routing and permission system. There's no middleware, no custom framework, no orchestration layer to maintain. When Claude Code improves, your agents improve with it.

Dual perspective by default. Every offensive technique comes with the defensive side: how it gets detected, what logs it generates, what Sigma rules catch it. This isn't a separate mode or add-on. It's built into every agent response.

Accessible to all skill levels. You don't need to be a senior operator. Ask basic questions and get clear explanations. Ask advanced questions and get exact command syntax with OPSEC considerations. The agents meet you where you are.

pentest-ai Tool-Heavy Frameworks
Setup Copy markdown files, done Python env, Docker, API keys, dependencies
Approach Methodology + optional execution with approval AI executes tools directly, often without context
Learning You learn the techniques as you go Tool output without context
Safety model User approves every command via Claude Code Varies, often autonomous
Dependencies Claude Code only Custom frameworks, orchestration layers, tool installs
Defensive view Built into every response Separate module or not included
Maintenance Update agent files Track framework updates, tool compatibility, API changes

Use Cases

New to Security / SCA Representatives

If you're starting a new role in security assessment or compliance (SCA, IA analyst, security auditor), pentest-ai gives you a head start. Use the STIG analyst to learn how compliance controls work and what breaks when you apply them. Use the engagement planner to understand how professional assessments are structured. Use the threat modeler to learn how to think about risk systematically. These agents explain concepts at whatever depth you need, so you can grow into the role faster than studying documentation alone.

Internal Network Penetration Test

Start with the OSINT collector for pre-engagement reconnaissance. Use the engagement planner to build a phased plan with ATT&CK mappings. Run your scans and feed output to the recon advisor for prioritized attack vectors. Use the exploit guide for AD attack methodology and privilege escalation advisor for local privesc. Generate detection rules with the detection engineer so the client can monitor for the techniques you used. Compile everything with the report generator.

Cloud Security Assessment

Use the cloud security agent to enumerate IAM policies and find privilege escalation paths across AWS, Azure, or GCP. Combine with the API security agent for testing cloud-hosted APIs and serverless functions. Feed findings to the detection engineer for CloudTrail/Activity Log detection rules. Document everything with the report generator including cloud-specific remediation guidance.

Red Team Engagement

Start with OSINT and threat modeling to identify the most realistic attack paths. Use the social engineer to plan phishing campaigns. Deploy the wireless pentester for physical location assessments. Chain through exploit guide, privilege escalation, and cloud security as you move laterally. Use the forensics analyst to understand what artifacts you're leaving behind. The detection engineer builds rules the blue team can use afterward.

Mobile Application Assessment

Use the mobile pentester for Android/iOS app analysis with Frida and Objection. Combine with the API security agent for testing backend APIs. Feed findings to the report generator for OWASP MASVS-aligned reporting.

Incident Response

Deploy the forensics analyst for evidence acquisition and timeline construction. Use the malware analyst for suspicious binary triage. The detection engineer builds rules to catch the identified TTPs going forward. Document the incident with the report generator.

CTF Competition

Load the CTF solver for methodical challenge guidance across all categories. Use the recon advisor for network challenge enumeration, the exploit guide for complex exploitation chains, and the privilege escalation advisor when you have a low-privilege shell and need to escalate.

Compliance Audit

Use the STIG analyst to assess systems against DISA STIG baselines, generate GPO remediation paths, and write keep-open justifications. Feed identified gaps to the detection engineer to build monitoring rules for unmitigated findings. Generate the compliance report with the report generator.

Purple Team Exercise

Run offensive techniques through the exploit guide (which provides the defensive perspective for every technique), then use the detection engineer to build detection rules for each technique used. Validate detection coverage against the MITRE ATT&CK matrix. This workflow validates both the red team's methodology and the blue team's detection capabilities.


Quick Start

# Clone the repository
git clone https://github.com/0xSteph/pentest-ai.git

# Install globally (available in all projects)
cp pentest-ai/agents/*.md ~/.claude/agents/

# Or install for a specific project
mkdir -p .claude/agents/
cp pentest-ai/agents/*.md .claude/agents/

Then open Claude Code and try:

"I need to plan an internal penetration test for a mid-size company
with Active Directory, 3 VLANs, and about 500 endpoints.
The engagement window is 2 weeks."

Claude automatically routes to the engagement planner agent and produces a full phased plan.

New to Claude? See the step-by-step setup guide in INSTALL.md. It walks you through creating an account, installing the CLI, and getting your first agent response.

See INSTALL.md for all installation methods and troubleshooting.


Running Tools in a Container

Community suggestion: run your actual security tools inside a Docker container with Kali Linux instead of installing them on your host. This keeps your workstation clean and avoids endpoint protection flagging your toolset.

# Pull the Kali Linux Docker image
docker pull kalilinux/kali-rolling

# Start an interactive Kali container with network access
docker run -it --name pentest-lab kalilinux/kali-rolling /bin/bash

# Inside the container, install the tools you need
apt update && apt install -y nmap nikto sqlmap metasploit-framework bloodhound

# To reconnect later
docker start -ai pentest-lab

The workflow: Use pentest-ai agents in Claude Code on your host to get methodology guidance, analyze output, and plan your approach. Run the actual tools inside the Kali container. Copy tool output back into Claude Code for analysis.

Host (Claude Code + pentest-ai agents)
  ├── Get methodology from agents
  ├── Paste tool output for analysis
  └── Generate reports and detection rules

Docker Container (Kali + security tools)
  ├── Run Nmap, Nessus, BloodHound, etc.
  ├── Execute authorized testing
  └── Capture output for agent analysis

This separation means your host stays clean, your EDR doesn't flag your dev machine, and your tools live in a disposable environment you can rebuild anytime.


How Agent Routing Works

Claude Code reads the description field in each agent's YAML frontmatter to decide when to delegate. You don't need to specify which agent to use. Just describe your task naturally.

---
name: recon-advisor
description: Delegates to this agent when the user pastes scan output
             (Nmap, Nessus, Nikto, masscan, etc.)... Can execute
             reconnaissance commands directly with user approval.
tools: [Bash, Read, Write, Edit, Grep, Glob]
model: sonnet
---

Claude matches your intent to the agent description and routes automatically. You can also invoke agents explicitly if you prefer direct control.


Tier 1 vs. Tier 2 Agents

Agents operate in one of two tiers:

Tier 1: Advisory (all agents)

Every agent works in advisory mode. You paste tool output, ask methodology questions, or request documentation. The agent analyzes, advises, and generates artifacts. You run the tools yourself.

Tier 2: Execution (select agents)

Some agents can also compose and execute commands directly. When you ask the recon advisor to "scan 10.10.1.0/24," it builds the nmap command, explains what it does, tags the noise level, and runs it after you approve. Then it parses the output and recommends next steps.

How it works:

  1. You declare your authorized scope (IP ranges, domains, URLs)
  2. The agent validates every target against your scope before composing a command
  3. Claude Code shows you the full command and asks for approval
  4. The agent executes, saves evidence, analyzes results, and suggests the next step

Safety model: The agent enforces scope at the prompt level (refuses out-of-scope targets). Claude Code enforces approval at the system level (you see and approve every command). Both layers must pass before anything runs.

Tier 2 agents:

Agent What It Executes Risk Level
Recon Advisor nmap, dig, whois, curl, netcat, traceroute, whatweb, nikto Low (read-only scanning)

More agents will be promoted to Tier 2 as the pattern is validated. See docs/TIER2-EXECUTION.md for the full roadmap and conversion guide.

Tier 1 only (will not get execution):

Agent Why
Exploit Guide Exploitation tools are high-risk. Keep advisory.
Social Engineer Phishing execution should not be automated by AI.
Wireless Pentester Requires hardware interaction (WiFi adapters).
Engagement Planner Produces plans, not commands.
Threat Modeler Produces analysis, not commands.
Report Generator Produces documents, not commands.

Examples

See real agent output in the examples/ directory:

Example Agent What It Shows
Engagement Plan engagement-planner Full phased plan for an internal network pentest with MITRE ATT&CK mappings
Nmap Analysis recon-advisor Scan analysis with prioritized attack vectors and follow-up commands
Detection Rule detection-engineer Kerberoasting detection in Sigma, Splunk SPL, and Elastic KQL
STIG Finding stig-analyst V-220768 analysis with GPO path, verification, and keep-open template
Report Excerpt report-generator SQL injection finding formatted for a professional pentest report

Prerequisites

  • Claude Code installed and configured
  • Claude Pro or Max subscription
  • For authorized security testing: signed rules of engagement and defined scope
  • Recommended certifications: OSCP, GPEN, PenTest+, CEH, CPTS (or equivalent experience)

FAQ

Will Anthropic ban my account for using this?

No. These agents provide methodology guidance, analysis, and documentation. Tier 2 agents can execute reconnaissance commands, but every command goes through Claude Code's built-in permission prompt, which is Anthropic's own safety mechanism. The agents don't bypass it. Anthropic's guidelines allow security research, authorized penetration testing, and defensive security work. If you're using these agents for authorized engagements with proper written authorization, you're fine.

Claude's safety guardrails still apply. If you ask it to write actual malware or attack unauthorized systems, it will refuse, with or without these agents installed. Tier 2 execution is limited to reconnaissance and enumeration tools, not exploitation.

Does my data go to a third party?

When you use Claude Code, your prompts go to Anthropic for processing. pentest-ai agents don't add any extra data transmission. The data flow is identical to using Claude Code without agents installed.

For sensitive engagements:

  • Use Anthropic's API with your own key (API inputs aren't used for training by default)
  • Redact client-specific data before pasting tool output
  • Check your ROE and client NDAs for restrictions on AI data processing
  • See DATA-PRIVACY.md for detailed guidance and a client communication template

How is this different from HexStrike AI or CAI?

See How pentest-ai Is Different above. The short version: those tools wrap 150+ security tools behind an AI execution layer with autonomous operation. pentest-ai leads with methodology and adds optional execution with per-command user approval. You understand what's happening at every step. Different philosophy, different use case. They can complement each other.

I'm new to security. Is this useful for me?

Yes. The agents explain concepts at whatever level you need. Ask "what is Kerberoasting?" and you'll get a clear explanation. Ask "walk me through AS-REP Roasting against this specific AD configuration" and you'll get exact commands with OPSEC notes. If you're starting a role in security assessment or compliance, see the New to Security use case above.

Can I use a local model instead of Claude?

The agents are markdown files with system prompts. They're designed for Claude Code's subagent routing, but the methodology content works with any LLM that supports system prompts. You'd lose the automatic routing feature, but you could paste the agent content as context into any model. For fully local operation, see the local model options in DATA-PRIVACY.md.


Documentation

Document Description
INSTALL.md Step-by-step installation guide with 3 methods and troubleshooting
Agent Guide How each agent works, when to use it, and example prompts
Tier 2 Execution How execution mode works, safety model, and agent conversion guide
Customization Modify agents, change models, add tools, create new agents
Contributing How to submit improvements and agent quality standards
Data Privacy LLM data handling, sensitive engagements, local model options
Disclaimer Legal and ethical use terms

Contributing

Contributions welcome. See docs/CONTRIBUTING.md for guidelines.

Agent submissions must include MITRE ATT&CK mappings and consider both offensive and defensive perspectives.


Legal

This toolkit is for authorized security testing only. Users must have proper written authorization before using these agents in any engagement. See DISCLAIMER.md for full terms.

Tier 1 agents provide methodology guidance and analysis. Tier 2 agents can execute reconnaissance commands with your approval. No agent executes attacks, generates functional exploit code, or bypasses Claude Code's permission system.


License

MIT License


Built by 0xSteph

If this project helps your security work, consider giving it a star.

Yorumlar (0)

Sonuc bulunamadi