AI-Pentest-Playbook

mcp
Security Audit: Fail
Health: Warn
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 17 GitHub stars
Code: Fail
  • network request — Outbound network request in payloads/json/agent_tool_abuse.json
  • eval() — Dynamic code execution via eval() in payloads/json/code_interpreter_rce.json
  • eval() — Dynamic code execution via eval() in payloads/json/encoding_bypass.json
  • rm -rf — Recursive force deletion command in payloads/json/insecure_output_handling.json
  • network request — Outbound network request in payloads/json/insecure_output_handling.json
Permissions: Pass
  • Permissions — No dangerous permissions requested
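
All five Code findings point at JSON files under payloads/json/. A minimal Python sketch of that kind of pattern scan follows. The rule labels echo the findings above, but the regular expressions and the sample line are assumptions for illustration, not the listing site's actual rule set. It suggests why attack strings stored as data may trip such checks whether or not anything executes them.

```python
import re

# Hypothetical rule set: the labels mirror the audit findings above, but the
# regular expressions are assumptions, not the listing site's actual rules.
RULES = {
    "eval()": re.compile(r"\beval\s*\("),
    "rm -rf": re.compile(r"\brm\s+-rf\b"),
    "network request": re.compile(r"https?://"),
}


def scan_text(text: str) -> list[str]:
    """Return the label of every rule whose pattern occurs in the text."""
    return [label for label, pattern in RULES.items() if pattern.search(text)]


# An invented, inert sample line standing in for a payload entry stored as data.
sample = '{"payload": "ask the tool to call eval(user_input)"}'
print(scan_text(sample))  # ['eval()']
```

A scanner like this matches text only, so a finding in a data file is a prompt for manual review rather than proof of executable behaviour.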

SUMMARY

🛡 The reference playbook for pentesting AI chatbots & LLM-powered apps in one place. Ready-to-use payloads covering the full OWASP LLM Top 10 plus frontier vectors (MCP · RAG · A2A · computer-use · voice).

README.md

🧨 AI Pentest Playbook

The field manual for pentesting AI chatbots & LLM-powered apps


🚀 Jump straight to the Payload Library →


🎯 Working on an AI/LLM Security Engagement?

Use this handbook as your practical field guide throughout the entire assessment lifecycle—from initial reconnaissance against chatbots and AI-powered applications to advanced testing of prompt injection, jailbreak techniques, tool abuse, agent manipulation, and data exfiltration scenarios involving MCP, RAG, and computer-use systems.

Each chapter combines battle-tested attack payloads with clear guidance on expected success indicators, severity considerations, detection opportunities, and remediation recommendations. This enables both offensive and defensive teams to understand not only how an attack works, but also how to identify, mitigate, and prevent it.
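
One way to picture the per-technique guidance described above is as a small record type. The sketch below is an assumption made for illustration: the class name, the field names, and the `missing_sections` helper are hypothetical and do not reflect the handbook's actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class TechniqueEntry:
    """Hypothetical record for one technique; every field name is an assumption."""

    name: str
    owasp_id: str
    payloads: list[str] = field(default_factory=list)
    success_indicators: list[str] = field(default_factory=list)
    severity: str = "unrated"
    detection: list[str] = field(default_factory=list)
    remediation: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """List the guidance sections that have not been filled in yet."""
        missing = [
            section
            for section in ("payloads", "success_indicators", "detection", "remediation")
            if not getattr(self, section)
        ]
        if self.severity == "unrated":
            missing.append("severity")
        return missing


# Placeholder values only; the severity rating here is illustrative.
entry = TechniqueEntry(
    name="Prompt Injection",
    owasp_id="LLM01",
    payloads=["<payload text goes here>"],
    severity="high",
)
print(entry.missing_sections())  # ['success_indicators', 'detection', 'remediation']
```

Keeping the offensive and defensive fields in one record makes it easy to spot an entry that documents an attack without its detection or remediation notes.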

The handbook covers the complete OWASP Top 10 for Large Language Model Applications (the chapter IDs LLM01 through LLM10 follow the 2023 edition's numbering), and it also explores emerging attack surfaces that extend beyond current industry frameworks.

Content is curated from a combination of public research, vendor disclosures, CVEs, academic papers, real-world incidents, and active red-team engagements, providing a comprehensive reference for modern AI security testing.

⚠️ For authorized testing only.


👉 Start Here — Open the Payload Library Master Index → PAYLOADS.md

Every payload on one page, grouped by attack class — copy-paste ready, full sets one click away. No digging through folders; it's all reachable from the master index.

Attack class | Attack class
Prompt Injection | Insecure Output Handling
Jailbreaks | Code-Interpreter RCE
System-Prompt Extraction | Training-Data & Memory Extraction
Encoding / Obfuscation Bypass | Model Denial-of-Service
Indirect & Multi-Modal Injection | Agent / Tool Abuse

For technique-level coverage with detection and mitigation, browse the chapters below.


Chapters

Foundations

# Chapter
01 Recon & Fingerprinting — model ID, system-prompt detection, tool & architecture inference

OWASP LLM Top 10

# Chapter OWASP
02 Prompt Injection LLM01
03 Insecure Output Handling LLM02
04 Training Data Poisoning LLM03
05 Model Denial of Service LLM04
06 Supply Chain Vulnerabilities LLM05
07 Sensitive Information Disclosure LLM06
08 Insecure Plugin / Tool Design LLM07
09 Excessive Agency LLM08
10 Overreliance / Hallucination Exploitation LLM09
11 Model Theft / Extraction LLM10

Beyond OWASP

# Chapter
12 Jailbreaking Techniques — DAN, STAN, AIM, crescendo, many-shot, token smuggling
13 MCP / Agentic Attack Surface — tool poisoning, MCPoison, rug-pull MCP
14 RAG & Vector Store Attacks — retrieval poisoning, embedding inversion
15 Indirect Prompt Injection — web, document, email injection
16 Tools & Automation — Garak, PyRIT, Burp, MCP inspector

Frontier Vectors

# Chapter
17 A2A Protocol Attacks — agent-to-agent spoofing, capability inflation
18 Computer-Use Agent Attacks — browser-use, Operator, screen-injection
19 Sycophancy Exploitation — confidence flips, reward-model gaming
20 Memory Poisoning — persistent context attacks, cross-session bleed
21 Function-Calling Abuse — schema injection, parallel-tool races
22 Voice / Audio Assistant Attacks — cloning, replay, ultrasonic injection

Payload Library — by goal

Full one-page index with inline top picks: PAYLOADS.md

Goal | Chapter | Payload set
Extract system instructions | 07 | system_prompt_extraction.md
Inject / override instructions | 02 | prompt_injection.md
Bypass safety policy | 12 | jailbreaks.md
Encoding / obfuscation bypass | 12 | encoding_bypass.md
Attack via planted content (web / docs / RAG / media) | 15 | indirect_injection.md
Exploit downstream renderer (XSS / SSTI / SQLi / RCE) | 03 | insecure_output_handling.md
Escalate a code / Python tool to RCE | 08 | code_interpreter_rce.md
Steal training data / memory / PII | 07 | data_extraction.md
Exhaust resources / drain budget | 05 | model_dos.md
Abuse agent actions / SSRF / tools | 09 | agent_tool_abuse.md
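
The table above can also be expressed as a lookup keyed by goal. This Python sketch transcribes the rows as given; the `find_payload_sets` helper and its substring matching are hypothetical conveniences, not part of the repository's scripts.

```python
# Goal -> (chapter number, payload set), transcribed from the table above.
PAYLOAD_SETS = {
    "Extract system instructions": ("07", "system_prompt_extraction.md"),
    "Inject / override instructions": ("02", "prompt_injection.md"),
    "Bypass safety policy": ("12", "jailbreaks.md"),
    "Encoding / obfuscation bypass": ("12", "encoding_bypass.md"),
    "Attack via planted content (web / docs / RAG / media)": ("15", "indirect_injection.md"),
    "Exploit downstream renderer (XSS / SSTI / SQLi / RCE)": ("03", "insecure_output_handling.md"),
    "Escalate a code / Python tool to RCE": ("08", "code_interpreter_rce.md"),
    "Steal training data / memory / PII": ("07", "data_extraction.md"),
    "Exhaust resources / drain budget": ("05", "model_dos.md"),
    "Abuse agent actions / SSRF / tools": ("09", "agent_tool_abuse.md"),
}


def find_payload_sets(keyword: str) -> list[tuple[str, str, str]]:
    """Case-insensitive substring search over goal names (hypothetical helper)."""
    needle = keyword.lower()
    return [
        (goal, chapter, filename)
        for goal, (chapter, filename) in PAYLOAD_SETS.items()
        if needle in goal.lower()
    ]


for goal, chapter, filename in find_payload_sets("bypass"):
    print(f"chapter {chapter}: {filename} ({goal})")
```

Searching for "bypass" returns both chapter 12 payload sets, jailbreaks.md and encoding_bypass.md, in table order.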

Repository Layout

AI-Pentest-Playbook/
├── PAYLOADS.md     ⭐ one-page payload index — start here
├── payloads/       the payloads, grouped by attack class
├── docs/           technique chapters (detection + mitigation)
├── scripts/        runner · burp-export · master-csv
├── README.md
└── CONTRIBUTING.md
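
To see what sits under payloads/ in a local clone, a short script can walk the directory and group files by extension. This is a sketch under assumptions: it expects a clone at a path you supply, the function name is hypothetical, and it is unrelated to the runner in scripts/, whose interface is not described here.

```python
from collections import defaultdict
from pathlib import Path


def index_payload_files(repo_root: str) -> dict[str, list[str]]:
    """Group files under <repo_root>/payloads by extension, as sorted relative paths."""
    base = Path(repo_root) / "payloads"
    if not base.is_dir():
        # No clone at this path, or no payloads/ folder inside it.
        return {}
    grouped: dict[str, list[str]] = defaultdict(list)
    for path in sorted(base.rglob("*")):
        if path.is_file():
            key = path.suffix or "(none)"
            grouped[key].append(path.relative_to(base).as_posix())
    return dict(grouped)


# The clone location is an assumption; adjust it to wherever the repository lives.
for extension, files in index_payload_files("AI-Pentest-Playbook").items():
    print(extension, len(files))
```

The function only reads file names and never opens or executes payload content, which keeps the inventory step safe to run on an analyst workstation.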

💜 Found This Useful? Give Back.

This handbook gets sharper every time a real-world finding is distilled back into reusable knowledge.

🛠 Got a new payload, defence pattern, or attack surface that should be a chapter? Open a PR.

Most payload PRs merge within a day — new chapters and defensive notes are very welcome. A ⭐ helps other AppSec / red-team folks find the handbook.

🐛 Spotted something wrong, missing, or outdated? Open an issue. Accuracy beats completeness.


Citation

@misc{aihackershandbook,
  author       = {4vanish and contributors},
  title        = {AI Hacker's Handbook: A playbook for pentesting AI chatbots and LLM-powered applications},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/4vanish/AI-Pentest-Playbook}}
}

Author

Built and maintained by Avanish Pathak.

