agent-smith
Health: Passed
- License — License: AGPL-3.0
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 33 GitHub stars
Code: Passed
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Passed
- Permissions — No dangerous permissions requested
This tool is an AI-driven offensive security agent designed for automated penetration testing. It uses a large language model to chain methodologies, generate contextual payloads, and discover vulnerabilities against designated targets.
Security Assessment
Overall risk is rated as High. This is an intentional risk by design, as the tool is built to execute shell commands and perform active network requests against targets to find security flaws. It runs scanners inside ephemeral Docker containers to sandbox these powerful capabilities and enforces server-side limits. The light code audit found no hardcoded secrets or unexpected dangerous patterns. However, because the LLM is dynamically generating payloads and deciding its next actions, human oversight is mandatory to ensure it does not interact with unauthorized systems. The repository strictly warns against illegal or unauthorized use.
Quality Assessment
The project is actively maintained, with its most recent push occurring today. It uses the standard AGPL-3.0 license and demonstrates decent community traction for a specialized utility with 33 GitHub stars. The codebase passed a basic light audit, scanning 12 files without finding any malicious patterns, and integrates SonarCloud for continuous code quality checks.
Verdict
Use with caution: the code appears professionally structured and safe to operate, but you must strictly employ it within isolated, legally authorized testing environments.
"Never send a human to do a machine's job" - Open Source AI hacking agent
agent-smith
An AI-driven offensive-security agent that thinks for itself.
You bring the LLM. agent-smith brings the tools, the skills, and the methodology, and the LLM does the rest.
⚠️ Authorized testing only. Use against systems you own or have explicit written permission to test. Unauthorized access is illegal.
Why agent-smith
- 🧠 The LLM is the brain, not a payload library.
Skills teach methodology; the LLM invents the actual attacks. No two scans look 100% alike.
- 🔗 Skills chain themselves.
/pentester discovers an injection point and pivots into /web-exploit; /codebase finds an LLM call site and pivots into /ai-redteam. The agent decides what to run next based on what it just found.
- 🛠 Bring your own LLM.
Works with Claude Code, OpenCode (any provider — OpenAI, Gemini, Ollama, OpenRouter, local models), or any MCP-capable client.
- 📦 End-to-end deliverables.
Findings, PoCs (Burp-ready .http files), threat models, code patches, GitHub issues, and CVE submission packages all generated for you.
- 🐳 Sandboxed by default.
Every scanner runs inside an ephemeral Docker container. Hard cost / time / call-count limits enforced server-side.
- 📊 Live dashboard.
Watch findings, topology, coverage, and the threat model populate in real time at localhost:5000.
The new way: skills as pattern teachings
Most pentest automation ships a giant payload library and runs it linearly. agent-smith does the opposite.
Skills are not scripts. Skills are prompts that teach the LLM a way of thinking. They describe the vulnerability class, the surface area, the verification logic, and the chaining rules, but they leave the actual attacks to the model. The LLM reads the skill, understands the pattern, and then finds its own paths through your target.
This means:
| Traditional Security tools | agent-smith |
|---|---|
| Fixed payload list | LLM-generated payloads, contextual to each target |
| One tool per phase | Skills compose — /codebase enriches /pentester, which enriches /post-exploit |
| Stops at first success | Keeps probing until the cost / time / coverage budget is hit |
| Generates a PDF | Generates findings, PoCs, patches, threat models, coverage matrix, CVE packages and more... |
| Same scan every time | Two runs against the same target produce different attack paths |
The skills are inspiration. The LLM is the operator.
See it in action
- Recon → fingerprint → exploit → loot → report. The agent decides every step.
- Source → routes → sinks → ASVS chapters → enriched context for every downstream skill.
- Prompt injection, jailbreaks, model extraction, MCP runtime attacks, and post-access infra checks.
- For every confirmed finding the agent writes a code or config patch and verifies it doesn't break the build.
Use cases
Drop in any of these the moment you start your client or agent. No setup is needed beyond ./installers/install.sh.
The skills below work in OpenCode or Claude Code. They are only a few example use cases; the project ships 25+ cybersecurity skills.
1. Run a full pentest, hands-off
/pentester scan https://staging.example.com depth=thorough
The agent runs OSINT → recon → web exploit → post-exploit → reporting, deciding each pivot from the previous result. End state: findings.json, PoCs in pocs/, a topology diagram, a coverage matrix, and a patch-ready code fix.
2. Pre-prod secure code review
/codebase path=./src
White-box ASVS 5.0 review across 16 chapters and 427 requirements. Maps every route, every sink, every dangerous pattern.
3. Triage a CVE in your dependency tree
/analyze-cve lodash 4.17.20 CVE-2021-23337
The agent reads your code, traces the vulnerable function from user input to sink, decides whether you're actually exploitable, and writes a Burp-ready PoC if you are.
4. AI / LLM red-team
/ai-redteam https://your-app.com/api/chat provider=openai depth=thorough
Covers OWASP LLM Top 10 (2025), the OWASP AI Testing Guide (AITG v1, Nov 2025), and OWASP MCP Top 10 runtime attacks. Generates payloads on the fly using FuzzyAI, Garak, PyRIT, and promptfoo.
5. Build a CVE submission package
/request-cves
After a pentest the agent generates the MITRE CVE form, a GitHub Security Advisory draft, a full disclosure report, and a vendor notification email — for every qualifying finding.
6. Threat-model an architecture
/threat-modeling
PASTA + STRIDE + 4-question framework. Outputs component map, data flow diagram, attack tree, prioritized risk register, and a mitigation plan.
💡 Pick a skill or let /pentester orchestrate. Single-purpose skills give you laser focus; /pentester chains everything based on what it finds.
Quick start
Requirements
| Dependency | Notes |
|---|---|
| Docker Desktop | Must be running. All scanners are sandboxed. |
| Poetry | `curl -sSL https://install.python-poetry.org \| python3 -` |
| One LLM client (pick one) | See below ↓ |
| Node.js v18+ | Optional — enables server-side Mermaid pre-rendering. |
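
Before running the installer, a small preflight script can confirm that the tools in the table above are on your PATH. This is a minimal sketch and not part of agent-smith: the function name `check_prerequisites` is hypothetical, and it only looks for executables. It does not verify that the Docker daemon is running or that Node.js is v18+.

```python
import shutil

# Executables named in the requirements table; node is optional.
REQUIRED = ("docker", "poetry")
OPTIONAL = ("node",)


def check_prerequisites(which=shutil.which):
    """Return (missing_required, missing_optional) as lists of names.

    `which` is injectable so the check can be exercised without the tools installed.
    """
    missing_required = [name for name in REQUIRED if which(name) is None]
    missing_optional = [name for name in OPTIONAL if which(name) is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_prerequisites()
    for name in required:
        print(f"missing required dependency: {name}")
    for name in optional:
        print(f"missing optional dependency: {name}")
    if not required:
        print("required dependencies found on PATH")
```

An LLM client still has to be installed separately; this check covers only the shared dependencies.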
Pick your LLM client
agent-smith ships an MCP server. Anything that speaks MCP can drive it.
| Claude Code | OpenCode (BYO LLM) | Custom MCP client |
|---|---|---|
| Anthropic's official CLI. Best UX, native skill support. Requires Claude Code + an Anthropic API key. | Open-source coding agent that supports any provider: OpenAI, Anthropic, Google, OpenRouter, Ollama, llama.cpp, vLLM, your own endpoint. Requires OpenCode. Configure your model in ~/.config/opencode/opencode.json. | Any MCP-capable client (Cursor, Continue, Zed, custom Agent SDK app, etc.). Wire the stdio MCP server into your client. Skills are plain markdown in skills/; load them however your client expects prompts. |
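
For a custom client, wiring in a stdio server usually means telling the client which command to launch. The sketch below generates one such config fragment. It is illustrative only: the `mcpServers` layout, the `cwd` key, the server name, and the bare `python` interpreter are assumptions rather than agent-smith's documented setup, and some clients use a different schema. The only detail taken from the project is the entry point `python -m mcp_server`.

```python
import json


def mcp_stdio_entry(repo_dir, name="agent-smith", python="python"):
    """Build a client config fragment that launches the MCP server over stdio.

    Assumptions: the `mcpServers` layout, the `cwd` key, the server name, and
    the interpreter are illustrative; check your client's docs for its schema.
    """
    return {
        "mcpServers": {
            name: {
                "command": python,
                "args": ["-m", "mcp_server"],
                "cwd": repo_dir,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mcp_stdio_entry("/path/to/agent-smith"), indent=2))
```

For Claude Code and OpenCode the bundled installers handle this step, so a hand-written entry is only relevant for other clients.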
🧠 The LLM is your choice. agent-smith doesn't care if it's Claude Opus 4.6, GPT-5, Gemini 2.5, Llama-4, or a local Qwen3 — anything strong enough to follow tool-use instructions will work. Bigger / smarter models find more interesting attack paths.
⚠️ After install, fully restart your client. The MCP server connects at startup.
Optional images
# Kali container — required for /credential-audit, /web-exploit deep tools, etc.
docker build -t pentest-agent/kali-mcp ./tools/kali/ # ~10 min, ~3 GB
# Metasploit container — required for /metasploit
docker build -t pentest-agent/metasploit ./tools/metasploit/ # ~5 min
Lightweight tools (nmap, nuclei, httpx, ffuf, semgrep, trufflehog, …) are auto-pulled on first use.
How it works
You (/pentester scan target.com)
└── Your LLM (Claude / GPT / Gemini / local …)
    └── MCP server (python -m mcp_server)
        ├── Lightweight scanners: docker run --rm (nmap, nuclei, httpx, …)
        ├── Kali container: persistent kali-mcp (nikto, sqlmap, ffuf, …)
        ├── Metasploit container: exploit validation
        └── FastAPI dashboard: live findings at localhost:5000
The LLM decides what to run. Each tool's output is aggregated and returned to the model, which interprets the result and chooses the next action — pivoting deeper, skipping dead ends, or finalizing findings. Hard cost / time / call-count limits are enforced server-side. When any limit fires, the tool returns a stop signal and the agent writes the final report.
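
The limit behaviour described above can be pictured as a small budget object that every tool call passes through. This is a hedged sketch, not the project's `core/session.py` or `core/cost.py`: the class name `ScanBudget`, the default limits, the flat per-call cost, and the shape of the stop signal are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ScanBudget:
    """Hypothetical limit tracker; names and default values are illustrative."""
    max_cost_usd: float = 5.0
    max_seconds: float = 3600.0
    max_calls: int = 200
    cost_usd: float = 0.0
    calls: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, cost_usd):
        """Account for one completed tool call."""
        self.calls += 1
        self.cost_usd += cost_usd

    def exceeded(self, now=None):
        """Return the name of the first limit that fired, or None."""
        now = time.monotonic() if now is None else now
        if self.cost_usd >= self.max_cost_usd:
            return "cost"
        if now - self.started_at >= self.max_seconds:
            return "time"
        if self.calls >= self.max_calls:
            return "calls"
        return None


def run_tool(budget, tool, cost_usd=0.01):
    """Run `tool` unless a limit has fired; otherwise return a stop signal."""
    reason = budget.exceeded()
    if reason is not None:
        return {"stop": True, "reason": reason}
    output = tool()
    budget.record(cost_usd)
    return {"stop": False, "output": output}
```

Because the check runs on the server side of the tool call, the model cannot talk its way past a limit. It only sees the stop signal and is expected to write the final report.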
Architecture
flowchart TD
User["You<br/>/pentester · /codebase · /ai-redteam · /threat-modeling"]
Agent["Your LLM client<br/>Claude Code · OpenCode · MCP-capable IDE"]
MCP["mcp_server/<br/>5 consolidated tools<br/>scan · kali · http · report · session"]
Docker["Docker containers<br/>ephemeral --rm · 2 GB RAM · 1.5 CPU<br/>nmap · nuclei · httpx · ffuf · semgrep · trufflehog"]
Kali["Kali Linux container<br/>persistent · port 5001<br/>nikto · sqlmap · hydra · testssl · pyrit"]
Msf["Metasploit container<br/>persistent · port 5002"]
Core["core/<br/>session · logger · findings · cost · coverage · api_server"]
Dashboard["FastAPI dashboard<br/>localhost:5000<br/>Findings · Topology · Components · Coverage · Threat Model · Logs"]
Target["Target<br/>URL · IP range · codebase"]
User -->|slash command| Agent
Agent -->|MCP stdio| MCP
MCP --> Docker
MCP --> Kali
MCP --> Msf
MCP --> Core
Core --> Dashboard
Docker -->|raw output| Target
Kali -->|raw output| Target
Msf -->|raw output| Target
Docker -->|aggregated| Agent
Kali -->|aggregated| Agent
Msf -->|aggregated| Agent
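
The diagram lists ephemeral `--rm` containers capped at 2 GB RAM and 1.5 CPUs. A command for such a container could be assembled as in the sketch below. `--rm`, `--memory`, and `--cpus` are standard `docker run` flags, but the function name and the image name in the usage example are hypothetical, and this is not the project's actual `docker_runner.py`.

```python
import shlex


def ephemeral_docker_cmd(image, tool_args, memory="2g", cpus="1.5"):
    """Return an argv list for a throwaway, resource-capped container.

    Building a list (not a shell string) avoids shell injection when
    tool_args contain LLM- or user-supplied values.
    """
    return [
        "docker", "run",
        "--rm",               # remove the container when it exits
        "--memory", memory,   # hard RAM cap
        "--cpus", cpus,       # CPU quota
        image,
        *tool_args,
    ]


if __name__ == "__main__":
    # "example/nmap" is a placeholder image name, not one the project ships.
    cmd = ephemeral_docker_cmd("example/nmap", ["-sV", "scanme.example.com"])
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, ...) to execute
```

Keeping the arguments as a list matters here because the arguments are chosen by a model at run time.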
Skills
Skills are markdown files that teach the LLM a methodology. They live in the skills/ submodule and are loaded by your client at startup.
How they chain
flowchart LR
%% Codebase enriches multiple skills (white-box context)
codebase[/codebase/] -.enriches.-> pentester
codebase -.enriches.-> web[/web-exploit/]
codebase -.enriches.-> cve[/analyze-cve/]
codebase -.detects LLM use.-> ai[/ai-redteam/]
%% Pentester is the hub: branches to every scan/discovery skill
pentester[/pentester/] --> web
pentester --> api[/api-security/]
pentester --> net[/network-assess/]
pentester --> creds[/credential-audit/]
pentester --> ssl[/ssl-tls-audit/]
pentester --> email[/email-security/]
pentester --> cloud[/cloud-security/]
pentester --> cve
pentester --> msf[/metasploit/]
pentester --> ai
pentester --> ad[/ad-assessment/]
pentester --> rsh[/reverse-shell/]
%% Cloud discovers AI endpoints and K8s clusters
cloud --> ai
cloud --> k8s[/container-k8s-security/]
cloud --> cve
%% Web exploitation feeds CVE analysis, creds, post-exploit, and AI red-team
web --> cve
web --> creds
web --> post[/post-exploit/]
web --> ai
%% API security: chains into web-exploit for injection depth, ai-redteam for LLM endpoints
api --> web
api --> ai
api --> cve
api --> creds
api --> post
codebase -.enriches.-> api
%% Metasploit & reverse-shell drop into post-exploit
msf --> post
rsh --> post
%% Network assessment branches
net --> creds
net --> ssl
net --> k8s
net --> post
net --> lateral[/lateral-movement/]
%% Credential audit and SSL feed into post-exploit
creds --> post
creds --> lateral
ssl --> creds
email --> creds
%% Container/K8s security
k8s --> post
%% Post-exploit → pivoting & lateral movement
post --> lateral
post --> pivot[/pivot-tunnel/]
post --> ai_post[/ai-redteam<br/>Phase 3c/]
pivot --> lateral
ad --> lateral
%% AI red-team → post-access infra checks
ai --> post
ai --> k8s
ai --> cve
%% Aikido triage feeds CVE dataflow analysis
aikido[/aikido-triage/] --> cve
%% Reporting tail
pentester --> tm[/threat-modeling/]
tm --> remediate[/remediate/]
pentester --> remediate
codebase --> remediate
pentester --> ghexp[/gh-export/]
pentester --> req[/request-cves/]
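
One way to read the chaining graph is as a directed graph that can be walked to see which skills a run might reach from a given entry point. The sketch below encodes a small subset of the solid arrows from the diagram and runs a breadth-first traversal. The dictionary and the `reachable` helper are illustrative assumptions. In agent-smith the LLM decides pivots at run time from what it finds; it does not consult a static table like this.

```python
from collections import deque

# A small subset of the edges in the diagram above (solid arrows only).
CHAINS = {
    "pentester": ["web-exploit", "api-security", "network-assess"],
    "api-security": ["web-exploit", "post-exploit"],
    "web-exploit": ["post-exploit", "credential-audit"],
    "network-assess": ["credential-audit", "lateral-movement"],
    "credential-audit": ["post-exploit", "lateral-movement"],
    "post-exploit": ["lateral-movement", "pivot-tunnel"],
    "pivot-tunnel": ["lateral-movement"],
}


def reachable(start, chains=CHAINS):
    """Breadth-first walk: every skill a run could chain into from `start`."""
    seen, queue = {start}, deque([start])
    order = []
    while queue:
        skill = queue.popleft()
        for nxt in chains.get(skill, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    print(reachable("web-exploit"))
```

The visited set keeps the walk finite even where skills point at each other.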
Catalog
Penetration testing

| Skill | What it does |
|---|---|
| /pentester | Full autonomous engagement that chains everything else |
| /web-exploit | SQLi, XSS, SSRF, SSTI, deserialization, JWT, smuggling, race conditions, etc. |
| /api-security | OWASP API Top 10 (2023): BOLA, BFLA, mass assignment, JWT/OAuth abuse, SSRF, business-flow abuse, inventory drift. REST/GraphQL/gRPC/SOAP/MCP |
| /network-assess | VLAN hopping, LLMNR/NBT-NS, SNMP, segmentation |
| /post-exploit | Linux/Windows privesc, persistence, credential harvesting |
| /lateral-movement | PTH, PTT, Kerberoasting, NTLM relay, delegation abuse |
| /metasploit | Exploit validation in an isolated Docker container |
| /reverse-shell | Generates and manages reverse shells across all platforms |
| /pivot-tunnel | Chisel + SOCKS5 tunneling after RCE |

| Skill | What it does |
|---|---|
| /cloud-security | AWS / Azure / GCP IAM, storage, serverless, logging gaps |
| /container-k8s-security | Container escape, K8s RBAC, etcd, service account abuse |
| /ad-assessment | ADCS (ESC1–ESC8), BloodHound, GPO, LAPS, forest trusts |
| /email-security | SPF / DKIM / DMARC, open relay, MTA-STS, SMTP security |
| /ssl-tls-audit | TLS protocol/cipher audit, cert chain, POODLE/BEAST/Heartbleed/etc. |
| /credential-audit | Brute force, password spraying, default creds, lockout, MFA bypass |

| Skill | What it does |
|---|---|
| /osint | Subdomain takeover, cert transparency, Shodan, leaked creds |
| /threat-modeling | PASTA + STRIDE + 4-question, attack tree, risk register |
| /codebase | OWASP ASVS 5.0 white-box review (16 chapters, 427 requirements) |
| /analyze-cve | CVE code-path tracing + Burp PoC |
| /aikido-triage | Triage Aikido SAST/SCA/secret-scan CSV against your code |

| Skill | What it does |
|---|---|
| /ai-redteam | OWASP LLM Top 10 + AITG v1 + MCP Top 10 runtime attacks |
| /colang-gen | Generate NeMo Guardrails Colang configs from plain language |

| Skill | What it does |
|---|---|
| /remediate | Writes code patches and config fixes for every finding |
| /gh-export | Formats confirmed findings as copy-pasteable GitHub issues |
| /request-cves | Generates CVE submission packages: MITRE form, GHSA draft, disclosure report, vendor email |
What you get out
Every scan produces a structured set of artifacts you can hand to a developer, a manager, or a vendor on day one:
| Artifact | Where | What it's for |
|---|---|---|
| findings.json | repo root | Machine-readable findings + diagrams |
| pocs/*.http | repo root | Raw HTTP PoCs (open in Burp Repeater) |
| Live dashboard | localhost:5000 | Findings, Topology, Components, Coverage, Threat Model, Logs |
| Coverage matrix | dashboard tab | Endpoint × technique tracking that proves what you tested |
| Code patches | inline edits | One per finding, generated by /remediate |
| Threat model | threat-model/*.md | PASTA + STRIDE write-up + diagrams |
| CVE packages | via /request-cves | MITRE form, GHSA draft, vendor email, full disclosure |
| Session log | logs/pentest.log | Full audit trail of what the agent decided and why |
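
The coverage matrix in the table above is described as endpoint × technique tracking. A minimal data structure for that idea might look like the sketch below. It is a hypothetical illustration and not the project's `core/coverage.py`: the class name, method names, and example endpoints are assumptions.

```python
from collections import defaultdict


class CoverageMatrix:
    """Hypothetical endpoint x technique tracker (not the project's coverage.py)."""

    def __init__(self, endpoints, techniques):
        self.endpoints = list(endpoints)
        self.techniques = list(techniques)
        self._tested = defaultdict(set)  # endpoint -> techniques tried

    def mark(self, endpoint, technique):
        """Record that `technique` was attempted against `endpoint`."""
        if endpoint not in self.endpoints or technique not in self.techniques:
            raise KeyError((endpoint, technique))
        self._tested[endpoint].add(technique)

    def untested(self):
        """Return (endpoint, technique) pairs that have not been tried yet."""
        return [
            (e, t)
            for e in self.endpoints
            for t in self.techniques
            if t not in self._tested[e]
        ]

    def ratio(self):
        """Fraction of cells covered, between 0.0 and 1.0."""
        total = len(self.endpoints) * len(self.techniques)
        done = sum(len(v) for v in self._tested.values())
        return done / total if total else 0.0


if __name__ == "__main__":
    matrix = CoverageMatrix(["/login", "/search"], ["sqli", "xss"])
    matrix.mark("/login", "sqli")
    print(matrix.ratio(), matrix.untested())
```

Storing attempts as sets makes repeated marks idempotent, so re-testing a cell does not inflate the coverage ratio.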
Project layout
mcp_server/                       MCP tool layer: 5 consolidated tools (LLM-callable)
  __main__.py                     entry point → python -m mcp_server
  _app.py                         FastMCP singleton + shared helpers (_run, _clip)
  scan_tools.py                   scan(): nmap · naabu · httpx · nuclei · ffuf · spider
                                  subfinder · semgrep · trufflehog · fuzzyai · pyrit
                                  garak · promptfoo · metasploit
  kali_tools.py                   kali(): freeform commands in the Kali container
  http_tools.py                   http(): raw HTTP requests + PoC saving
  report_tools.py                 report(): findings · diagrams · notes · dashboard · coverage
  session_tools.py                session(): scan lifecycle · Kali infra · codebase target
core/                             Server infrastructure
  session.py                      Scan scope, depth presets, hard limit enforcement
  logger.py                       Structured session log → logs/pentest.log
  findings.py                     findings.json read/write
  cost.py                         Cost tracking per tool invocation
  coverage.py                     Endpoint × technique coverage matrix
  api_server.py                   FastAPI web server (dashboard + REST API)
tools/                            Docker tool definitions + runners
  base.py · docker_runner.py · kali_runner.py · metasploit_runner.py
  nmap / naabu / httpx / nuclei / ffuf / subfinder / semgrep / trufflehog / fuzzyai
tools/kali/                       Kali image (Dockerfile + pyrit_runner.py)
tools/metasploit/                 Metasploit image (Dockerfile + msfconsole HTTP shim)
skills/                           Slash command definitions (git submodule)
                                  24+ skills covering recon → exploit → report → remediate
templates/dashboard.html          6-tab dashboard
threat-model/                     Threat model reports (auto-displayed in the dashboard)
tests/                            pytest suite
logs/                             Session logs
pocs/                             Saved proof-of-concept HTTP requests
docs/gifs/                        Demo gifs (drop your recordings here)
installers/
  install.sh                      Claude Code installer
  install_opencode.sh             OpenCode installer (BYO LLM)
  uninstall.sh                    Remove MCP config and installed skills
  opencode-pentest-recovery.mjs   Compaction recovery plugin for OpenCode
Documentation
| Doc | Contents |
|---|---|
| docs/tools.md | All MCP tools — parameters, purpose, examples |
| docs/kali-toolchain.md | Full kali command reference |
| docs/skills.md | Slash commands, chaining guide, examples |
| docs/dashboard-api.md | FastAPI endpoints, response shapes |
| docs/extending.md | How to add new tools and skills |
| docs/testing.md | Running the test suite, coverage, adding tests |
Adding a new skill? Skills live in a separate repo (github.com/0x0pointer/skills) pulled in as a git submodule. After adding a skill there, update the submodule pointer (git add skills && git commit) and re-run the installer to deploy it.
License
GNU Affero General Public License v3.0 — see LICENSE.
Built for offensive-security professionals. Use it to make the internet safer.