Phantom
Health Pass
- License — License: Apache-2.0
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 106 GitHub stars
Code Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
This tool is an autonomous, AI-driven penetration testing agent that chains together over 50 professional security utilities to simulate cyberattacks, identify vulnerabilities, and generate verified reports.
Security Assessment
Overall risk: High. By design, this application is built to execute shell commands, make extensive network requests, and interact with local file systems to run exploits and write proofs-of-concept. While the automated code scan passed with no hidden backdoors, malicious patterns, or hardcoded secrets found, the fundamental nature of the software is offensive. It requires access to highly sensitive system resources to function. You must ensure it is strictly isolated and pointed only at targets you have explicit authorization to test.
Quality Assessment
The project is in active development and appears well-maintained, with repository updates as recent as today. It utilizes a standard Apache-2.0 license, ensuring clear terms of use. It has garnered moderate community trust, reflected by over 100 GitHub stars, which signals a genuine user base.
Verdict
Use with extreme caution: the tool itself is safe to run and professionally structured, but strictly isolate it in a sandbox environment and only deploy it against authorized targets due to its highly offensive capabilities.
☠ PHANTOM
Autonomous Adversary Simulation Platform
AI-native penetration testing — autonomous reconnaissance, exploitation, and verified results.
Quick Start · Architecture · Usage · Configuration · Contributing
Overview
Phantom is an autonomous AI penetration testing agent built on the ReAct (Reason–Act) loop. It connects a large language model to over 30 professional security tools, runs all offensive operations inside an isolated Docker sandbox, and produces verified vulnerability reports — entirely without human intervention.
Unlike CVE-signature scanners, Phantom reasons about your target: it reads HTTP responses, forms hypotheses, selects the right tool, chains multi-step exploits, then writes and executes a proof-of-concept script to confirm every finding before it appears in a report.
| | Traditional Scanners | Phantom |
|---|---|---|
| Approach | Signature matching against CVE databases | LLM reasoning + adaptive tool chaining |
| False Positives | 40–70% — requires manual triage | Every finding verified with a working PoC |
| Depth | Single-pass HTTP probe | Multi-phase: recon → exploit → verify |
| Adaptability | Fixed rules, static payloads | Adapts to target responses in real time |
| Novel Vulns | Known CVEs only | Logic flaws + novel attack paths |
| Reporting | Generic vulnerability lists | MITRE ATT&CK mapped, compliance-ready |
Core Capabilities
| 🧠 | Autonomous ReAct Loop — Plans, executes tools, reads results, re-plans. Handles dead ends and unexpected responses without human guidance. |
| 🔧 | 53 Security Tools — nmap · nuclei · sqlmap · ffuf · httpx · katana · subfinder · nikto · gobuster · arjun · semgrep · playwright — all orchestrated automatically. |
| 🐳 | Ephemeral Docker Sandbox — All offensive tooling runs in a network-restricted Kali Linux container. Zero host filesystem access. Container is destroyed after every scan. |
| ⚡ | Multi-Agent Parallelism — Spawns specialized sub-agents (SQLi, XSS, recon) that work concurrently and report findings to the coordinator. |
| 🛡️ | 7-Layer Defense Model — Scope guard → Tool firewall → Docker sandbox → Cost limiter → Time budget → HMAC audit trail → Output sanitizer. |
| ✅ | Verified Findings Only — No hallucinations. Every reported vulnerability includes raw HTTP evidence, reproduction steps, and a working exploit script. |
| 🗺️ | MITRE ATT&CK Enrichment — Automatic CWE, CAPEC, technique-level tagging, and CVSS 3.1 scoring per finding. |
| 📋 | Compliance Coverage — OWASP Top 10 (2021) · PCI DSS v4.0 · NIST 800-53 — mapped automatically per finding. |
| 💾 | Knowledge Persistence — Cross-scan memory stores hosts, past findings, and false-positive signatures. Each scan learns from the last. |
| 💰 | Full Cost Control — Per-request and per-scan budget caps. Every token and every dollar tracked in real time. |
Architecture
① System Architecture — Component Overview
%%{init: {"theme": "dark"}}%%
flowchart TD
USER(["👤 User / CI-CD"])
subgraph IFACE["Interface Layer"]
CLI["CLI · TUI"]
PARSER["Output Parser"]
end
subgraph ORCH["Orchestration"]
PROFILE["Scan Profile"]
SCOPE["Scope Guard"]
COST["Cost Controller"]
AUDIT["HMAC Audit Log"]
end
subgraph AGENT["Agent Core — ReAct"]
LLM["LLM via LiteLLM"]
STATE["State Machine"]
MEM["Memory Engine"]
SKILLS["Skills Engine"]
end
subgraph SEC["Security Layer"]
FW["Tool Firewall"]
VERIFY["Verifier"]
SANIT["Sanitizer"]
end
subgraph SANDBOX["Docker Sandbox — Kali Linux"]
TSRV["Tool Server :48081"]
TOOLS["30+ Security Tools"]
BROWSER["Playwright · Chromium"]
PROXY["Caido Proxy :48080"]
end
subgraph OUTPUT["Output Pipeline"]
REPORTS["JSON · MD · HTML"]
GRAPH["Attack Graph"]
MITRE["MITRE ATT&CK Map"]
end
USER --> IFACE
IFACE --> ORCH
ORCH --> AGENT
AGENT <--> SEC
SEC --> SANDBOX
AGENT --> OUTPUT
style IFACE fill:#6c5ce7,stroke:#a29bfe,color:#ffffff
style ORCH fill:#00b894,stroke:#55efc4,color:#ffffff
style AGENT fill:#e17055,stroke:#fab1a0,color:#ffffff
style SEC fill:#d63031,stroke:#ff7675,color:#ffffff
style SANDBOX fill:#0984e3,stroke:#74b9ff,color:#ffffff
style OUTPUT fill:#f9ca24,stroke:#f0932b,color:#2d3436
② Scan Execution Flow — Phase by Phase
%%{init: {"theme": "dark"}}%%
sequenceDiagram
actor User
participant CLI as Phantom CLI
participant Orch as Orchestrator
participant Agent as Agent ReAct
participant FW as Tool Firewall
participant Box as Docker Sandbox
participant LLM as LLM Provider
participant T as Target App
User->>CLI: phantom scan -t https://app.com
CLI->>Orch: Validate scope · init cost controller
Orch->>Box: Spin up ephemeral Kali container
Orch->>Agent: Begin scan · profile + scope injected
rect rgb(48, 25, 80)
Note over Agent,LLM: Phase 1 — Reconnaissance
Agent->>LLM: Analyze target · plan recon
LLM-->>Agent: Run katana · httpx · nmap
Agent->>FW: Validate tool call
FW-->>Agent: Approved
Agent->>Box: Execute recon tools
Box->>T: HTTP probes · port scans · crawl
T-->>Box: Responses
Box-->>Agent: Endpoints · tech stack · open ports
end
rect rgb(80, 20, 20)
Note over Agent,LLM: Phase 2 — Exploitation
Agent->>LLM: Hypothesize attack vectors
LLM-->>Agent: SQLi on /api/login · XSS on /search
Agent->>Box: sqlmap · custom payload injection
Box->>T: Exploit attempts
T-->>Box: Vulnerability confirmed
Box-->>Agent: Raw HTTP evidence
end
rect rgb(15, 60, 30)
Note over Agent,LLM: Phase 3 — Verification
Agent->>Box: Re-exploit with clean PoC script
Box->>T: Reproduce exact attack
T-->>Box: Confirmed
Agent->>Agent: CVSS 3.1 · CWE tag · MITRE map
end
Agent->>CLI: Findings compiled
CLI->>User: Vulnerabilities + PoCs + Compliance
CLI->>Box: Destroy container
③ Agent ReAct Loop — Decision Cycle
%%{init: {"theme": "dark"}}%%
flowchart LR
INIT(["Scan Start"])
OBS["Observe\nCollect results"]
THINK["Reason\nAnalyze context"]
PLAN["Plan\nChoose tool"]
ACT["Act\nBuild arguments"]
FW{"Firewall?"}
EXEC["Execute\nDocker sandbox"]
DONE{"Stop\nCondition?"}
VERIFY["Verify\nRe-test findings"]
ENRICH["Enrich\nMITRE · CVSS"]
REPORT["Report\nJSON · HTML · MD"]
FINISH(["Scan Complete ☠"])
INIT --> OBS
OBS --> THINK
THINK --> PLAN
PLAN --> ACT
ACT --> FW
FW -- "✓ Pass" --> EXEC
FW -- "✗ Block" --> THINK
EXEC --> OBS
OBS --> DONE
DONE -- "Continue" --> THINK
DONE -- "Done" --> VERIFY
VERIFY --> ENRICH
ENRICH --> REPORT
REPORT --> FINISH
style INIT fill:#6c5ce7,stroke:#a29bfe,color:#fff
style FINISH fill:#6c5ce7,stroke:#a29bfe,color:#fff
style FW fill:#d63031,stroke:#ff7675,color:#fff
style DONE fill:#e17055,stroke:#fab1a0,color:#fff
style EXEC fill:#0984e3,stroke:#74b9ff,color:#fff
style REPORT fill:#00b894,stroke:#55efc4,color:#fff
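The decision cycle in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Phantom's actual implementation: `react_loop`, `ToolCall`, `ScanState`, and `demo_plan` are hypothetical names, and the planner is a stub standing in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolCall:
    tool: str
    args: list[str]


@dataclass
class ScanState:
    observations: list[str] = field(default_factory=list)
    blocked: list[ToolCall] = field(default_factory=list)


def react_loop(
    plan: Callable[[ScanState], Optional[ToolCall]],
    firewall: Callable[[ToolCall], bool],
    execute: Callable[[ToolCall], str],
    max_iterations: int = 10,
) -> ScanState:
    """Reason/plan, check the firewall, execute, observe; repeat until done."""
    state = ScanState()
    for _ in range(max_iterations):
        call = plan(state)              # Reason + Plan (None means "stop")
        if call is None:
            break
        if not firewall(call):          # blocked calls are fed back to the planner
            state.blocked.append(call)
            state.observations.append(f"blocked: {call.tool}")
            continue
        state.observations.append(execute(call))  # Act, then Observe
    return state


def demo_plan(state: ScanState) -> Optional[ToolCall]:
    """Stub planner: walks a fixed list instead of asking an LLM."""
    steps = [
        ToolCall("nmap", ["-sV", "target"]),
        ToolCall("rm", ["-rf", "/"]),
        ToolCall("httpx", ["-u", "target"]),
    ]
    index = len(state.observations)
    return steps[index] if index < len(steps) else None
```

Running `react_loop(demo_plan, lambda c: c.tool in {"nmap", "httpx"}, lambda c: f"ran {c.tool}")` executes the two allowed tools and records the `rm` call as blocked. The `max_iterations` bound plays the role of the per-profile iteration cap.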
④ Docker Sandbox — Isolation Architecture
%%{init: {"theme": "dark"}}%%
flowchart LR
HOST(["Phantom Agent\nHost Machine"])
subgraph CONTAINER["Kali Linux Container — Network Isolated"]
TSRV["Tool Server :48081"]
PROXY["Caido Proxy :48080"]
subgraph TOOLKIT["Security Toolkit"]
SCA["nmap · masscan"]
INJ["sqlmap · nuclei"]
FUZ["ffuf · gobuster · arjun"]
WEB["httpx · katana"]
ANA["nikto · semgrep"]
end
subgraph RUNTIME["Runtime Environment"]
PY["Python 3.12"]
BR["Playwright + Chromium"]
SH["Bash Shell"]
end
end
TARGET(["Target\nApplication"])
HOST -- "Authenticated API" --> TSRV
TSRV --> TOOLKIT
TSRV --> RUNTIME
PROXY -- "Intercept + Log" --> TARGET
TOOLKIT -- "Attack traffic" --> TARGET
RUNTIME -- "Browser sessions" --> TARGET
style CONTAINER fill:#0984e3,stroke:#74b9ff,color:#ffffff
style TOOLKIT fill:#d63031,stroke:#ff7675,color:#ffffff
style RUNTIME fill:#6c5ce7,stroke:#a29bfe,color:#ffffff
style HOST fill:#2d3436,stroke:#636e72,color:#dfe6e9
style TARGET fill:#2d3436,stroke:#636e72,color:#dfe6e9
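As a rough illustration of the isolation shown above, the sketch below builds a `docker run` command line for a locked-down, throwaway container. The flags are standard Docker CLI options, but which of them Phantom actually passes is an assumption, as are the function name `build_sandbox_command` and the `--pids-limit` value. The image name and tool server port come from this README.

```python
import shlex

# Default image named in this README.
SANDBOX_IMAGE = "ghcr.io/usta0x001/phantom-sandbox:latest"


def build_sandbox_command(
    image: str = SANDBOX_IMAGE,
    tool_server_port: int = 48081,
) -> list[str]:
    """Return an argv list for an ephemeral, capability-restricted container."""
    port = str(tool_server_port)
    return [
        "docker", "run",
        "--rm",                                 # remove the container on exit
        "--cap-drop", "ALL",                    # start with no Linux capabilities
        "--cap-add", "NET_RAW",                 # raw sockets, needed by nmap SYN scans
        "--security-opt", "no-new-privileges",  # block setuid privilege escalation
        "--pids-limit", "512",                  # assumed value, bounds process count
        "--publish", f"127.0.0.1:{port}:{port}",  # tool server on host loopback only
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(build_sandbox_command()))
```

Returning a list rather than a shell string avoids quoting problems when the command is later passed to `subprocess.run`.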
⑤ 7-Layer Defense Model — Request Lifecycle
%%{init: {"theme": "dark"}}%%
flowchart TD
REQ(["Incoming Request"])
L1["① Scope Validator\nTarget allowlist · SSRF protection"]
L2["② Tool Firewall\nArg sanitization · Injection block"]
L3["③ Docker Sandbox\nEphemeral Kali · Restricted Linux caps"]
L4["④ Cost Controller\nPer-request ceiling · Budget cap"]
L5["⑤ Time Limiter\nPer-tool timeout · Global scan expiry"]
L6["⑥ HMAC Audit Trail\nTamper-evident append-only log"]
L7["⑦ Output Sanitizer\nPII redaction · Credential scrubbing"]
PASS(["✓ Authorized Output"])
BLOCK(["✗ Blocked & Logged"])
REQ --> L1
L1 -- "✓ In scope" --> L2
L1 -- "✗ Out of scope" --> BLOCK
L2 -- "✓ Safe" --> L3
L2 -- "✗ Injection" --> BLOCK
L3 --> L4
L4 -- "✓ Within budget" --> L5
L4 -- "✗ Over budget" --> BLOCK
L5 -- "✓ In time" --> L6
L5 -- "✗ Timeout" --> BLOCK
L6 --> L7
L7 --> PASS
style REQ fill:#6c5ce7,stroke:#a29bfe,color:#fff
style PASS fill:#00b894,stroke:#55efc4,color:#fff
style BLOCK fill:#d63031,stroke:#ff7675,color:#fff
style L1 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L2 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L3 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L4 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L5 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L6 fill:#2d3436,stroke:#636e72,color:#dfe6e9
style L7 fill:#2d3436,stroke:#636e72,color:#dfe6e9
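The first layer, a target allowlist with SSRF protection, might look like the following. This is a minimal sketch with a hypothetical function name (`is_in_scope`) and an assumed policy: only HTTP(S) URLs whose host is on the allowlist pass, and loopback or link-local IP literals are rejected even if someone lists them.

```python
import ipaddress
from urllib.parse import urlsplit


def is_in_scope(url: str, allowed_hosts: set[str]) -> bool:
    """Return True only for http(s) URLs whose host is explicitly allowed."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"}:
        return False
    host = (parts.hostname or "").lower()
    if host not in allowed_hosts:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name on the allowlist
    # Assumed policy: never allow loopback or link-local literals
    # (for example the 169.254.169.254 cloud metadata address).
    return not (ip.is_loopback or ip.is_link_local)
```

A production guard would also need to resolve DNS names and apply the same address checks to the results, since an allowlisted hostname can point at an internal address.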
Quick Start
Requirements: Docker · Python 3.12+ · An LLM API key
# Install
pip install phantom-agent
# or for fully isolated install:
pipx install phantom-agent
# Set your LLM
export PHANTOM_LLM="openai/gpt-4o" # any LiteLLM-supported model
export LLM_API_KEY="sk-..."
# Run your first scan
phantom -t https://your-app.com
First run pulls the sandbox image (~13 GB). This happens once. Subsequent scans start in under 10 seconds.
Via Docker
docker run --rm -it \
-e PHANTOM_LLM="openai/gpt-4o" \
-e LLM_API_KEY="your-key" \
-v /var/run/docker.sock:/var/run/docker.sock \
ghcr.io/usta0x001/phantom:latest \
-t https://your-app.com
Usage
# Quick scan (~15 min) — CI/CD friendly
phantom -t https://app.com -m quick
# Standard scan (~45 min) — recommended default
phantom -t https://app.com
# Deep scan (1–3 h) — exhaustive coverage
phantom -t https://app.com -m deep
# With custom focus instructions
phantom -t https://app.com \
--instruction "Focus on SQL injection and broken auth in /api/v2"
# Resume an interrupted scan
phantom -t https://app.com
# Non-interactive (CI/CD pipelines)
phantom -t https://app.com --non-interactive
# Set a cost ceiling
PHANTOM_MAX_COST=2.00 phantom -t https://app.com
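To make the cost ceiling concrete, here is a small sketch of how a per-scan cap and a per-request ceiling could be enforced. The class and exception names (`CostController`, `BudgetExceeded`) are hypothetical, and the logic is an assumption based on the variable descriptions in this README (`PHANTOM_MAX_COST`, `PHANTOM_PER_REQUEST_CEILING`), not Phantom's own code.

```python
import os
from typing import Mapping, Optional


class BudgetExceeded(RuntimeError):
    """Raised when a cost limit is hit and the scan should stop."""


class CostController:
    def __init__(
        self,
        max_cost: Optional[float] = None,
        per_request_ceiling: Optional[float] = None,
    ) -> None:
        self.max_cost = max_cost
        self.per_request_ceiling = per_request_ceiling
        self.total = 0.0

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "CostController":
        def read(name: str) -> Optional[float]:
            raw = env.get(name)
            return float(raw) if raw else None

        return cls(read("PHANTOM_MAX_COST"), read("PHANTOM_PER_REQUEST_CEILING"))

    def record(self, cost_usd: float) -> None:
        """Add the cost of one LLM call, then enforce both limits."""
        self.total += cost_usd  # the money is spent even if a limit trips
        if self.per_request_ceiling is not None and cost_usd > self.per_request_ceiling:
            raise BudgetExceeded(f"single call cost ${cost_usd:.2f}")
        if self.max_cost is not None and self.total >= self.max_cost:
            raise BudgetExceeded(f"scan total reached ${self.total:.2f}")
```

With `max_cost=2.00`, three calls of $0.75 each would stop the scan on the third call, when the running total reaches $2.25.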
Scan Profiles
| Profile | Max Iterations | Typical Duration | Best For |
|---|---|---|---|
| quick | 300 | ~15–60 min | CI/CD gates, rapid triage |
| standard | 120 | ~20–45 min | Regular security testing |
| deep | 300 | 1–3 hours | Full audits, compliance (default) |
| stealth | 60 | ~30–60 min | Covert assessments, WAF-aware targets |
| api_only | 100 | ~20–45 min | REST/GraphQL API-focused scans |
Output
Every scan produces:
phantom_runs/<target>_<id>/
├── vulnerabilities/
│ ├── vuln-0001.md # Full finding with PoC exploit
│ └── vuln-0002.md
├── audit.jsonl # HMAC-signed immutable event log
├── scan_stats.json # Cost, tokens, timing metrics
├── enhanced_state.json # Full scan state snapshot
└── vulnerabilities.csv # Summary index for triage
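One common way to make a JSONL log tamper-evident is an HMAC chain, where each record's MAC covers the previous record's MAC as well as its own payload. The sketch below illustrates that idea only. The record layout (`event`, `prev`, `mac`) and the function names are assumptions, and the real `audit.jsonl` format may differ.

```python
import hashlib
import hmac
import json


def sign_event(key: bytes, prev_mac: str, event: dict) -> dict:
    """Build one log record whose MAC depends on the previous record."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    mac = hmac.new(key, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()
    return {"event": event, "prev": prev_mac, "mac": mac}


def verify_log(key: bytes, lines: list[str]) -> bool:
    """Recompute the chain; any edited, removed, or reordered line fails."""
    prev = ""
    for line in lines:
        record = json.loads(line)
        expected = sign_event(key, prev, record["event"])["mac"]
        if record["prev"] != prev:
            return False
        if not hmac.compare_digest(expected, record["mac"]):
            return False
        prev = record["mac"]
    return True
```

Because every MAC feeds into the next one, changing an early event invalidates all records after it, which is what makes an append-only log auditable after the fact.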
Post-Scan Enrichment Pipeline
Every scan automatically runs a 7-stage enrichment pass:
| Stage | Action |
|---|---|
| 1. MITRE ATT&CK | CWE, CAPEC, technique-level tagging |
| 2. Compliance | OWASP Top 10 · PCI DSS v4 · NIST 800-53 |
| 3. Attack Graph | Dependency-based path analysis |
| 4. Nuclei Templates | Auto-generated YAML for regression testing |
| 5. Knowledge Store | Persistent cross-scan memory updated |
| 6. Notifications | Webhook / Slack alerts for critical findings |
| 7. Reports | JSON + HTML + Markdown output |
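For readers unfamiliar with how a CVSS 3.1 base score is derived from a vector string, the following sketch implements the base-score equations and metric weights from the public CVSS v3.1 specification. It is an independent illustration, not Phantom's scoring code, and the function names are hypothetical.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place as defined in the CVSS 3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    metrics = dict(
        part.split(":") for part in vector.split("/") if not part.startswith("CVSS")
    )
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[metrics["C"]]) * (1 - cia[metrics["I"]]) * (1 - cia[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
        * pr * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))
```

For example, `base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")` returns 9.8, the score of an unauthenticated network-reachable flaw with full impact.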
CI/CD Integration
# .github/workflows/security.yml
name: Security Scan
on:
push:
branches: [main]
schedule:
- cron: '0 2 * * 1' # Weekly — Monday at 2 AM
jobs:
phantom-scan:
runs-on: ubuntu-latest
steps:
- name: Run Phantom
run: |
pip install phantom-agent
phantom scan \
--target ${{ vars.STAGING_URL }} \
--scan-mode quick \
--non-interactive \
--output json
env:
PHANTOM_LLM: openai/gpt-4o
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
PHANTOM_MAX_COST: "1.00"
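A pipeline usually needs a pass/fail signal after the scan. The sketch below is one possible gate, assuming only the output layout shown under Output (one `vuln-*.md` file per finding inside `vulnerabilities/`). The script and the function name `count_findings` are hypothetical and not part of Phantom.

```python
import sys
from pathlib import Path


def count_findings(run_dir: Path) -> int:
    """Count finding files in <run_dir>/vulnerabilities/."""
    return len(list((run_dir / "vulnerabilities").glob("vuln-*.md")))


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print("usage: gate.py <run_dir>", file=sys.stderr)
        return 2
    findings = count_findings(Path(argv[1]))
    print(f"{findings} finding(s)")
    return 1 if findings else 0  # non-zero exit fails the CI job


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Invoked as a follow-up step with the run directory as its argument, the script fails the job whenever at least one finding file exists.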
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| PHANTOM_LLM | LLM model (LiteLLM format) | openai/gpt-4o |
| LLM_API_KEY | API key (comma-separated for rotation) | none |
| PHANTOM_REASONING_EFFORT | low / medium / high | medium |
| PHANTOM_SCAN_MODE | Default scan profile | standard |
| PHANTOM_IMAGE | Sandbox Docker image | ghcr.io/usta0x001/phantom-sandbox:latest |
| PHANTOM_MAX_COST | Hard stop when total scan cost (USD) reaches this limit | none |
| PHANTOM_PER_REQUEST_CEILING | Hard stop when a single LLM call exceeds this cost in USD | none |
| LLM_MAX_TOKENS | Override max output tokens per LLM call (overrides scan-mode defaults: quick=4000, stealth=6000, default=8000) | none |
| PHANTOM_WEBHOOK_URL | Webhook URL for critical alerts | none |
| PHANTOM_DISABLE_BROWSER | Disable Playwright browser | false |
| PHANTOM_TELEMETRY | Enable anonymous usage telemetry | false |
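Two of these settings involve a little logic: `LLM_API_KEY` may hold several comma-separated keys, and `LLM_MAX_TOKENS` overrides per-profile defaults. The sketch below shows one way to interpret them. The function names are hypothetical, and round-robin rotation is an assumption, since this README only says the keys are used "for rotation".

```python
import itertools
from typing import Iterator, Mapping

# Per-profile defaults as listed for LLM_MAX_TOKENS in this README.
MAX_TOKEN_DEFAULTS = {"quick": 4000, "stealth": 6000}
FALLBACK_MAX_TOKENS = 8000


def key_cycle(raw: str) -> Iterator[str]:
    """Yield API keys in round-robin order from a comma-separated string."""
    keys = [key.strip() for key in raw.split(",") if key.strip()]
    if not keys:
        raise ValueError("LLM_API_KEY is empty")
    return itertools.cycle(keys)


def resolve_max_tokens(env: Mapping[str, str], scan_mode: str) -> int:
    """An explicit LLM_MAX_TOKENS wins; otherwise use the profile default."""
    override = env.get("LLM_MAX_TOKENS")
    if override:
        return int(override)
    return MAX_TOKEN_DEFAULTS.get(scan_mode, FALLBACK_MAX_TOKENS)
```

For instance, `key_cycle("key-a, key-b")` alternates between the two keys indefinitely, and `resolve_max_tokens({}, "quick")` returns 4000.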
Phantom uses LiteLLM — 100+ providers work out of the box:
| Provider | Example Model | Notes |
|---|---|---|
| OpenAI | openai/gpt-4o | Best overall quality |
| Anthropic | anthropic/claude-opus-4-5 | Strong multi-step reasoning |
| Google | gemini/gemini-2.5-pro | Huge context window |
| Groq | groq/llama-3.3-70b-versatile | Free tier, very fast |
| DeepSeek | deepseek/deepseek-chat | Excellent cost efficiency |
| OpenRouter | openrouter/deepseek/deepseek-v3.2 | Multi-provider routing |
| Ollama | ollama/llama3.1 | Fully local, no API key required |
| Azure OpenAI | azure/gpt-4o | Enterprise deployments |
Security Audit
Phantom has undergone extensive adversarial auditing across multiple versions:
| Severity | Identified | Fixed |
|---|---|---|
| Critical | 8 | 8 |
| High | 19 | 19 |
| Medium | 34 | 34 |
| Low | 27 | 27 |
| Total | 88 | 88 |
All 88 identified issues are resolved. See CHANGELOG.md for the full history.
Testing
# Run the full test suite
pytest tests/ -v
# With coverage report
pytest tests/ --cov=phantom --cov-report=html
# Run specific categories
pytest tests/ -m "security"
pytest tests/ -m "integration"
See tests/ for the test suite. Integration and end-to-end tests require a live Docker environment.
Documentation
| Resource | Description |
|---|---|
| Architecture | Deep-dive system design |
| Documentation | Full API and configuration reference |
| Contributing | Development guidelines |
| Changelog | Version history and release notes |
Docker Sandbox — Setup Guide
Phantom runs all offensive tools inside an isolated Docker sandbox container. This section covers setup for fresh installs and custom environments.
Default Sandbox Image
The default image is ghcr.io/usta0x001/phantom-sandbox:latest — a pre-built Kali Linux container with all security tools installed.
Requirements:
- Docker Desktop or Docker Engine installed and running
- The image is pulled automatically on first scan (~13 GB, one-time download)
# Pre-pull the image manually (optional, avoids delay on first scan)
docker pull ghcr.io/usta0x001/phantom-sandbox:latest
Using a Custom Sandbox Image
Override the image via environment variable or config:
# Environment variable
export PHANTOM_IMAGE="ghcr.io/usta0x001/phantom-sandbox:latest"
phantom -t https://target.com
# Or in ~/.phantom/config.yaml
phantom_image: "ghcr.io/usta0x001/phantom-sandbox:latest"
Air-Gapped / Offline Environments
If your environment has no internet access:
# On a machine with internet access — save the image
docker pull ghcr.io/usta0x001/phantom-sandbox:latest
docker save ghcr.io/usta0x001/phantom-sandbox:latest | gzip > phantom-sandbox.tar.gz
# On the air-gapped machine — load it
docker load < phantom-sandbox.tar.gz
# Point Phantom at the loaded image
export PHANTOM_IMAGE="ghcr.io/usta0x001/phantom-sandbox:latest"
Verify Sandbox Is Working
# Quick smoke test — should start a container and exit cleanly
docker run --rm ghcr.io/usta0x001/phantom-sandbox:latest nmap --version
docker run --rm ghcr.io/usta0x001/phantom-sandbox:latest nuclei --version
Contributing
Contributions are welcome. See CONTRIBUTING.md for setup instructions.
- Bugs → Open an issue
- Features → Start a discussion
- PRs → Fork · branch · test · submit
License
Apache License 2.0 — see LICENSE.
Acknowledgements
Built on the shoulders of giants:
LiteLLM · Nuclei · SQLMap · Playwright · Textual · Rich · ffuf · Subfinder · Caido