🐙 Claude Octopus

Name: claude-octopus
Author: nyldn

Every AI model has blind spots. Claude Octopus puts up to eight of them on every task, so blind spots surface before you ship — not after. It orchestrates Codex, Gemini, Copilot, Qwen, Ollama, Perplexity, and OpenRouter alongside Claude Code, with consensus gates that flag any disagreements.

Claude Octopus Demo — debate and research with multiple AI providers

🐙 Research, build, review, and ship — with eight AI providers checking each other's work. Say what you need, and the right workflow runs. A 75% consensus gate catches disagreements before they reach production. No single model's blind spots slip through.

🧠 Remembers across sessions. Integrates with claude-mem for persistent memory — past decisions, research, and context survive session boundaries.

⚡ Spec in, software out. Dark Factory mode takes a spec and autonomously runs the full pipeline — research, define, develop, deliver. You review the output, not every step.

🔄 Four-phase methodology, not just tools. Every task moves through Discover → Define → Develop → Deliver, with quality gates between phases. Other orchestrators give you infrastructure. Octopus gives you the workflows.

🐙 32 specialized personas (role-specific AI agents like security-auditor, backend-architect), 47 commands (slash commands you type), 50 skills (reusable workflow modules). Say "audit my API" and the right expert activates. Don't know the command? The smart router figures it out.

🐙 Works with just Claude. Scales to eight. Zero providers needed to start. Add them one at a time — each activates automatically when detected.

💰 Five providers cost nothing extra. Codex and Gemini use OAuth (included with subscriptions). Qwen has 1,000-2,000 free requests/day. Copilot uses your GitHub subscription. Ollama runs locally for free.

What's New

Version	Best Features
v9 (current)	Up to 8 providers (Codex, Gemini, Copilot, Qwen, Ollama, Perplexity, OpenRouter, OpenCode). Four-way AI debates. Smart router — just say what you need. Circuit breakers with automatic provider recovery. Loop self-regulation stops runaway agents. HUD statusline with tool tracking. Codex CLI cross-compatibility. Cache-aligned prompts for 90% cost savings on repeated calls.
v8	Multi-LLM code review with inline PR comments. Parallel workstreams in isolated git worktrees. Reaction engine — auto-responds to CI failures. 32 specialized personas. Dark Factory autonomous pipeline.
v7	Double Diamond workflow. Multi-provider dispatch. Quality gates and consensus scoring. Configurable sandbox modes.

Full changelog →

Quickstart

# Terminal (not inside a Claude Code session):
claude plugin marketplace add https://github.com/nyldn/claude-octopus.git
claude plugin install octo@nyldn-plugins

# Then inside Claude Code:
/octo:setup

That's it. Setup detects installed providers, shows what's missing, and walks you through configuration. You need zero external providers to start — Claude is built in.

Alternative install methods

From the Claude Code UI: Type /plugin in a session → Marketplace tab → install octo.

Factory AI (Droid):

droid plugin marketplace add https://github.com/nyldn/claude-octopus
droid plugin install octo@claude-octopus

Update / Troubleshooting

# Update
claude plugin update octo

# Clean reinstall (if update fails)
claude plugin uninstall claude-octopus 2>/dev/null
claude plugin uninstall octo 2>/dev/null
rm -rf ~/.claude/plugins/cache/nyldn-plugins/claude-octopus
claude plugin marketplace add https://github.com/nyldn/claude-octopus.git
claude plugin install octo@nyldn-plugins

8 Commands That Matter Most

🐙 Eight commands — one per arm. A real octopus has eight arms, each with its own neurons that can act independently. These eight tentacles work the same way: each orchestrates up to three AI providers, applies quality gates, and produces a deliverable.

/octo:embrace build stripe integration     # Full lifecycle: research → define → develop → deliver
/octo:factory "build a CLI that converts CSV to JSON"  # Autonomous pipeline — spec in, software out
/octo:debate monorepo vs microservices     # Structured four-way AI debate with consensus
/octo:research htmx vs react in 2026       # Multi-source synthesis from three AI providers
/octo:design mobile checkout redesign       # UI/UX design with BM25 style intelligence
/octo:tdd create user auth                 # Red-green-refactor with test discipline
/octo:security                              # OWASP vulnerability scan + remediation
/octo:prd mobile checkout redesign          # AI-optimized PRD with 100-point scoring

Plus 30 more: review, debug, extract, deck, docs, schedule, parallel, sentinel, brainstorm, claw, doctor, and the full set.

Don't remember the command name? Just describe what you need:

/octo:auto research microservices patterns    -> routes to discover phase
/octo:auto build user authentication          -> routes to develop phase
/octo:auto compare Redis vs DynamoDB          -> routes to debate

The smart router parses your intent and selects the right workflow.

Pick a Command by Goal

Not sure which command to use? Pick by goal:

I want to...	Use
Research a topic thoroughly	`/octo:research` or `/octo:discover`
Debate two approaches	`/octo:debate`
Build a feature end-to-end	`/octo:embrace`
Design a UI or style system	`/octo:design`
Review existing code	`/octo:review`
Write tests first, then code	`/octo:tdd`
Scan for vulnerabilities	`/octo:security`
Write a product spec	`/octo:prd`
Go from spec to shipping code	`/octo:factory`
Debug a tricky issue	`/octo:debug`
Just run something quick	`/octo:quick`

Or skip the table — type /octo:auto <what you want> or just say octo <what you want>, and the smart router picks for you. 🔍

How does this compare to Superpowers or plain Claude Code?

	Claude Code alone	Superpowers	Claude Octopus
Core idea	One model, your prompts	Structured methodology for one agent	Up to 8 providers cross-checking each other
Providers	Claude only	Claude only	Codex, Gemini, Copilot, Qwen, Ollama, Perplexity, OpenRouter, OpenCode
Workflow	Ad-hoc	Spec → plan → subagent-driven dev	Discover → Define → Develop → Deliver (Double Diamond)
Strength	Simple, no setup	Long autonomous runs with discipline	Multiple perspectives catching blind spots
Consensus gates	No	No	Yes — 75% agreement threshold
Best for	Quick tasks, simple features	Large builds with clear specs	Research, review, debates, multi-provider validation
Setup	Nothing	Install plugin	Install plugin, optionally add providers

tl;dr: Superpowers makes one agent work really well for hours. Octopus makes multiple agents check each other's work. They solve different problems.

How It Works

How 8 Providers Work Together

Claude Octopus coordinates up to eight AI providers — one per tentacle:

Provider	Role
🔴 Codex (OpenAI)	Implementation depth — code patterns, technical analysis, architecture
🟡 Gemini (Google)	Ecosystem breadth — alternatives, security review, research synthesis
🟣 Perplexity	Live web search — CVE lookups, dependency research, current docs
🌐 OpenRouter	Alternative model routing — access 100+ models via single API
🟢 Copilot (GitHub)	Zero-cost research — uses existing GitHub Copilot subscription
🟤 Qwen (Alibaba)	Free-tier research — 1,000-2,000 requests/day via Qwen OAuth
⚫ Ollama (Local)	Zero-cost local LLM — offline, privacy-sensitive, fallback
🔵 Claude (Anthropic)	Orchestration — quality gates, consensus building, final synthesis

Providers run in parallel for research, sequentially for problem scoping, and adversarially for review. A 75% consensus quality gate prevents questionable work from shipping. Only Claude is required — all others are optional and auto-detected.

Four Phases: Discover, Define, Develop, Deliver

Four structured phases adapted from the UK Design Council's methodology:

Phase	Command	What happens
Discover	`/octo:discover`	Multi-AI research and broad exploration
Define	`/octo:define`	Requirements clarification with consensus
Develop	`/octo:develop`	Implementation with quality gates
Deliver	`/octo:deliver`	Adversarial review and go/no-go scoring

Run phases individually or all four with /octo:embrace. Configure autonomy: supervised (approve each phase), semi-autonomous (intervene on failures), or autonomous (run all four).

32 Specialist Personas

Specialized agents that activate automatically based on your request. When you say "audit my API for vulnerabilities," security-auditor activates. When you say "design a dashboard," ui-ux-designer takes over.

Categories: Software Engineering (11), Specialized Development (6), Documentation & Communication (5), Research & Strategy (3), Business & Compliance (3), Creative & Design (4).

Full persona reference | All 50 skills

Built-in Reaction Engine

When agents create PRs, the reaction engine monitors what happens next — CI failures, review comments, stale agents — and responds automatically. No new commands to learn. It fires transparently inside workflows you already use:

Integration Point	When It Fires
`/octo:parallel`	Between poll cycles while monitoring work packages
`/octo:sentinel`	After triage scan completes
`agent-registry.sh health --react`	On-demand health check

What it auto-handles:

Event	Reaction	Limits
CI failure	Collects failure logs into agent inbox	3 retries, escalates after 30m
Changes requested	Collects review comments into agent inbox	2 retries, escalates after 60m
Agent stuck	Escalates to human	After 15m with no progress
PR approved + CI green	Notifies you it's ready to merge	—
PR merged	Marks agent complete	—

Override defaults per project by creating .octo/reactions.conf:

# EVENT|ACTION|MAX_RETRIES|ESCALATE_AFTER_MIN|ENABLED
ci_failed|forward_logs|5|45|true
changes_requested|forward_comments|3|90|true
stuck|escalate|0|10|true

Reactions track 13 agent lifecycle states: running → pr_open → ci_pending → ci_failed / review_pending → changes_requested / approved → mergeable → merged → done.

Providers and What They Cost

Authentication

Method	Codex	Gemini	Claude
OAuth (recommended)	`codex login` — included in ChatGPT subscription	Google account — included in AI subscription	Built into Claude Code
API key	`OPENAI_API_KEY` — per-token billing	`GEMINI_API_KEY` — per-token billing	Built into Claude Code

OAuth users pay nothing beyond their existing subscriptions.

What You Get With Just Claude

Everything except multi-AI features. You get all 32 personas, structured workflows, smart routing, context detection, and every skill. Multi-AI orchestration (parallel analysis, debate, consensus) activates when external providers are configured.

Trust, Safety, and Limits

Namespace isolation — Only /octo:* commands and octo natural language prefix activate the plugin. Your existing Claude Code setup is untouched.

Data locations — Results in ~/.claude-octopus/results/, logs in ~/.claude-octopus/logs/, project state in .octo/. Nothing hidden.

Provider transparency — Every command shows a 🐙 activation indicator on launch. Colored dots (🔴 🟡 🟣 🔵) show exactly which providers are running and when external APIs are called. You always know what's happening.

Clean uninstall — Run claude plugin uninstall octo from your terminal. If you see a scope error, add --scope project. No residual config changes.

Works With OpenClaw

Claude Octopus ships with a compatibility layer for OpenClaw, the open-source AI assistant framework. This lets you expose Octopus workflows to messaging platforms (Telegram, Discord, Signal, WhatsApp) without modifying the Claude Code plugin.

Architecture

Claude Code Plugin (unchanged)
  └── .mcp.json ─── MCP Server ─── orchestrate.sh
                                        ↑
OpenClaw Extension ─────────────────────┘

Three components, zero changes to the core plugin:

Component	Location	Purpose
MCP Server	`mcp-server/`	Exposes 10 Octopus tools via Model Context Protocol
OpenClaw Extension	`openclaw/`	Wraps workflows for OpenClaw's extension API
Skill Schema	`mcp-server/src/schema/skill-schema.json`	Universal skill metadata format

MCP Server

The MCP server auto-starts when the plugin is enabled (via .mcp.json). It exposes:

octopus_discover, octopus_define, octopus_develop, octopus_deliver — Individual phases
octopus_embrace — Full Double Diamond workflow
octopus_debate, octopus_review, octopus_security — Specialized workflows
octopus_list_skills, octopus_status — Introspection

Any MCP-compatible client can connect to the server.

OpenClaw Extension

Install in an OpenClaw instance from git:

npm install github:nyldn/claude-octopus#main --prefix openclaw

Or clone and link locally:

cd openclaw && npm install && npm run build

The extension registers as an OpenClaw plugin with configurable workflows, autonomy modes, and Claude Code path resolution.

Build & Validate

./scripts/build-openclaw.sh          # Regenerate skill registry from frontmatter
./scripts/build-openclaw.sh --check  # CI mode — exits non-zero if out of sync
./tests/validate-openclaw.sh         # 13-check validation suite

FAQ

Do I need all three AI providers?
No. One external provider plus Claude gives you multi-AI features. No external providers still gives you personas, workflows, and skills.

Will this break my existing Claude Code setup?
No. Activates only with the octo prefix. Results stored separately. Uninstalls cleanly.

What happens if a provider times out?
The workflow continues with available providers. You'll see the status in the visual indicators.

Why "octopus"?
🐙 Fun fact: a real octopus has three hearts, blue blood, and 500 million neurons — two-thirds of which live in its eight arms. Each arm can taste, touch, and act independently. Claude Octopus works the same way: each tentacle (command) operates autonomously with its own squeeze of logic, then ink flows back as the final deliverable. The crossfire review? That's the squeeze — adversarial pressure that untangles everything before it ships.

Community

Join r/ClaudeOctopus for help, workflow tips, showcases, and updates.

Contributing

Report issues
Submit PRs following existing code style
git clone https://github.com/nyldn/claude-octopus.git && make test

See CONTRIBUTING.md for details.

Documentation

Documentation Guide — Start here
Command Reference — Commands, triggers, and provider indicators
Feature Gap Analysis — CC feature adoption tracker
Architecture — Provider flow and execution model
Plugin Architecture — Internal plugin structure
Agents & Personas — All 32 personas
CLI Reference — Direct CLI usage, debug mode, async, and tmux
Changelog

Attribution

wolverin0/claude-skills — AI Debate Hub. MIT License.
obra/superpowers — Discipline skills patterns. MIT License.
nextlevelbuilder/ui-ux-pro-max-skill — BM25 design intelligence databases. MIT License.
UK Design Council — Double Diamond methodology.

License

MIT — see LICENSE

nyldn | MIT License | r/ClaudeOctopus | Report Issues