Claude Code Toolkit

Claude Code Toolkit is a complete agent-driven workflow system for Claude Code and Codex. It gives Claude and Codex domain-specific expertise, step-by-step workflows, and automated quality gates. The result is Claude working like a team of Go, Python, Kubernetes, review, and content specialists instead of one generalist.

How It Works

It starts from the moment you type a request. You don't pick agents, configure workflows, or learn any internal concepts. You just say what you want done.

/do debug this Go test

In Claude Code, the smart router command is /do. In Codex, use $do.

A router reads your intent and selects a Go specialist agent paired with a systematic debugging methodology. The agent creates a branch, gathers evidence before guessing, runs through phased diagnosis, applies a fix, executes tests, reviews its own work, and presents a PR. You describe the problem. The system handles everything else.

This works because the toolkit separates what you know from what the system knows. Agents carry domain expertise (Go idioms, Python conventions, Kubernetes patterns). Skills enforce process methodology (TDD cycles, debugging phases, review waves). Hooks automate quality gates that fire on lifecycle events. Scripts handle deterministic operations where you want predictable output, not LLM judgment. The router ties it all together, classifying requests, selecting the right combination, and dispatching.

The result: consistent, domain-specific output across Go, Python, TypeScript, infrastructure, content, and more. No configuration required. A first-time user and a power user get the same quality results. The power user just understands why.

Built with the Toolkit

A game built entirely by Claude Code using these agents, skills, and pipelines. Every step from design through implementation.

Installation

Requires Claude Code installed and working (claude --version should print a version number). Codex CLI is also supported: the installer mirrors toolkit skills into ~/.codex/skills and toolkit agents into ~/.codex/agents so Codex can use the same skill and agent library (codex --version should print a version number if you want Codex support too).

git clone https://github.com/notque/claude-code-toolkit.git ~/claude-code-toolkit
cd ~/claude-code-toolkit
./install.sh --symlink

The installer links agents, skills, hooks, commands, and scripts into ~/.claude/, where Claude Code loads extensions from. It also mirrors skills into ~/.codex/skills and agents into ~/.codex/agents for Codex. Use --symlink to get updates via git pull, or run without it for a stable copy.

Verify the install with:

python3 ~/.claude/scripts/install-doctor.py check
python3 ~/.claude/scripts/install-doctor.py inventory

If you update the repo later and want Codex to see newly added skills, rerun ./install.sh --symlink from the repo root.

Command entry points:

Claude Code: /do
Codex: $do

Detailed setup: docs/start-here.md

Codex CLI Parity

The toolkit mirrors agents, skills, and a curated subset of hooks into ~/.codex/ so they work with the OpenAI Codex CLI alongside Claude Code. The mirror runs automatically on every install.sh; no flags required.

What mirrors

Agents: every agent under agents/ (and private-agents/ if present) is copied or symlinked to ~/.codex/agents/.
Skills: every skill under skills/, private-skills/, and private-voices/*/skill/ goes to ~/.codex/skills/.
Hooks: a Phase 1 allowlist of 6 hooks goes to ~/.codex/hooks/ with a matching generated ~/.codex/hooks.json. The allowlist covers SessionStart injectors (KAIROS briefing, operator context, team config, rules distill), a Stop recorder (session learning), and a PostToolUse Bash scanner. See scripts/codex-hooks-allowlist.txt.
Feature flag: install.sh sets [features] codex_hooks = true in ~/.codex/config.toml using a TOML merge that preserves existing sections.

What does not mirror (yet)

Edit/Write interceptors (ADR enforcement, creation protocol, config protection, plan gate, rename sweep, and similar) are Phase 2 and blocked on upstream openai/codex#16732. Codex PreToolUse and PostToolUse only fire for the Bash tool today, so any hook guarding Write or Edit would register but never run.
Codex does not support PreCompact, SubagentStop, Notification, or SessionEnd events; hooks on those events stay Claude Code only.
Windows: Codex hook support is disabled upstream.
Codex CLI v0.114.0 or later is required for hook activation. install.sh warns if the installed version is below that, and prints a neutral note if the codex binary is not found.

Opting out

There is no opt-out flag. The mirror is harmless when Codex CLI is not installed: ~/.codex/ entries sit unused until you install Codex. To skip the hooks portion, delete ~/.codex/hooks/ and ~/.codex/hooks.json after install; the toolkit does not recreate them on normal use.

Reference: adr/182-codex-hooks-mirror.md. Upstream Phase 2 tracker: openai/codex#16732.

Running Claude Code with the toolkit

The toolkit already supplies routing (/do), domain knowledge (agents), methodology (skills), and enforcement (hooks, CLAUDE.md). The shipped Claude Code system prompt, which is several thousand tokens, largely duplicates that structure for toolkit users. Override it to shrink per-request token cost:

claude --system-prompt "."

The . is just a non-empty placeholder (some shells reject an empty string). This is a token-economy and architectural-fit pattern, not a claim about output quality.

Trade-off: overriding the default strips Claude Code's built-in tool-use instructions and style guidance. That context comes back through the toolkit's own agents, skills, hooks, and CLAUDE.md, which is why the pattern fits here. On a bare Claude Code install without the toolkit, use --append-system-prompt "..." instead so the default guidance stays in place.

The Core Workflow

Routing. You type a request. The router entry point is /do in Claude Code and $do in Codex. It classifies intent, selects a domain agent and a workflow skill, and dispatches. No menus, no configuration.
Planning. For non-trivial work, the system creates a plan before touching code. Plans have phases, gates, and saved artifacts at each step.
Execution. A domain-specific agent handles the work using its skill's methodology. Go work gets Go idioms. Python work gets Python conventions. Reviews get multi-wave specialist panels.
Quality gates. Hooks fire automatically: anti-rationalization checks on code modifications, error learning after failures, context injection at session start. Quality is structural, not advisory.
Verification. Tests run. Deterministic scripts validate what LLM judgment cannot. The system does not claim completion without evidence.
Delivery. Changes land on a feature branch. PRs include lint checks, test runs, and review gates. Nothing merges without CI passing.

What's Inside

43 Domain Agents

Agents carry domain-specific expertise. Not thin wrappers that say "you are an expert," but concrete knowledge: version-specific idiom tables, anti-pattern catalogs with detection commands, error-to-fix mappings from real incidents.

Software Engineering

Go, Python, TypeScript, PHP, Kotlin, Swift, Node.js, React Native
Database design, data pipelines, SQLite/Peewee ORM
Kubernetes, Ansible, Prometheus/Grafana, OpenSearch, RabbitMQ, OpenStack

Code Review

Multi-perspective review (newcomer, senior, pedant, contrarian, user advocate)
Domain-specific review (ADR compliance, business logic, structural)
Playbook-enhanced review with adversarial verification

Frontend & Creative

React portfolios, Next.js e-commerce, UI/UX design
PixiJS combat rendering, Rive skeletal animation, combat visual effects
Performance optimization, TypeScript debugging

Infrastructure

Pipeline orchestration, project coordination, research coordination
System upgrades, toolkit governance, technical documentation
MCP server development, Perses observability platform

110 Workflow Skills

Skills enforce methodology. They are phased workflows with gates that prevent skipping steps.

Development Workflows: test-driven development, systematic debugging, feature lifecycle, subagent-driven development, pair programming

Code Quality: parallel code review (3 simultaneous reviewers), systematic code review, code cleanup, universal quality gates, linting

Content & Research: voice-validated writing, research pipelines, content calendars, SEO optimization, topic brainstorming, content repurposing

Operations: PR workflow with quality gates, git commit flow, GitHub Actions checks, cron job auditing, service health checks, Kubernetes debugging

Meta: skill evaluation, agent comparison, A/B testing, toolkit evolution, reference enrichment, routing table management

72 Hooks

Event-driven automation that fires on session start, before/after tool use, at compaction, and on stop. Error learning, context injection, quality enforcement, and anti-rationalization all run automatically.

110 Scripts

Deterministic Python utilities for mechanical operations: INDEX generation, learning database management, voice validation, routing manifests, reference validation. LLMs orchestrate; programs execute.

Choose Your Path

I just want to use it | Install in 2 minutes, learn a few commands. Done.

I do knowledge work | Content pipelines, research workflows, community moderation. No code required.

I'm a developer | Architecture, extension points, how to add your own agents and skills.

I'm an AI power user | Routing tables, pipeline architecture, hook system, the learning database.

I'm an AI agent | Machine-dense component inventory. Tables, file paths, schemas, routing rules.

I'm on LinkedIn | 🚀 Thought leadership. Agree? 👇

Philosophy

The toolkit is built on tested principles, not aspirations. Key ideas:

Zero-expertise operation. The system requires no specialized knowledge from the user. Say what you want done. The system handles the rest.
LLMs orchestrate, programs execute. If a process is deterministic and measurable, use a script. Reserve LLM judgment for design decisions and contextual diagnosis.
Tokens are cheap, quality is expensive. Dispatch parallel review agents, run validation scripts, create plans before executing. Quality gates run on every invocation regardless of token budget.
Agents carry the knowledge, not the model. Agent quality is proportional to the specificity of attached knowledge, not the confidence of attached tone. We tested this empirically with A/B experiments.
Anti-rationalization as infrastructure. The toolkit enforces verification structurally. Quality gates are built into the pipeline, not left to discipline.
Everything should be a pipeline. Complex work decomposes into phases. Phases have gates. Gates prevent cascading failures.

Read the full design philosophy: PHILOSOPHY.md

License

MIT. See LICENSE.

vexjoy-agent