🧩 multi-model-team

Let Claude Code delegate the grunt work to Gemini & Codex — and keep the hard thinking for itself.

Multi-model orchestration for Claude Code. Route by task, fan out in parallel, fall back gracefully.

A Claude Code plugin that offloads token-heavy, self-contained tasks to local pre-authed CLI
backends — agy (Gemini) and codex (OpenAI Codex CLI) — picking the backend and model by
task size and type, with credit-exhaustion fallback through the chain to native Claude, and a
glanceable statusline HUD.

The core idea:

Offload commodity work (UI/components, scaffolding, CRUD, scripts, SQL, configs, unit tests,
web research, bulk summarization) to a fast/cheap CLI — keep judgment-heavy and systems-hard
work (reverse-engineering, FFI/unsafe, injection, concurrency, protocol design) on Claude.
Every routing decision is config-driven; tune it without touching code.

agy, codex, and native Claude are equal, configurable tools. /team decomposes a task and
assigns each subtask to its best-fit backend; /reasoning fans one question across a panel of all
three and fuses the answers.

⚡ Quick Start

1 · Install the backends (one-time, pre-auth each)

npm install -g node-pty       # the one native dep — gives agy a pseudo-terminal (see note below)
npm install -g @openai/codex  # then: codex login

# Windows PowerShell
irm https://antigravity.google/cli/install.ps1 | iex         # then: agy login

# macOS / Linux
curl -fsSL https://antigravity.google/cli/install.sh | bash  # then: agy login

2 · Add the plugin. This repo is the plugin — point Claude Code at it as a local plugin (local
marketplace or --plugin-dir). On enable, Claude Code auto-discovers commands/, agents/, and
hooks/hooks.json. Nothing else to wire up.

3 · (Optional) Turn on the HUD. Add a statusLine to your own ~/.claude/settings.json
(the plugin can't register one for you) — see Statusline HUD.

4 · Use it.

/reasoning  2:gemini,opus,codex   What's the best caching strategy for a read-heavy API?
/team       3:gemini,1:codex      Build a REST CRUD service with tests
/route-test                       Write a SQL query to list users by signup date   ← dry-run, no call

…or just work normally and let Claude reach for the agy / codex agents on its own.

🎛️ Commands

Command	What it does
`/reasoning [panel] <question>`	Fusion pipeline. Fan one question across a panel of models in parallel → a judge compares them (consensus / contradictions / unique insights / blind spots) → synthesize one unified answer intended to beat any single model's.
`/team [--writable] [N:gemini,M:claude,X:codex] <task>`	Team pipeline. Decompose → dispatch each subtask to its best-fit backend (dependency-aware waves) → verify each result → bounded fix loop → synthesize. Add `--writable` to let agents actually edit code in isolated git worktrees (see below).
`/route-test <task>`	Dry-run the router: prints `{backend, model, tier}`, detected types, matched rule. No backend call — a tuning tool.

Both /team and /reasoning have two engines: a deterministic Ultracode Workflow path
(preferred, when the Workflow tool is available) and a parallel Task-agent fallback. Either way, the
work runs across parallel agents, never in a single inline session.
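Both engines share the same parallel fan-out shape, which can be sketched in plain Node ESM. This is a minimal illustration under stated assumptions, not the plugin's code. `fuse`, the stub panel, `judge`, and `synthesize` are hypothetical, and the real pipeline dispatches to CLI backends and Claude agents rather than in-process functions. `Promise.allSettled` lets one failing panelist drop out without sinking the run.

```javascript
// Hypothetical sketch of the /reasoning shape: fan out, judge, synthesize.
async function fuse(question, panel, judge, synthesize) {
  // Ask every panel member at once; a rejected member is simply left out.
  const settled = await Promise.allSettled(
    panel.map(async ({ name, ask }) => ({ name, answer: await ask(question) })),
  );
  const answers = settled
    .filter((r) => r.status === "fulfilled")
    .map((r) => r.value);
  if (answers.length === 0) throw new Error("every panel member failed");
  const verdict = await judge(question, answers); // consensus, contradictions, ...
  return synthesize(question, answers, verdict);
}

// Stub models (hypothetical stand-ins, no backend calls).
const panel = [
  { name: "gemini", ask: async (q) => `gemini: cache-aside for "${q}"` },
  { name: "codex", ask: async () => { throw new Error("quota"); } },
  { name: "opus", ask: async (q) => `opus: write-through for "${q}"` },
];
const judge = async (_q, answers) => ({ voters: answers.map((a) => a.name) });
const synthesize = async (_q, answers, verdict) =>
  `${verdict.voters.join("+")}: ${answers.length} answers fused`;

fuse("read-heavy API caching", panel, judge, synthesize).then(console.log);
// prints: gemini+opus: 2 answers fused
```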

`/team` modes — read-only (default) vs `--writable`

/team runs in one of two modes:

read-only (default): the CLI agents (agy/codex) stay read-only — they return text, not edits.
Any file changes are applied by Claude (the orchestrator) directly to your current branch. No
branch, no worktree, no PR is created. This is the back-compat behaviour.
writable (--writable): each subtask gets its own git worktree and branch off your current
HEAD, and the assigned agent makes real file changes there (CLI backends run full-auto in the
worktree). The orchestrator then merges every subtask branch into one integration branch,
mmt/team-<slug> (off HEAD), and resolves merge conflicts itself. It reads both sides, edits them
into a correct combined result, and completes the merge. You get one finished, conflict-free
branch instead of a pile of worktrees to reconcile. Your current branch is never touched (no
auto-merge onto it) and no GitHub PR is created. Merge the integration branch, or open a PR for
it, when you are ready (inspect it with git log mmt/team-<slug>). Only a conflict the orchestrator
genuinely can't reconcile is left for you: it is reported as unresolved and its worktree is kept.
Per-subtask worktrees live under .mmt/worktrees/ (gitignored). The full-auto sandbox per backend
is tunable via writable_extra in roster.json. Enable writable mode per invocation with --writable,
or set team.mode: "writable" in the roster.
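Here is a rough sketch of the git commands a writable run implies. It assumes one branch per subtask and a separate worktree for the integration branch. Both naming schemes beyond mmt/team-<slug> and .mmt/worktrees/ are hypothetical. The sketch only builds argv arrays, so it runs without a repository, and it does not model conflict resolution between merges.

```javascript
// Hypothetical helper: turn a task description into a branch-safe slug.
function slugify(task) {
  return task
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .slice(0, 40)
    .replace(/^-+|-+$/g, "");
}

// Build (but do not run) the git argv sequence for a writable /team run.
function writablePlan(task, subtaskIds) {
  const slug = slugify(task);
  const integration = `mmt/team-${slug}`;
  const integrationDir = `.mmt/worktrees/${slug}-integration`; // assumed layout
  const steps = [];
  for (const id of subtaskIds) {
    // One branch + worktree per subtask, each starting from the current HEAD.
    steps.push(["git", "worktree", "add", "-b", `${integration}-${id}`,
      `.mmt/worktrees/${slug}-${id}`, "HEAD"]);
  }
  // The integration branch gets its own worktree, so the branch you have
  // checked out is never switched or moved.
  steps.push(["git", "worktree", "add", "-b", integration, integrationDir, "HEAD"]);
  for (const id of subtaskIds) {
    steps.push(["git", "-C", integrationDir, "merge", "--no-ff", `${integration}-${id}`]);
  }
  return steps;
}

for (const argv of writablePlan("Build a REST CRUD service", ["api", "tests"])) {
  console.log(argv.join(" "));
}
```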

Agents (Claude spawns these on its own for matching work)

Each is a dispatcher for its CLI backend: a configurable, equal tool, not a fixed task bucket.
Where work routes is decided by the roster's routes (config/roster.json), the config/tags.txt
classifier, and per-/team assignments, so the "default lane" below is tunable roster policy, not a hard limit.

Agent	Default lane (per shipped config)	Backend
`agy`	Commodity, easily-verifiable work + Gemini's edges — UI/CSS, scaffolding, CRUD, scripts, SQL, regex, configs, tests, data transforms, web-research/summarization, audio/video	agy
`codex`	Code review, test-writing, verification (and the default `/team` verifier; writes code full-auto under `--writable`)	codex

The shipped routing keeps RE/injection/systems-hard work native by default — that's roster policy
you can retune, not a property of the agents (there's intentionally no RE/injection agent). An explicit
agent spawn is honored as-is (forces that backend; the router's hard line won't bounce it).

🚦 How routing works

src/bin/route.mjs scores the task (char count plus keyword types from config/tags.txt), then
matches it against the roster's routes rules (first match wins, so order encodes priority).
src/bin/run.mjs runs the chosen backend with a fallback chain, writes HUD state, and cleans the output.
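The score-then-first-match steps can be sketched as follows. The tag map, the rule fields (`anyType`, `maxChars`, `backend`, `tier`), and the rule names are hypothetical stand-ins for config/tags.txt and the roster's `routes`. Only the shape of the algorithm comes from the description above: score the task, then take the first rule that matches.

```javascript
// Hypothetical keyword map standing in for config/tags.txt.
const TAGS = {
  hard: ["dll injection", "il2cpp", "lock-free", "shellcode"],
  judgment: ["refactor", "bugfix", "root-cause"],
  sql: ["sql", "query"],
  ui: ["css", "component", "svg"],
};

// Score = character count + every task type whose keywords appear.
function score(task) {
  const text = task.toLowerCase();
  const types = Object.keys(TAGS).filter((type) =>
    TAGS[type].some((kw) => text.includes(kw)));
  return { chars: task.length, types };
}

// Hypothetical rules; order encodes priority, so the hard line is checked first.
const ROUTES = [
  { name: "hard-line", anyType: ["hard"], backend: "native", tier: "opus" },
  { name: "judgment", anyType: ["judgment"], backend: "native", tier: "sonnet" },
  { name: "commodity", anyType: ["sql", "ui"], maxChars: 4000, backend: "agy", tier: "cli" },
  { name: "fallthrough", backend: "native", tier: "sonnet" }, // unclassified
];

// First match wins: a rule matches when all of its present conditions hold.
function route(task, routes = ROUTES) {
  const s = score(task);
  const rule = routes.find((r) =>
    (!r.anyType || r.anyType.some((t) => s.types.includes(t))) &&
    (!r.maxChars || s.chars <= r.maxChars));
  return { backend: rule.backend, tier: rule.tier, rule: rule.name, types: s.types };
}

console.log(route("Write a SQL query to list users by signup date").backend); // prints agy
```

Because "judgment" sits above "commodity", a task that mentions both refactoring and SQL lands on Sonnet. This matches the "a judgment word still wins" behaviour described below.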

→ agy / codex (CLI)	→ Sonnet (judgment)	→ Opus (hard line)
New components, CSS, UI, SVG/anim	Refactoring existing code	RE, IL2CPP, protobuf-RE
Boilerplate, scaffold, CRUD, REST	Cross-module integration	disasm, decompile, VMProtect
Scripts, CLI tools, glue code	Bugfixes needing root-cause	DLL injection, Detours/MinHook
SQL, regex, configs, Dockerfiles	API / data-model design	FFI, unsafe, shellcode, kernel
Fixtures, data transforms, codegen	Production logic, edge cases	concurrency, lock-free, KCP
Web search, doc/research summary	Anything hard to verify	protocol design, proc-macros
Video/audio (Claude can't anyway)	Unclassified / uncertain	(size-irrelevant — always Opus)

Within the CLI lane, code review, test-writing, and verification go to codex; the rest of the
commodity work goes to agy. A judgment word (refactor, bugfix) still wins and routes to Sonnet,
and the hard line still routes to Opus. Default fallback chain: agy → codex → native.
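The following minimal sketch shows a fallback chain with failure-gated quota detection. It assumes synchronous invokers that return `{ ok, output }`. That invoker shape and the regexes are hypothetical: the real patterns live in the roster's quota_patterns, and the real invokers spawn the CLIs.

```javascript
// Hypothetical stand-ins for the roster's quota_patterns.
const QUOTA_PATTERNS = [/quota/i, /rate limit/i, /credits? (exhausted|exceeded)/i];

// Failure-gated: only a failed call is inspected, so a successful answer that
// merely mentions "quota" is never read as exhaustion.
function isQuotaError(result) {
  return !result.ok && QUOTA_PATTERNS.some((re) => re.test(result.output));
}

// Walk the chain until a backend succeeds; "native" means hand the task back
// to Claude itself. Invokers are assumed to return { ok, output } synchronously.
function runWithFallback(task, chain, invokers) {
  const skipped = [];
  for (const backend of chain) {
    if (backend === "native") return { backend, output: null, skipped };
    const result = invokers[backend](task);
    if (result.ok) return { backend, output: result.output, skipped };
    skipped.push({ backend, reason: isQuotaError(result) ? "quota" : "error" });
  }
  return { backend: null, output: null, skipped }; // chain had no native tail
}

const result = runWithFallback("Write a SQL query", ["agy", "codex", "native"], {
  agy: () => ({ ok: false, output: "Error: quota exceeded" }),
  codex: () => ({ ok: true, output: "SELECT id FROM users ORDER BY created_at;" }),
});
console.log(result.backend); // prints codex
```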

Presets (defaults.preset, or --preset): budget pushes borderline judgment-coding to a CLI;
premium pulls standard-coding up to Sonnet; balanced is the default.

⚙️ Configuration

All routing config lives in one JSON roster (plus the flat config/tags.txt classifier), and roster
resolution is file-based (no env var). Drop a file in the right place and every entry point picks it
up automatically, so plugin updates never clobber your tuning. Run /mmt-setup to scaffold your personal roster.

Roster resolution order (highest first):

<cwd>/.mmt/roster.json — project-local roster: per-repo tuning, checked into the project so a
team shares one routing config.
~/.claude/mmt-roster.json — your personal roster across all projects (created by /mmt-setup).
<plugin>/config/roster.json — the shipped default.
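Roster lookup can be sketched as "first existing file wins". The sketch assumes the winning file is used whole rather than merged with lower-priority ones; this section does not say either way. `exists` is injected so the logic runs without touching disk. In real use you would pass `fs.existsSync`.

```javascript
// Candidate rosters, highest priority first. Forward slashes also work with
// Node's fs on Windows; real code might prefer node:path.
function rosterCandidates({ cwd, home, pluginRoot }) {
  return [
    `${cwd}/.mmt/roster.json`, // project-local, shared with the team
    `${home}/.claude/mmt-roster.json`, // personal, created by /mmt-setup
    `${pluginRoot}/config/roster.json`, // shipped default
  ];
}

// Assumption: the first existing file is used whole (no merging).
function resolveRoster(dirs, exists) {
  const hit = rosterCandidates(dirs).find((p) => exists(p));
  if (!hit) throw new Error("no roster found; the shipped default should always exist");
  return hit;
}

const onDisk = new Set(["/home/me/.claude/mmt-roster.json", "/opt/mmt/config/roster.json"]);
console.log(resolveRoster({ cwd: "/work/app", home: "/home/me", pluginRoot: "/opt/mmt" },
  (p) => onDisk.has(p)));
// prints /home/me/.claude/mmt-roster.json
```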

Sections (keys prefixed _comment/_about are inline docs the parsers ignore):

Section	Tune to…
`backends`	turn a CLI on/off (`enabled`), pick its invoker (`kind`), and set `writable_extra`, the flags used instead of `extra` in `/team --writable` mode (full-auto). Live backends: `agy` (kind `gemini`) and `codex`; `opencode` is a stub.
`routes`	change where a task type routes (first match wins).
`agents`	the delegation subagents (`backend`/`tier`/`dispatch`/`role`). After editing, run `node src/lib/gen-agents.mjs` to regenerate `agents/*.md`.
`team`	the `/team` pipeline roles and defaults: `dispatch_backends`, `verifier`, `caps`, `tier_models`, `verify`, `max_fix_loops`, and `mode` (`"writable"` makes writable mode the default; otherwise a per-invocation `--writable` turns it on).
`reasoning`	the `/reasoning` Fusion defaults — `panel` (which models participate), `judge`, `synthesizer`, `cap`. See docs/REASONING.md.
`defaults` / `proactive`	preset + fallback chain, and the proactive-nudge config.
`config/tags.txt`	(separate flat file) keyword → task-type classification.

Routing changes need no code edit; verify them with /route-test. To add a future backend, add
invoke/health cases in src/lib/backends.mjs and flip its enabled flag in the roster.

Proactive delegation (opt-in, off by default)

Two config-gated hooks make Claude reach for a backend on its own instead of waiting for you to ask:

Prompt nudge (UserPromptSubmit) — when a prompt would route to a CLI backend, injects a
one-shot reminder to delegate instead of solving inline.
Spawn guard (PreToolUse on Task/Agent) — when Claude spawns an agent whose task routes
to agy/codex, makes that work actually run on the CLI (nudge by default; hard block under
enforce_spawns). Your /team workers and the plugin's own subagents are exempt; oh-my-claudecode
team workers are always nudged, never denied, so they never stall.

Roster snippet (the // notes are annotations and are not valid JSON; drop them when copying into roster.json):

"proactive": {
  "enabled": true,          // master switch for BOTH hooks (default false)
  "max_chars": 0, "min_chars": 0,  // size window (0 = unbounded)
  "rules": "",              // CSV allowlist of route names; empty = any CLI route
  "guard_spawns": true,     // spawn guard: intercept agy/codex-routable Task/Agent spawns
  "enforce_spawns": false   // spawn guard: false = nudge; true = hard-deny + require CLI re-dispatch
}

Slash commands and native-routing work are never touched. Disabled → both hooks exit immediately
(zero forks). Hard kill switch: MMT_PROACTIVE_DISABLE=1.
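The gating conditions above can be read as one predicate. This sketch assumes the prompt has already been routed to a `{ backend, rule }` decision, which is a hypothetical shape. The field names mirror the proactive block, but `shouldNudge` itself is illustrative, not the hook's code.

```javascript
// Hypothetical predicate: should the prompt nudge fire for this prompt?
// `decision` is an assumed { backend, rule } result from the router.
function shouldNudge(prompt, decision, cfg, env = process.env) {
  if (env.MMT_PROACTIVE_DISABLE === "1") return false; // hard kill switch
  if (!cfg.enabled) return false; // master switch for both hooks
  if (prompt.trimStart().startsWith("/")) return false; // slash commands are never touched
  if (!["agy", "codex"].includes(decision.backend)) return false; // native work is never touched
  const n = prompt.length;
  if (cfg.min_chars > 0 && n < cfg.min_chars) return false; // 0 = unbounded
  if (cfg.max_chars > 0 && n > cfg.max_chars) return false;
  const allow = (cfg.rules || "").split(",").map((s) => s.trim()).filter(Boolean);
  return allow.length === 0 || allow.includes(decision.rule); // empty = any CLI route
}

const cfg = { enabled: true, max_chars: 0, min_chars: 0, rules: "", guard_spawns: true, enforce_spawns: false };
console.log(shouldNudge("Write a SQL query to list users", { backend: "agy", rule: "sql" }, cfg, {})); // prints true
```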

📺 Statusline HUD

A plugin-bundled settings.json does not register a top-level statusLine. The shipped
settings.json is a reference — to get the HUD, add this to your own ~/.claude/settings.json
with the absolute path to this plugin:

{
  "statusLine": {
    "type": "command",
    "command": "node \"C:/Users/you/path/to/multi-model-team/statusline/statusline.mjs\""
  }
}

⟳ agy·Gemini-3.1-Pro │ 2 open │ ~12k↓             (active delegation)
◦ agy idle │ 5 calls · 1 fallback │ last 3.4s ✓    (idle)
◦ mmt idle                                         (no calls yet)

Token totals are char estimates (prefixed ~) — agy emits no usage line. If it can't read state, it
prints ◦ mmt idle.
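The next sketch shows how the active HUD line might be built from a state file. Two things in it are assumptions: the state shape (`active.backend`, `active.model`, `active.open`, `active.outChars`) and the 4-characters-per-token ratio. Only the output format and the `◦ mmt idle` fallback come from the examples above.

```javascript
// Assumed ratio: ~4 characters per token is a common rule of thumb, not a
// value taken from the plugin.
const CHARS_PER_TOKEN = 4;

// "~" marks the figure as an estimate, since agy emits no usage line.
function approxTokens(chars) {
  const tokens = Math.round(chars / CHARS_PER_TOKEN);
  return tokens >= 1000 ? `~${Math.round(tokens / 1000)}k` : `~${tokens}`;
}

// Hypothetical state shape: { active: { backend, model, open, outChars } }.
function hudLine(rawState) {
  let state;
  try {
    state = JSON.parse(rawState);
  } catch {
    return "◦ mmt idle"; // unreadable or missing state never breaks the statusline
  }
  const a = state?.active;
  if (!a) return "◦ mmt idle";
  return `⟳ ${a.backend}·${a.model} │ ${a.open} open │ ${approxTokens(a.outChars)}↓`;
}

console.log(hudLine('{"active":{"backend":"agy","model":"Gemini-3.1-Pro","open":2,"outChars":48000}}'));
// prints: ⟳ agy·Gemini-3.1-Pro │ 2 open │ ~12k↓
```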

🔌 Backend quirks worth knowing

agy needs a TTY — provided by `node-pty`

agy gates output on isatty(stdout): through a plain pipe it exits 0 and prints nothing, a
silent no-op that looks like success. The plugin runs every agy call under a real pseudo-terminal
via node-pty (ConPTY on Windows 10/11, forkpty on Linux/macOS), so isatty is true and agy emits
output. No console window appears, and this works even from a fully headless parent (a Bash-tool
call, a hook, a /team or /reasoning sub-agent). The prompt rides as a real argv element with no
shell in between, so it is injection-safe.

node-pty resolution: install it globally once (npm install -g node-pty) and it keeps resolving
across plugin updates via a NODE_PATH shim (the trick oh-my-claudecode uses). Alternatively, run
npm install locally and re-run it after each update. Required on Windows (ConPTY). Optional on
Linux/macOS: the agy lane falls back to the system script utility, and if neither is available,
agy degrades to the codex/native fallback with an install hint.
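The transport choice reduces to a small decision. The hypothetical `agyTransport` below takes the probe results as inputs and leaves out how node-pty and `script` are actually detected. It mirrors the order described above: node-pty first, then `script` on POSIX, and otherwise skip agy so the chain moves on.

```javascript
// Hypothetical decision helper; probe results are passed in.
function agyTransport({ platform, hasNodePty, hasScript }) {
  if (hasNodePty) return { kind: "node-pty" }; // ConPTY on Windows, forkpty on POSIX
  if (platform !== "win32" && hasScript) return { kind: "script" }; // POSIX-only fallback
  // No PTY available: skip agy so the chain continues to codex, then native.
  return { kind: "fallback", hint: "npm install -g node-pty" };
}

console.log(agyTransport({ platform: "linux", hasNodePty: false, hasScript: true }).kind); // prints script
```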

codex is non-interactive — no TTY needed

codex is invoked as codex exec <flags> with the prompt delivered via stdin (fixes a Windows
bug where the npm .cmd shim truncated multi-line prompts at the first newline). resolveBinary
prefers a PATHEXT match (codex.cmd) over the extensionless shim. No pty needed.
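Stdin delivery sidesteps argv quoting entirely. In this sketch, a Node child that echoes its stdin stands in for `codex exec`. The real flags come from the roster and are not reproduced here. The `input` option of `spawnSync` writes the whole multi-line prompt to the child's stdin and then closes it.

```javascript
import { spawnSync } from "node:child_process";

// A multi-line prompt of the kind the .cmd shim used to cut at the first newline.
const prompt = "Review this diff.\nFocus on error handling.\nReply in bullets.";

// Stand-in for `codex exec <flags>`: echo stdin back so we can compare.
const child = spawnSync(
  process.execPath,
  ["-e", "process.stdin.pipe(process.stdout)"],
  { input: prompt, encoding: "utf8" },
);
console.log(child.stdout === prompt); // prints true
```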

📋 Requirements

Node.js ≥ 18 — runtime for all plugin scripts.
node-pty — the one native dep (agy's pseudo-terminal). Prebuilt binaries cover common
Node/OS/arch combos. Required on Windows; optional on POSIX (see note above).
agy (Antigravity CLI, optional) — installed and pre-authed. Auto-resolved from $MMT_AGY_BIN → PATH →
$LOCALAPPDATA/agy/bin/agy.exe (Windows) or ~/.local/bin/agy / /usr/local/bin/agy (POSIX).
codex (Codex CLI, optional) — npm install -g @openai/codex + login. If absent, tasks fall through the chain.
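The agy lookup order can be sketched as a pure function with the environment, platform, and file check injected. It is not the plugin's resolveBinary. Preferring PATHEXT extensions over a bare name mirrors the codex note above; applying that rule to agy, and the default PATHEXT value used here, are assumptions. On a real Windows box, the PATH key may be spelled `Path`, which a plain object would not normalize.

```javascript
// Hypothetical resolver; env, platform, home, and exists are injected.
function resolveAgy({ env, platform, home, exists }) {
  // 1) Explicit override.
  if (env.MMT_AGY_BIN && exists(env.MMT_AGY_BIN)) return env.MMT_AGY_BIN;

  // 2) PATH, trying PATHEXT extensions before the bare name on Windows.
  const win = platform === "win32";
  const exts = win
    ? (env.PATHEXT || ".COM;.EXE;.BAT;.CMD").split(";").filter(Boolean).map((e) => e.toLowerCase())
    : [];
  for (const dir of (env.PATH || "").split(win ? ";" : ":").filter(Boolean)) {
    for (const ext of [...exts, ""]) {
      const candidate = `${dir}/agy${ext}`;
      if (exists(candidate)) return candidate;
    }
  }

  // 3) Per-OS default install locations.
  const defaults = win
    ? (env.LOCALAPPDATA ? [`${env.LOCALAPPDATA}/agy/bin/agy.exe`] : [])
    : [`${home}/.local/bin/agy`, "/usr/local/bin/agy"];
  return defaults.find((p) => exists(p)) ?? null;
}

const files = new Set(["C:\\tools/agy.cmd", "C:\\tools/agy"]);
console.log(resolveAgy({
  env: { PATH: "C:\\tools", PATHEXT: ".EXE;.CMD" },
  platform: "win32",
  home: "",
  exists: (p) => files.has(p),
}));
// prints C:\tools/agy.cmd
```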

Built and verified against agy v1.0.8 and codex-cli 0.139.0 on Windows; the POSIX paths are also
tested on a real Linux/macOS box.

🗂️ Layout

.claude-plugin/plugin.json   plugin manifest
config/roster.json           shipped default config (override at ~/.claude/mmt-roster.json)
config/tags.txt              task-type classifier (editable flat file)
src/lib/platform.mjs         cross-platform OS layer: PTY wrap, binary + roster resolve, state dir
src/lib/config.mjs           roster loader → plain JS objects
src/lib/score.mjs            char count + keyword type classification
src/lib/router.mjs           first-match-wins decision engine
src/lib/backends.mjs         agy/codex invokers + clean() + quota detection
src/lib/state.mjs            HUD state read/write
src/lib/hook-common.mjs      shared hook runtime (one fork-free node process per hook)
src/lib/team-spec.mjs        /team cap-spec parser
src/lib/team-plan.mjs        plan.json → per-subtask files
src/lib/reason-spec.mjs      /reasoning panel-spec parser
src/lib/gen-agents.mjs       regenerate agents/*.md from the roster
src/lib/validate-config.mjs  roster.json validator (route names, tiers, backends, agents)
src/bin/route.mjs            task → decision JSON CLI
src/bin/run.mjs              executor + fallback chain + HUD state (file relay transport: --call-file)
src/bin/team.mjs             scripted CLI fan-out for /team
src/bin/reason.mjs           scripted panel fan-out for /reasoning
hooks/proactive-route.mjs    UserPromptSubmit delegation nudge (opt-in)
hooks/spawn-route-guard.mjs  PreToolUse(Task|Agent) guard — CLI-routable spawns (opt-in)
hooks/command-fanout-guard.mjs  UserPromptSubmit guard — forces /reasoning & /team into the engine
hooks/hooks.json             hook registrations
statusline/statusline.mjs    fork-free HUD line
agents/                      agy, codex (GENERATED)
commands/                    reasoning, team, route-test
workflows/team.mjs           Ultracode team workflow
workflows/reasoning.mjs      Ultracode Fusion workflow: Panel → Judge → Synthesize
test/*.test.mjs              offline test suite
docs/REASONING.md            design contract for the /reasoning Fusion pipeline
docs/INTERFACES.md           module interface contract (Node ESM port signatures)

🧪 Testing

npm test                # offline: 99/99 routing + unit tests (no backend calls)

The suite is fully offline. Live agy/codex behaviour is verified by hand (run a real
node src/bin/run.mjs --call-file=… against the installed CLIs), not by an npm test gate.

Why Node ESM? The original bash hooks forked ~6–7 processes per invocation under a 10 s msys
timeout and were intermittently killed ("hooks not triggering sometimes"). Each hook is now one
fork-free Node process — read payload, gate with real JSON.parse, route in-process, emit.

🔧 Env overrides

Var	Purpose
`MMT_AGY_BIN` / `MMT_CODEX_BIN`	explicit path to the agy / codex binary
`MMT_TAGS`	alternate `tags.txt`
`MMT_STATE_DIR` / `MMT_STATE_FILE`	HUD state location
`MMT_PROACTIVE_DISABLE`	`=1` hard-disables both proactive hooks
`MMT_HOOK_DISABLE`	`=1` disables all hooks
`MMT_COMMAND_GUARD_DISABLE`	`=1` disables just the `/reasoning`·`/team` engine guard
`MMT_HOOK_DEBUG`	`=1` appends firing markers to `stateDir/hooks.log`

🐛 Known open items

Quota grounding (P2): quota_patterns are sensible defaults; detection is failure-gated (a
successful call is never read as exhaustion). Harden the patterns against the first real credit-exhaustion error.
Linux/macOS: POSIX PTY shim (script) and XDG state dir are exercised and tested on a real
POSIX box.

License

MIT