OpenThomas

Plan on Opus. Run the swarm on Sonnet.

Cut the cost of your agent fleet — without switching agents.

OpenThomas is a tiny local proxy that sits on the wire between your
coding agents and the model providers they call. It watches every
request and, where it can save you money, quietly routes it to a
cheaper model — keeping the expensive model exactly where it earns
its price, and nowhere else.

Its headline trick: making Claude Code Dynamic Workflows affordable.

The problem

Claude Code Dynamic Workflows are incredible and expensive. In
Anthropic's own words:

"Claude dynamically writes orchestration scripts that run tens to
hundreds of parallel subagents in a single session."

"Dynamic workflows can consume substantially more tokens than a
typical Claude Code session."

Every one of those hundreds of subagents inherits the session model —
Opus 4.8 — and there's no built-in knob to make the swarm cheaper.
The planner needs Opus. The two hundred workers grepping files and
running tests do not.

The fix

OpenThomas tells the planner apart from the workers on the wire and
routes only the workers to a cheaper model:

  Claude plans the work ........  claude-opus-4-8   ← untouched
  └─ subagent  #1  grep ........  claude-sonnet-4-6 ← ~5× cheaper
  └─ subagent  #2  edit ........  claude-sonnet-4-6
  └─ subagent  … ×200 ..........  claude-sonnet-4-6

Sonnet is roughly 5× cheaper per token than Opus ($3 / $15 vs
$15 / $75 per million in/out). The workers account for the bulk of a
workflow's tokens, so routing them to Sonnet cuts their share of the
bill by about 80%, while the planning, verification, and final answer
you actually read stay on Opus.
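
As a rough worked example of that arithmetic, the sketch below prices one workflow twice: once entirely on Opus, and once with only the workers moved to Sonnet. The per-million prices are the ones quoted above. The token counts are invented for illustration (workers carry 90% of the tokens) and are not measurements.

```typescript
// USD per million tokens (input, output), as quoted in this README.
interface Price { inUsd: number; outUsd: number }
interface Usage { inTokens: number; outTokens: number }

const OPUS: Price = { inUsd: 15, outUsd: 75 };
const SONNET: Price = { inUsd: 3, outUsd: 15 };

// Cost of a block of usage at a given price, in USD.
function costUsd(u: Usage, p: Price): number {
  return (u.inTokens * p.inUsd + u.outTokens * p.outUsd) / 1e6;
}

// Hypothetical workflow: illustrative token counts only.
const planner: Usage = { inTokens: 2e6, outTokens: 2e5 };
const workers: Usage = { inTokens: 18e6, outTokens: 18e5 };

const allOpus = costUsd(planner, OPUS) + costUsd(workers, OPUS);  // 45 + 405
const routed = costUsd(planner, OPUS) + costUsd(workers, SONNET); // 45 + 81
const savedFraction = 1 - routed / allOpus;

console.log(allOpus, routed, savedFraction.toFixed(2)); // 450 126 0.72
```

With these made-up numbers the bill falls from $450 to $126, a 72% cut: the workers' 90% share shrinks by 80%, and the planner's share is unchanged.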

The classifier is exact, not a guess. Claude Code's planner always
carries the orchestrator-only Agent tool (it's the thing that spawns
subagents); subagents never do, because they can't nest. Verified
against 672 real calls: 100% of planners kept, 100% of subagents
caught, zero planner calls ever downgraded. A token floor leaves tiny
background calls (the security monitor, title generation) untouched.
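
The rule described above can be sketched as a small pure function. This is an illustration, not OpenThomas's actual code. It assumes the request body exposes a tools array whose entries have a name field, as Anthropic's Messages API does. The RoutingConfig fields, the estimatedTokens argument, and the floor value of 500 are all hypothetical.

```typescript
// Only the request fields this sketch inspects.
interface ToolDef { name: string }
interface ModelRequest { model: string; tools?: ToolDef[] }

// Hypothetical settings; not the real routing.json schema.
interface RoutingConfig { workerModel: string; tokenFloor: number }

type Role = "planner" | "subagent" | "background";

function classify(req: ModelRequest, estimatedTokens: number, cfg: RoutingConfig): Role {
  // Tiny calls (title generation, the security monitor) fall under the floor.
  if (estimatedTokens < cfg.tokenFloor) return "background";
  // Only the orchestrator carries the Agent tool, because subagents cannot nest.
  const hasAgentTool = (req.tools ?? []).some((t) => t.name === "Agent");
  return hasAgentTool ? "planner" : "subagent";
}

// Rewrite the model only for subagent calls; everything else passes through as-is.
function route(req: ModelRequest, estimatedTokens: number, cfg: RoutingConfig): ModelRequest {
  return classify(req, estimatedTokens, cfg) === "subagent"
    ? { ...req, model: cfg.workerModel }
    : req;
}

const cfg: RoutingConfig = { workerModel: "claude-sonnet-4-6", tokenFloor: 500 };
const plannerCall: ModelRequest = { model: "claude-opus-4-8", tools: [{ name: "Agent" }, { name: "Bash" }] };
const workerCall: ModelRequest = { model: "claude-opus-4-8", tools: [{ name: "Bash" }] };
const titleCall: ModelRequest = { model: "claude-opus-4-8" };

console.log(route(plannerCall, 12000, cfg).model); // claude-opus-4-8
console.log(route(workerCall, 12000, cfg).model);  // claude-sonnet-4-6
console.log(route(titleCall, 80, cfg).model);      // claude-opus-4-8
```

Checking the floor first means a very small planner call is labelled background rather than planner, but either way it is never downgraded.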

It's on by default the moment Claude Code is wired. Point it elsewhere,
or turn it off, in one click — or one line of ~/.openthomas/routing.json.

Quick start

npm install -g @openthomas/openthomas

openthomas wire     # detect your agents, install the tap, start the daemon
# …use Claude Code exactly as you do now — run a dynamic workflow…
openthomas          # open the dashboard at http://localhost:9877

openthomas wire is reversible: openthomas unwire restores every file
it touched. No accounts, no telemetry, no cloud — your traffic and your
traces stay on your machine. See PRIVACY.md.

What you see

The dashboard answers the three questions that actually matter when a
fleet is burning tokens:

1. How many agents are running right now, and which.
2. What each agent is spending, live and per model.
3. What each task is spending: the whole run, not just one call,
   with the cheaper-model swaps and the dollars they saved called out.

And one control: which model each agent uses, including the
subagent-downgrade target. That's the whole product.

$ openthomas list
ID            STARTED              AGENT        STATUS  COST     SERVED
────────────  ───────────────────  ───────────  ──────  ───────  ─────────────────────────
ru_aBc1xYz9   14:23:11             claude-code  done    $0.04    opus-4-8  (planner)
ru_fOj6Ce1H   14:23:11             claude-code  done    $0.009   sonnet-4-6 ← opus-4-8 ↓
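
The per-task view described above amounts to rolling individual calls up into one total per task. A minimal sketch follows, assuming a hypothetical CallRecord shape (OpenThomas's real trace format may differ) in which each call records what it cost and what it would have cost on the model originally requested. The task id and dollar amounts are illustrative.

```typescript
// Hypothetical per-call record; field names are assumptions for this sketch.
interface CallRecord {
  taskId: string;
  requestedModel: string;
  servedModel: string;
  costUsd: number;         // what the call actually cost
  baselineCostUsd: number; // what it would have cost on the requested model
}

interface TaskSummary { calls: number; swaps: number; costUsd: number; savedUsd: number }

// Sum cost per task, counting swaps and the dollars each swap saved.
function summarizeByTask(calls: CallRecord[]): Map<string, TaskSummary> {
  const byTask = new Map<string, TaskSummary>();
  for (const c of calls) {
    const s = byTask.get(c.taskId) ?? { calls: 0, swaps: 0, costUsd: 0, savedUsd: 0 };
    s.calls += 1;
    s.costUsd += c.costUsd;
    if (c.servedModel !== c.requestedModel) {
      s.swaps += 1;
      s.savedUsd += c.baselineCostUsd - c.costUsd;
    }
    byTask.set(c.taskId, s);
  }
  return byTask;
}

const summary = summarizeByTask([
  { taskId: "task-1", requestedModel: "claude-opus-4-8", servedModel: "claude-opus-4-8",
    costUsd: 0.04, baselineCostUsd: 0.04 },
  { taskId: "task-1", requestedModel: "claude-opus-4-8", servedModel: "claude-sonnet-4-6",
    costUsd: 0.009, baselineCostUsd: 0.045 },
]);
console.log(summary.get("task-1"));
```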

Keep your agents — OpenThomas wraps the wire, not the agent

You do not rewrite anything or adopt a framework. OpenThomas
generates the wrapping on the fly for whatever you already run, and
saves money across all of it:

Agent	Auto-wire	Cost-saver
Claude Code	✅	subagent downgrade (Dynamic Workflows) + per-route model routing
OpenClaw	✅	per-route model routing / failover
Hermes	✅	per-route model routing / failover
Codex	✅	per-route model routing
Claude Desktop	✅	per-route model routing
Cursor / Gemini CLI	ⓘ manual	point the base URL at the wire

Because it taps the wire, OpenThomas works on agents it doesn't own
and bills nothing extra — your subagent calls go to the same provider on
the same credentials, just cheaper.

Privacy

OpenThomas runs as a single local daemon. It never phones home, sends no
telemetry, and forwards your agent's traffic only to the provider your
agent already calls — and nowhere else. The one sanctioned outbound call
is a daily version check against the public npm registry, which carries
no data and can be disabled (updateCheck: false). The full contract is
in PRIVACY.md.

Status

Entirely free and open source (MIT). Solo-built and used daily by the
author; tested against real Claude Code, Claude Desktop, OpenClaw,
Codex, and Hermes traffic. Bug reports and PRs welcome.