ultracost

Per-stage model routing for Claude Code dynamic workflows.

Stop a single ultracode fan-out from running 40 subagents on Opus by accident.

About

ultracost keeps Claude Code's ultracode mode from silently running every
subagent on Opus. When ultracode is on, the session is pinned to Opus @ xhigh
and a single dynamic workflow fans out to dozens of subagents that inherit that
session model unless every stage is pinned. ultracost makes the per-stage routing
explicit, injects the policy at the start of every session, and ships a guard that
fails any unpinned stage.

Built for ultracode (Opus @ xhigh dynamic workflows) — that is the only place
the multi-agent fan-out it guards against happens. ultracost routes by tier
(opus/sonnet), not a pinned version, so it tracks whatever Opus your session runs.

No telemetry. No network on the hot path. MIT.

Security & trust. ultracost has zero runtime and dev dependencies, so there is no
supply chain to compromise — Snyk Open Source and npm audit report 0 vulnerabilities.
Releases publish to npm with OIDC Trusted Publishing and signed provenance, every
GitHub Action is pinned to a commit SHA, and CodeQL + OpenSSF Scorecard run in CI.
The installer touches only its own files and is fully reversible. See SECURITY.md.

Setup (Claude Code plugin):

/plugin marketplace add danielkremen818/ultracost
/plugin install ultracost@ultracost

Or via the npm CLI (CI / scripting):

npx ultracost init

First command (in Claude Code): /ultracost:check ./path/to/workflow.js — flag any agent() stage that would silently inherit Opus. The plugin ships a slash command for every verb (/ultracost:check · estimate · explain · simulate · diff · audit · usage · reconcile · calibrate · ledger · status) — no global binary needed.

The npm CLI (CI / scripting) offers the same verbs, plus init, pricing, doctor, and uninstall: ultracost init · check · audit · estimate · explain · simulate · diff · usage · reconcile · calibrate · ledger · pricing · status · doctor · uninstall.

[Demo: ultracost estimate, check, and audit on a real workflow script]

The problem

When ultracode is on, Claude Code runs the session on Opus @ xhigh (Opus is the only model that supports xhigh) and auto-orchestrates dynamic workflows that fan out to dozens of subagents (up to 1,000). Two defaults compound:

Subagents inherit the session model. No per-stage override → every stage runs on the session's Opus model.
The built-in workflow guidance tells Claude to omit the per-agent model. So inheritance wins.

The documented result: one prompt spawning 46 Opus subagents and ~3M tokens with no warning. A grep sweep and a per-file verifier do not need Opus.
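To make the inheritance concrete, here is a minimal, self-contained sketch. The `agent` function below is a local stand-in written for this example, not Claude Code's real workflow API, and `SESSION_MODEL` is a hypothetical constant. The only behavior it models is the one described above: a stage with no `model` falls back to the session model.

```javascript
// Stand-in for the workflow runtime. NOT the real Claude Code API:
// it only models "no per-stage model => inherit the session model".
const SESSION_MODEL = "opus"; // assumption: ultracode pins the session to Opus

function agent(prompt, options) {
  const model = options?.model ?? SESSION_MODEL;
  return { prompt, model };
}

// A fan-out over 46 files with no pin: every stage inherits Opus.
const files = Array.from({ length: 46 }, (_, i) => `src/file-${i}.js`);
const unpinned = files.map((f) => agent(`verify ${f}`));

// The same fan-out with an explicit pin on the mechanical stage.
const pinned = files.map((f) => agent(`verify ${f}`, { model: "sonnet" }));

const countOn = (stages, model) =>
  stages.filter((s) => s.model === model).length;

console.log(countOn(unpinned, "opus")); // 46
console.log(countOn(pinned, "opus")); // 0
```

The difference between the two fan-outs is a single options object, which is why the omission is easy to miss when reviewing a generated script.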

The evidence: nobody pins a stage

This is the default behavior, not user error. In a scan of ~22 real ultracode workflow scripts authored across ~/.claude/projects/**/workflows/scripts/, almost none pinned model: on any stage, so nearly every stage inherited the session model (Opus @ xhigh). Even Anthropic's own bundled deep-research workflow pins zero stages. Left to its defaults, Claude Code writes fan-outs that silently run everything on the most expensive model.

You can reproduce this on your own history in one command:

npx ultracost audit ~/.claude/projects

What ultracost does

ultracost makes the routing explicit, policy-driven, and verifiable — without giving up quality on the work that matters.

A quality-first policy. Coding and reasoning stay on Opus @ xhigh. Pre-planned mechanical work and search/collection drop to Sonnet. Haiku is never used. You own the policy in one JSON file.
Always-on routing guidance. As a plugin, a SessionStart hook injects the policy as context at the start of every session (and re-injects it after compaction) — so it is present when Claude authors a workflow, without relying on the model choosing to open a skill. As the npm CLI, the same policy compiles into your ~/.claude/CLAUDE.md. A routing skill ships alongside for explicit /-reference.
The Workflow Guard. A static analyzer that scans the workflow scripts Claude authors and flags any agent() stage missing a model: pin — so a fan-out can't silently inherit Opus. Run it by hand, via /ultracost:check, or in CI. No other tool does this.
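As an illustration of the hook-based injection, the sketch below renders a policy object into a short context string and wraps it in the JSON shape a SessionStart hook can print to stdout. The `hookSpecificOutput.additionalContext` field follows Claude Code's hooks documentation as we understand it. The function names and the wording of the rendered text are invented for this example and are not ultracost's actual hook.

```javascript
// Hypothetical sketch of a SessionStart hook payload. Names are illustrative.
function renderPolicy(policy) {
  const tiers = Object.entries(policy.tiers)
    .map(([tier, t]) => `- ${tier}: model=${t.model}, effort=${t.effort}`)
    .join("\n");
  return [
    "Routing policy: pin a model on every agent() stage.",
    tiers,
    `Never use: ${policy.neverUse.join(", ")}`,
    `When in doubt: ${policy.tieBreaker}`,
  ].join("\n");
}

function buildHookOutput(policy) {
  return {
    hookSpecificOutput: {
      hookEventName: "SessionStart",
      additionalContext: renderPolicy(policy),
    },
  };
}

// Same shape as the policy example in "Customizing the policy".
const policy = {
  neverUse: ["haiku"],
  tieBreaker: "opus",
  tiers: {
    opus: { model: "opus", effort: "xhigh" },
    sonnet: { model: "sonnet", effort: "high" },
  },
};

console.log(JSON.stringify(buildHookOutput(policy), null, 2));
```

Keeping the rendered text derived from the policy object, rather than hand-written, is what lets one JSON file drive both the injected guidance and the guard.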

Architecture

One shared core in src/, two delivery surfaces: a Claude Code plugin (primary) and an npm CLI (secondary). Both compile from the same policy.json.

The plan lives in data (policy.json), not in prose buried in a prompt. The guard is the enforcement layer the model can't talk its way out of. See docs/architecture.md for the full picture.

Install

Plugin (recommended)

Inside Claude Code:

/plugin marketplace add danielkremen818/ultracost
/plugin install ultracost@ultracost

Then, without leaving Claude Code, drive everything through slash commands — verify the workflow Claude just drafted, estimate it, or reconcile a finished run against what it actually cost:

/ultracost:check ./path/to/workflow.js

The plugin bundles — touching none of your own files — a SessionStart policy-injection hook, a PreToolUse cost gate on the Workflow tool (ULTRACOST_GATE=off to disable), a routing-policy skill, and a slash command for every verb (each runs the bundled CLI via ${CLAUDE_PLUGIN_ROOT}, so there's nothing to install on PATH):

| Command | What it does |
| --- | --- |
| `/ultracost:check [path]` | Flag `agent()` stages that don't pin a model (or pin the wrong tier). Defaults to the most recent workflow script. |
| `/ultracost:estimate <script>` | Agent count, model mix, tiered cost vs an all-opus baseline. |
| `/ultracost:explain <script>` | Per-stage rationale: model, effort, the tier the prompt reads like, est cost, check flags. |
| `/ultracost:simulate <script>` | Cost under all-opus vs your tiered pins vs all-sonnet. |
| `/ultracost:diff <a> <b>` | Cost delta between two versions of a script. |
| `/ultracost:audit [dir]` | Pin stats across your real workflow scripts. |
| `/ultracost:usage [dir]` | Real token cost from local transcripts (main vs subagents vs stages). |
| `/ultracost:reconcile [--last\|<id>]` | Estimate vs actual per stage for a finished run. |
| `/ultracost:calibrate` | Tune the estimator from your real token usage. |
| `/ultracost:ledger` | Cumulative savings vs all-opus across recorded runs. |
| `/ultracost:status` | How ultracost is delivered (plugin/cli), the policy, and the bypass caveat. |

Requires Claude Code with the /plugin command and dynamic workflows enabled.

npm CLI

npx ultracost init

This writes ~/.claude/ultracost/policy.json, injects the routing block into ~/.claude/CLAUDE.md, installs the re-inject hook (~/.claude/ultracost/reinject.mjs), and registers it on SessionStart in ~/.claude/settings.json. New sessions pick it up immediately. Paths honor CLAUDE_CONFIG_DIR if you've relocated your config. Requires Node ≥ 24.

Then verify a workflow script at any time:

ultracost check ./path/to/workflow.js

Use the npm path for CI/scripting or the CLAUDE.md-injection workflow; for day-to-day use in Claude Code, the plugin above is simpler.

Uninstall

Plugin

/plugin uninstall ultracost@ultracost
/plugin marketplace remove ultracost

The plugin touches none of your own files, so removing it removes everything ultracost added.

npm CLI

ultracost uninstall

Reverses everything init did: removes the routing block from ~/.claude/CLAUDE.md, deletes ~/.claude/ultracost/, and unregisters the hook from ~/.claude/settings.json (an invalid settings.json is reported, never overwritten).

Quickstart (npm CLI)

Inside Claude Code, every verb below is also a slash command — /ultracost:estimate, /ultracost:reconcile, etc. (see the table above). This section is the npm path for CI, scripting, or the CLAUDE.md-injection workflow.

ultracost init                      # install policy + rules + hook (refuses if the plugin already delivers it)
ultracost status                    # active policy + how it's delivered (plugin/cli) + bypass caveat
ultracost audit ~/.claude/projects  # pin stats across your real workflow scripts
ultracost check ./path/to/workflow  # scan a workflow script (or a directory)
ultracost check . --fix             # auto-pin the default model on unpinned stages
ultracost estimate ./workflow.js    # agents, model mix, and cost vs all-opus baseline
ultracost explain ./workflow.js     # per-stage rationale + which checks fire
ultracost reconcile --last          # estimate vs actual for your latest real run
ultracost calibrate                 # tune the estimator from your real token usage
ultracost ledger                    # cumulative savings vs all-opus
ultracost pricing refresh           # update prices from Anthropic's official page

Point check at the script Claude wrote (its path is printed when a run starts, under ~/.claude/projects/), or wire it into CI.

Cost estimate + dynamic effort + pre-flight gate

Beyond routing, ultracost estimates a workflow's cost before it runs, has Claude pick a per-stage effort level (low to xhigh), and gates the launch so you can approve, cancel, or restructure it.

$ ultracost estimate ./workflow.js

  agents      4 fixed + 1 fan-out group(s) x ~5 = ~9
  model mix   3x opus, 6x sonnet

  baseline (all opus)       $0.9000
  tiered (ultracost)        $0.5304
  savings                   $0.3696  (41%)
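
The summary lines are plain arithmetic on the two totals. The small sketch below reproduces the figures above. The function name is ours, and how ultracost derives the per-stage dollar amounts is not modeled here.

```javascript
// Savings summary from a baseline and a tiered total (both in dollars).
function summarize(baseline, tiered) {
  const savings = baseline - tiered;
  const percent = Math.round((savings / baseline) * 100);
  return { savings: savings.toFixed(4), percent };
}

// Agent count from the sample: 4 fixed stages plus one fan-out group of ~5.
const agents = 4 + 1 * 5;

const { savings, percent } = summarize(0.9, 0.5304);
console.log(agents, savings, percent); // 9 0.3696 41
```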

Pricing is official-sourced. Prices live in policy.json with a _source URL and _asOf date; ultracost pricing refresh re-fetches Anthropic's official pricing page and updates them. The estimate itself runs offline (no network on the hot path).
Dynamic effort. Each stage gets the lowest effort that fits (low/medium/high/xhigh), bounded by model (sonnet up to high, opus up to xhigh). Effort feeds the estimate.
Pre-flight gate (on by default, enforced in every permission mode). The plugin ships a deterministic PreToolUse hook on the Workflow tool that hard-stops every dynamic-workflow launch and runs the guard and the estimate. When stages are unpinned, its message leads with ⚠ N/M stage(s) NOT pinned -> will inherit Opus, so an accidental all-Opus fan-out can't slip by. The gate is mode-aware: in default/acceptEdits/auto it asks (showing the estimate); in bypassPermissions/dontAsk, where an ask wouldn't pause, it auto-denies an unpinned workflow. It therefore holds in every permission mode, not just when the model chooses to ask. ULTRACOST_GATE=strict denies on any problem in every mode; ULTRACOST_GATE=ask never escalates to a deny; ULTRACOST_GATE=off disables the gate (headless/CI). On top of that, the policy has Claude run ultracost estimate and offer the Approve / Cancel / Modify menu via AskUserQuestion.
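
The gate's mode-aware behavior can be read as a small decision table. The sketch below is our reading of the description above, not ultracost's hook code: the function and parameter names are hypothetical, "problem" is narrowed to "at least one unpinned stage", and the fall-through case (ask when nothing forces a deny) is an assumption.

```javascript
// Hypothetical decision table for the pre-flight gate.
// Permission modes in which an "ask" would not actually pause the run.
const NON_PAUSING = new Set(["bypassPermissions", "dontAsk"]);

function gateDecision({ gate = "default", permissionMode, unpinned }) {
  if (gate === "off") return "allow"; // gate disabled (headless/CI)
  const hasProblem = unpinned > 0; // simplification: only unpinned stages
  if (gate === "strict" && hasProblem) return "deny";
  if (gate === "ask") return "ask"; // never escalates to a deny
  // Default gate: deny unpinned launches where an ask would not pause.
  if (hasProblem && NON_PAUSING.has(permissionMode)) return "deny";
  return "ask"; // assumption: otherwise ask, showing the estimate
}

console.log(gateDecision({ permissionMode: "default", unpinned: 2 })); // ask
console.log(gateDecision({ permissionMode: "dontAsk", unpinned: 2 })); // deny
```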

Estimates are relative (tiered vs all-opus), not a bill; fan-outs are ranges; the interactive 3-option menu needs a TUI. Full detail, assumptions, and the gate's #52343 limitation are in docs/ESTIMATES.md.

The closed loop: measure, reconcile, calibrate

ultracost doesn't just estimate — it reads its own results back and tunes itself. It parses your local Claude Code transcripts (offline; no network, no telemetry) and attributes tokens per dynamic-workflow stage via the subagents/workflows/wf_*/agent-*.jsonl files Claude Code writes. No other router does this.

ultracost usage                 # real token cost: main loop vs subagents vs workflow stages
ultracost reconcile --last      # estimate vs ACTUAL, per stage, for your latest workflow run
ultracost calibrate             # learn a token prior from your real runs (estimate uses it)
ultracost ledger                # cumulative $ saved vs an all-opus baseline, persisted

In Claude Code these are /ultracost:usage, /ultracost:reconcile, /ultracost:calibrate, and /ultracost:ledger — same output, no CLI install.
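
For a sense of what per-stage attribution involves, here is a minimal sketch that sums token usage from the text of one agent-*.jsonl file. The record shape (`message.usage.input_tokens` / `output_tokens`) is an assumption about the transcript format, and the function name is ours. ultracost's real parser is not shown here.

```javascript
// Sum token usage from JSONL transcript text. The field names are an
// assumption about the transcript schema, not a documented contract.
function sumUsage(jsonlText) {
  const totals = { input: 0, output: 0 };
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip truncated or malformed lines
    }
    const usage = entry?.message?.usage;
    if (!usage) continue;
    totals.input += usage.input_tokens ?? 0;
    totals.output += usage.output_tokens ?? 0;
  }
  return totals;
}

// Made-up sample data for the example.
const sample = [
  '{"message":{"usage":{"input_tokens":1200,"output_tokens":300}}}',
  '{"type":"meta"}',
  "not json",
  '{"message":{"usage":{"input_tokens":800,"output_tokens":150}}}',
].join("\n");

console.log(sumUsage(sample)); // { input: 2000, output: 450 }
```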

Self-calibrating. calibrate learns real per-stage token sizes (outlier-filtered) into ~/.claude/ultracost/calibration.json; estimate, explain, simulate, and the gate use it automatically — the estimate gets closer to your reality every run.
Savings ledger. ledger keeps a running tally of what the policy saved you versus running everything on Opus, persisted in ~/.claude/ultracost/ledger.jsonl (idempotent per run).
Pre-flight budget guard. Set budget.perRun / budget.perDay in the policy and the cost gate denies a launch whose estimate would blow the cap — before it runs.
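
A budget check of this kind can be sketched in a few lines. The `budget.perRun` and `budget.perDay` keys come from the policy described above. The function name, the `spentToday` input, and the exact comparison (the estimate plus today's spend against the daily cap) are assumptions made for illustration.

```javascript
// Hypothetical budget check: deny when the estimate would exceed a cap.
function checkBudget(estimate, spentToday, budget = {}) {
  if (budget.perRun != null && estimate > budget.perRun) {
    return { allowed: false, reason: "perRun" };
  }
  if (budget.perDay != null && spentToday + estimate > budget.perDay) {
    return { allowed: false, reason: "perDay" };
  }
  return { allowed: true, reason: null };
}

const budget = { perRun: 2.0, perDay: 10.0 }; // made-up caps
console.log(checkBudget(0.53, 1.0, budget)); // { allowed: true, reason: null }
console.log(checkBudget(2.5, 1.0, budget)); // { allowed: false, reason: 'perRun' }
console.log(checkBudget(1.5, 9.0, budget)); // { allowed: false, reason: 'perDay' }
```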

Understand and compare a workflow

ultracost explain ./wf.js       # per-stage: tier, effort, est cost, and which UC checks fire
ultracost simulate ./wf.js      # cost under all-opus vs your tiered pins vs all-sonnet
ultracost diff old.js new.js    # cost delta between two versions (--ci → PR-comment table)

Or /ultracost:explain, /ultracost:simulate, /ultracost:diff inside Claude Code.

How routing is decided

| Tier | Model | Use for |
| --- | --- | --- |
| opus | `claude-opus-4-8` @ `xhigh` | writing/refactoring/debugging code, design & architecture, security/perf, tests that need judgment, planning, synthesis |
| sonnet | `claude-sonnet-4-6` @ `high` | applying a decided edit across files, search/grep, running tests, git ops, docs, gathering context |

Decision rule: if the stage must decide how to change code → opus. If the how is already planned and it just executes → sonnet. When in doubt → opus. Never haiku.

This is opinionated and quality-first by design. If you want a cost-first split, edit the policy (below).
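
The decision rule and the per-model effort caps (sonnet up to high, opus up to xhigh) can be written down as two small functions. This is a sketch of the rule as stated, not ultracost's implementation. The `decidesHow` and `planned` flags are hypothetical stage annotations introduced for the example.

```javascript
const EFFORTS = ["low", "medium", "high", "xhigh"];
const EFFORT_CAP = { sonnet: "high", opus: "xhigh" };

// Sonnet only when the "how" is already planned and the stage just executes.
// Anything else, including missing information, falls back to opus.
function pickTier(stage) {
  if (stage.decidesHow === false && stage.planned === true) return "sonnet";
  return "opus";
}

// Clamp a requested effort to the cap of the chosen tier.
function capEffort(tier, effort) {
  const want = EFFORTS.indexOf(effort);
  const cap = EFFORTS.indexOf(EFFORT_CAP[tier]);
  if (want === -1 || cap === -1) {
    throw new Error(`unknown tier or effort: ${tier} @ ${effort}`);
  }
  return EFFORTS[Math.min(want, cap)];
}

console.log(pickTier({ decidesHow: true })); // opus
console.log(pickTier({ decidesHow: false, planned: true })); // sonnet
console.log(pickTier({})); // opus (in doubt)
console.log(capEffort("sonnet", "xhigh")); // high
```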

The Workflow Guard

$ ultracost check ./wf.js
wf.js:2:15  UC001  stage has no options object — add { model: ... } so it does not inherit the session model
wf.js:3:14  UC002  stage options object has no model — will inherit the session model
wf.js:4:13  UC003  stage pins banned model "haiku" (policy.neverUse)

3 error(s), 0 warning(s) in 1 file(s).

| Code | Meaning |
| --- | --- |
| `UC001` | `agent(x)` with no options object |
| `UC002` | options object present, no `model` |
| `UC003` | model resolves to a banned model (e.g. haiku) |
| `UC004` | `model: 'inherit'` while `allowInherit` is false |
| `UC005` | model/options is a dynamic expression — can't verify (warning) |
| `UC006` | the pinned model mismatches the work the prompt describes (warning) |
| `UC007` | `effort` exceeds the model's cap, e.g. `sonnet` @ `xhigh` (warning) |
| `UC008` | an `alwaysOpus` role (orchestrator, consolidation, …) pins a cheaper tier (warning) |
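
The pin-presence codes can be illustrated as a classification over an already-parsed call. In the sketch below, the call descriptor shape is hypothetical and parsing is out of scope; the real scanner works on source text with its own tokenizer. UC006–UC008 depend on prompt and role analysis and are not modeled, and matching a banned model by substring is an assumption.

```javascript
// Descriptor (hypothetical): { options: "none" | "literal" | "dynamic",
//                              model: string | undefined | { dynamic: true } }
function classify(call, policy) {
  if (call.options === "none") return "UC001";
  if (call.options === "dynamic") return "UC005";
  if (call.model === undefined) return "UC002";
  if (typeof call.model !== "string") return "UC005";
  if (call.model === "inherit") return policy.allowInherit ? null : "UC004";
  // Assumption: a model "resolves to" a banned one if it contains its name.
  if (policy.neverUse.some((banned) => call.model.includes(banned))) {
    return "UC003";
  }
  return null; // pinned and allowed
}

const guardPolicy = { neverUse: ["haiku"], allowInherit: false };

console.log(classify({ options: "none" }, guardPolicy)); // UC001
console.log(classify({ options: "literal" }, guardPolicy)); // UC002
console.log(classify({ options: "literal", model: "haiku" }, guardPolicy)); // UC003
console.log(classify({ options: "literal", model: "sonnet" }, guardPolicy)); // null
```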

The scanner runs on a hand-rolled, zero-dependency JS tokenizer, so it's robust to template literals, spreads, optional-call agent?.(), and dynamic model values, and it treats an agent( inside a prompt string or comment as prose, never a call. Fan-out detection covers .map/.flatMap/.forEach/for...of/Promise.all/Array.from/pipeline. Flags: --json for CI, --fix to auto-insert the default model on the unambiguous cases (UC001/UC002), --quiet to print only the problems. The warnings (UC005–UC008) never fail the build on their own; the exit code is non-zero only when pin-presence errors (UC001–UC004) are found.
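
The exit-code rule can be sketched as a fold over findings. The findings array shape and the function name are hypothetical, and the specific non-zero value (1) is an assumption. The split between errors and warnings follows the table above.

```javascript
// UC001-UC004 are errors; every other code is treated as a warning.
const ERROR_CODES = new Set(["UC001", "UC002", "UC003", "UC004"]);

function summarizeFindings(findings) {
  const errors = findings.filter((f) => ERROR_CODES.has(f.code)).length;
  const warnings = findings.length - errors;
  return { errors, warnings, exitCode: errors > 0 ? 1 : 0 };
}

// Mirrors the sample run above: three errors, no warnings.
const sampleRun = [{ code: "UC001" }, { code: "UC002" }, { code: "UC003" }];
console.log(summarizeFindings(sampleRun)); // { errors: 3, warnings: 0, exitCode: 1 }

// Warnings alone do not fail the build.
console.log(summarizeFindings([{ code: "UC006" }])); // { errors: 0, warnings: 1, exitCode: 0 }
```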

Audit your history

ultracost audit [dir] scans <dir>/**/workflows/scripts/*.js (default ~/.claude/projects) and reports the totals — how many stages exist and how many would silently inherit the session model:

$ ultracost audit ~/.claude/projects
ultracost audit

scanned 22 script(s) under ~/.claude/projects

  agent() stages   137
  pinned           4
  unpinned         128   (UC001/UC002 — inherit the session model)
  banned           0     (UC003)
  inherit          1     (UC004)
  dynamic          4     (UC005 — options is a variable)

  unpinned ratio   93.4%

Numbers above are illustrative; run it to see your own. --json emits the totals for dashboards or CI.
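
The unpinned ratio in the sample is the unpinned count divided by all agent() stages. A short sketch using the illustrative totals above (the object shape and function name are ours, not the --json schema):

```javascript
// Ratio of stages that would inherit the session model.
function unpinnedRatio(t) {
  const total = t.pinned + t.unpinned + t.banned + t.inherit + t.dynamic;
  const ratio = ((t.unpinned / total) * 100).toFixed(1) + "%";
  return { total, ratio };
}

const auditTotals = { pinned: 4, unpinned: 128, banned: 0, inherit: 1, dynamic: 4 };
console.log(unpinnedRatio(auditTotals)); // { total: 137, ratio: '93.4%' }
```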

Customizing the policy

Edit ~/.claude/ultracost/policy.json, then re-run ultracost init to recompile the rules:

{
  "neverUse": ["haiku"],
  "allowInherit": false,
  "default": "opus",
  "tieBreaker": "opus",
  "tiers": {
    "opus": { "model": "opus", "effort": "xhigh" },
    "sonnet": { "model": "sonnet", "effort": "high" }
  },
  "alwaysOpus": ["orchestrator", "planner", "final-synthesis"]
}

See docs/policy.md for the full reference.
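
Before recompiling, it can help to sanity-check an edited policy. The validator below is a hypothetical sketch, not part of ultracost. It assumes that `default` and `tieBreaker` name keys of `tiers`, and it reuses the effort caps stated earlier (sonnet up to high, opus up to xhigh).

```javascript
const EFFORT_ORDER = ["low", "medium", "high", "xhigh"];
const MODEL_EFFORT_CAP = { sonnet: "high", opus: "xhigh" };

// Returns a list of human-readable problems; an empty list means "looks sane".
function validatePolicy(policy) {
  const problems = [];
  for (const key of ["default", "tieBreaker"]) {
    if (!(policy[key] in policy.tiers)) {
      problems.push(`${key} "${policy[key]}" is not a defined tier`);
    }
  }
  for (const [name, tier] of Object.entries(policy.tiers)) {
    if (policy.neverUse.includes(tier.model)) {
      problems.push(`tier "${name}" uses banned model "${tier.model}"`);
    }
    const cap = MODEL_EFFORT_CAP[tier.model];
    if (cap && EFFORT_ORDER.indexOf(tier.effort) > EFFORT_ORDER.indexOf(cap)) {
      problems.push(`tier "${name}" effort "${tier.effort}" exceeds cap "${cap}"`);
    }
  }
  return problems;
}

const examplePolicy = {
  neverUse: ["haiku"],
  allowInherit: false,
  default: "opus",
  tieBreaker: "opus",
  tiers: {
    opus: { model: "opus", effort: "xhigh" },
    sonnet: { model: "sonnet", effort: "high" },
  },
};

console.log(validatePolicy(examplePolicy)); // []
```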

Use in CI

- run: npx ultracost check . --json

Fails the build if any committed workflow script has a stage that would inherit the session model.

How it compares

ultracost is intentionally narrow. General-purpose routers (claude-router, claude-smart-router, claude-model-changer, model-matchmaker) score every prompt and route the main loop at runtime. Linters like claudelint validate a file-based agent's model: value. ultracost targets the dynamic-workflow / ultracode path and is, as far as we can tell, the only tool that statically detects an unpinned inline agent()/pipeline() stage, flags a pin that mismatches the work the prompt describes, and reconciles its own cost estimate against real per-stage token usage. Cost tooling like ccusage, tokencast, and tokentoll informed the transcript-parsing, calibration, and cost-diff approaches (reimplemented clean-room). See NOTICE for prior-art credits.

Documentation

Showcase — a live ultracode run — policy injection → guard → cost gate → confirm, end to end, unprompted
Architecture
Policy reference
Why ultracode needs this
Testing guide — sandbox, plugin, npm, and live Claude Code CLI checks
Publishing & recognition — marketplaces, awesome lists, launch

Versioning & releases

Semantic versioning. See CHANGELOG.md. Tagged releases (vX.Y.Z) publish to npm and GitHub Releases via CI.

Configured for GitHub danielkremen818/ultracost. If you fork it, update the handle in the install commands and badges, package.json, CHANGELOG.md, and .claude-plugin/plugin.json. See docs/PUBLISHING.md for the full pre-publish checklist.
