stuntman
Health Pass
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 10 GitHub stars
Code Pass
- Code scan — Scanned 6 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
No AI report is available for this listing yet.
Claude doesn't do its own stunts. Plan with Claude, execute with a near-free model, review with Claude — a delegation harness for Claude Code.
🎬 stuntman
Claude doesn't do its own stunts. · Website
Claude Code plans the scene and reviews the take. A near-free model — DeepSeek,
Gemini, Groq, a local Ollama, whatever — takes the hits. Your expensive
subscription tokens go only where intelligence actually matters.

PLAN EXECUTE REVIEW ITERATE
Claude (sub) ──▶ stunt double ──▶ Claude (sub) ──▶ feedback ↩
reads the code, headless Claude reads the diff, same worker
writes a spec Code instance, runs the tests session resumes,
with zero open any model via itself — trusts fixes in place.
decisions a local proxy nothing max 2 rounds,
then Claude
takes over
Commands
From a normal (subscription) Claude Code session, in any project:
| Command | What it does | Spans |
|---|---|---|
/delegate <task> |
Claude plans + reviews; a near-free model executes the spec through a local proxy. | cost |
/relay <task> |
Keeps the cheap worker going across Claude's 5-hour usage limit; Claude resumes at reset. | the rate limit |
/scaffold |
Stands up a project's self-resuming memory — a CLAUDE.md contract + four living docs (HANDOFF · STATUS · SPEC · STRATEGY). |
the context boundary |
/handoff |
Reads those docs and continues exactly where the last session stopped. | new sessions / /clear |
/wiki |
Builds a "second brain" across a folder of projects — an Obsidian vault + a graphify knowledge graph + a live MCP for cross-project recall. | every project |
/launch |
Fans out a multi-agent workflow — cited competitor research, market sizing, channel ranking, then pricing + positioning + a week-by-week launch playbook, pressure-tested by adversarial critics — into one Product Success Overview (markdown + HTML). | the blank-page launch |
Each is detailed in its own section below.
The pain point
You pay for a Claude subscription. And then you watch Opus burn through your
usage limits typing boilerplate — CRUD endpoints, test scaffolding,
mechanical refactors. Work a model that costs fractions of a cent does just
fine.
The obvious fix — "just use a cheaper model" — fails, because cheap models
are bad at exactly the two things that matter: deciding what to build and
judging whether it's right. Hand DeepSeek a vague task and you get
confident garbage. Hand it a spec with every decision already made, and it
executes beautifully.
So split the loop:
- Claude thinks. It explores your codebase and writes a spec with zero
design decisions left open. This is where the expensive tokens earn their keep. - A free model types. A second, headless Claude Code instance — same
harness, same tools, different brain — executes the spec autonomously. - Claude checks. It reads the diff, runs the tests itself, and sends
review feedback back into the same worker session until it passes.
No API credits. The orchestrator runs on the subscription you already pay
for; the worker runs through your own keys on a near-free backend (several
have free tiers).
The trick
Claude Code respects ANTHROPIC_BASE_URL. Point it at a local
Anthropic-compatible proxy (free-claude-code,
17 backends supported) and you get a fully functional Claude Code whose brain
is any model you want. Run that headlessly (claude -p --output-format json)
and your main Claude session can spawn it, parse its results, and drive it
through review rounds (--resume <session_id>) — like a senior engineer
managing a very fast, very cheap contractor.
The worker gets its own CLAUDE_CONFIG_DIR, so it never touches your
subscription's login or session state.
Field report
First real run, on a real codebase: Claude picked an open item off the
project's backlog, explored the code, and wrote the spec. DeepSeek executed a
3-file change (new feature flag wired through two modules, plus a test) in
65 seconds. Claude's review found zero issues, the full test suite passed,
lint clean. Total Anthropic tokens spent on implementation: zero.
The benchmark: Opus DIY vs. the stunt double
Same task, two paths: build a landing page for this repo (dark cinematic
design, 4-step workflow section, stats, install commands — a real frontend
brief). Path A: Opus does everything itself. Path B: Opus writes the spec,
DeepSeek executes, Opus reviews. Identical brief, measured identically from--output-format json usage.
| Opus DIY | stuntman | |
|---|---|---|
| Claude output tokens | 50,955 | 9,737 (8,782 plan + 955 review) |
| API-equivalent Claude cost | ~$4.92 | ~$0.65 |
| Wall-clock | 18 min — never finished | ~3 min (63s execution) |
| Review verdict | n/a (it was the builder) | PASS, zero iterations |
| Result | the slightly nicer page | ~90% of the page, $0 on implementation |
That "never finished" is real: after building the page in ~7 minutes, Opus
spent 11 more minutes verifying its own work in a browser — and deadlocked
clicking its own copy button (navigator.clipboard.writeText blocks forever
in a permissionless headless browser). The page was already done. That
self-verification spiral is exactly the expensive-model behavior you're
paying for by the token — and exactly what the harness moves off your bill.
5.2× the Claude tokens, 7.6× the cost, 6× the wall-clock — for a margin
best described as taste. And the punchline: the Opus page was the better
artifact, so it's this repo's actual website
— built by the benchmark that proves you usually don't need it.
| DeepSeek (63s, $0 Claude tokens) | Opus (18 min, ~$4.92) |
|---|---|
![]() |
![]() |
(Screenshots taken with reveal animations force-disabled; the gray wash on
DeepSeek's hero is the screenshot hack blowing its 4%-opacity film grain to
100% — the real page is clean.)
Install
Prerequisites:
Claude Code with a subscription (the orchestrator).
A worker backend — pick one:
Route A — Claude Code worker via free-claude-code
(same agentic harness as the orchestrator, any of fcc's 17 backends):uv tool install free-claude-code fcc-init # pick your backend + paste your key (DeepSeek, OpenRouter, Groq, Ollama…) fcc-server # leave it running (localhost:8082)Route B — opencode worker (no proxy at all — opencode talks
to DeepSeek/Groq/Ollama and 75+ providers natively):brew install sst/tap/opencode opencode auth login # or export a provider key, e.g. DEEPSEEK_API_KEY export STUNTMAN_WORKER=opencode export STUNTMAN_MODEL=deepseek/deepseek-v4-flash # any provider/model opencode knows
Option A — Claude Code plugin (recommended)
Inside Claude Code:
/plugin marketplace add mhlaghari/stuntman
/plugin install stuntman@stuntman
Option B — plain install script
git clone https://github.com/mhlaghari/stuntman && cd stuntman && ./install.sh
Copies the /delegate, /relay, /scaffold, /handoff, /wiki, and /launch
skills to ~/.claude/skills/ and the stunt worker, window probe, scaffold,
and wiki tools to ~/.local/bin/.
Usage
From your normal (subscription) Claude Code session, in any project:
/delegate add input validation to the upload endpoint
What happens:
- Claude explores the code and writes a self-contained spec (you'll see it).
- The stunt double executes it headlessly — file edits, running tests, the lot.
- Claude diffs the work against the spec and runs verification itself.
- Problems go back to the same worker session as review feedback. After two
failed rounds, Claude declares the task "heavy" and finishes it personally —
which is exactly the work you bought the subscription for.
Choosing your stunt double
Two knobs: STUNTMAN_WORKER picks the backend (claude via the local proxy —
the default — or opencode), STUNTMAN_MODEL pins the model:
# Route A (proxy): any id from the proxy's /v1/models
export STUNTMAN_MODEL="anthropic/deepseek/deepseek-v4-flash"
# Route B (opencode): provider/model, no proxy required
export STUNTMAN_WORKER=opencode
export STUNTMAN_MODEL="deepseek/deepseek-v4-flash"
Route A's worker is a full headless Claude Code (same tools and agentic loop
as the orchestrator). Route B trades that harness fidelity for zero proxy
setup — opencode authenticates to providers directly.
Good stunt doubles, roughly in order of bang-per-buck:
| Backend | Why |
|---|---|
| DeepSeek | Strong coder, absurdly cheap |
| Groq / Cerebras | Fast open models, free tiers |
| NVIDIA NIM | Free tier, solid open models |
| Gemini Flash | Cheap, large context |
| Ollama / LM Studio | Literally free, fully local |
Working across the 5-hour limit
Long autonomous runs eventually hit Claude's 5-hour usage window. /relay
spans it automatically. It reads your window — how much is used and the exact
reset time — from the same endpoint /usage uses, at zero token cost (no
Claude call, safe to poll even while you're capped). Then:
- While you have headroom, Claude does its normal plan/review work.
- When the window caps, the stunt double keeps going — it bills your own
near-free key, not Anthropic, so the 5-hour limit never touches it — and a
handoff is saved to.stuntman/relay-state.json. - When the window reopens, the Claude side resumes: hands-free if the reset
is under ~55 min away (it schedules its own wakeup), or on your next ping for
longer gaps (a scheduled wakeup is clamped to one hour, and Claude can't wake
itself mid-blackout).
/loop relay this backlog across the limit
The probe is useful on its own, too — window prints your live 5-hour and
weekly utilization plus reset times as one JSON line.
Scaffolding project memory
/delegate saves cost and /relay survives the rate limit; /scaffold and/handoff make a project survive the context boundary — clearing
context, or starting fresh tomorrow.
/scaffold — run once. It writes a contract into your CLAUDE.md (a "read
this first" list + a "before you stop" process contract) and creates the docs it
references — four living documents — then fills them in from your project:
HANDOFF.md— the session baton: what changed, the next step, the gotchas.STATUS.md— the board: built / in progress / planned.SPEC.md— the contract: what this is, the load-bearing principles, where it's going.STRATEGY.md— the honest why / direction.
Each self-declares as a living doc with a changelog. It's idempotent and never
clobbers existing content. The contract then keeps them current — each session
refreshes the docs (and README.md, when the surface changes) before stopping.
/handoff (or just say "execute handoff") — run at the start of any
session. It reads HANDOFF.md, STATUS.md, README.md, and whatever else the
contract lists, then picks up exactly where the last session left off — zero
re-explaining, even in a brand-new session after clearing context.
The mechanism: CLAUDE.md auto-loads every session, so the read-first /
update-before-stopping contract is always in context. A Stop hook (plugin
install) backs it up — in scaffolded projects only, it nudges once if you changed
code but didn't update the docs. Together the commands make long autonomous runs
cheap, rate-limit-proof, and context-proof.
A second brain across all your projects
/delegate, /relay, /scaffold work inside one project. /wiki works
across them: run it in a folder and it builds a navigable second brain — an
Obsidian vault of notes-about-projects plus a
graphify knowledge graph — so a future
session can ask "did I already solve this?" and find the answer in another project.
cd ~/code # a folder of projects (or a single project)
/wiki
One shot, auto-detecting scope:
- Scaffolds the vault at
<folder>-wiki/— aCLAUDE.mdnote-schema,wiki/{projects,concepts,patterns,lessons-learned}, anindex.mdcatalog, a
Map of Content, and an.obsidian/config (graph pre-colored by status). - Writes a note per project from its README/code — what it is, its stack, the
load-bearing decisions and lessons — with selective[[wikilinks]]between
related projects. - Builds a graphify graph (interactive
graph.html, queryablegraph.json, an
Obsidiangraph.canvas) that clusters projects into families and surfaces
cross-project connections you'd never think to look for. - Wires the graphify MCP at user scope, so Claude can query the brain live in
every future session — cross-project recall becomes automatic.
Idempotent (safe to re-run as projects evolve) and notes only — your project
code is never touched.
From a blank page to a launch plan
/delegate ships the product; /launch figures out how to sell it. Run it in
the product's repo and it fans out a multi-agent workflow that does the
market homework you'd otherwise pay a consultant for:
cd ~/code/my-product
/launch
It reads your repo for grounding, asks the four decisions that shape a
go-to-market (beachhead · monetization · timeline · resources), then:
- Researches every competitor with cited web sources — current pricing,
funding, traction, the real controversies — plus market sizing, the
regulatory/trust tailwinds, and ranked launch channels. - Drafts the strategy in parallel — an honest product assessment, a pricing
model anchored to the competitor matrix, positioning + messaging, and a
week-by-week launch playbook (Product Hunt / Show HN / the specific
communities to post in, and who to talk to). - Pressure-tests it with two adversarial critics (feasibility + market
reality) and folds their objections back into the plan — so it pushes back
on a weak go-to-market instead of flattering it. - Compiles one Product Success Overview — a
LAUNCH_PLAN.mdplus a styled
HTML report — answering: is the product good, what's the strategy, how do I
launch it, do I need beta testers, and how do I price it?
A token-heavy run (it does real research — ~15–25 agents), so it's for a real
launch decision, not a quick look.
Why the spec quality matters
This is the one non-obvious lesson: the harness works because the spec
leaves the worker zero design decisions. Exact files, exact signatures,
exact test to add, exact verification commands. Weak models executing great
instructions beat strong models executing vague ones — and writing great
instructions is precisely what your expensive model is for.
Security note
The worker runs with --dangerously-skip-permissions so it can edit files
and run tests unattended. That means it can execute arbitrary commands in
your project directory. Only delegate inside repos you trust, and prefer
delegating from a clean branch — review then covers exactly the worker's
changes, and a bad take is one git checkout away from the cutting-room floor.
FAQ
Does this burn my Anthropic credits?
No. The worker talks to your local proxy with your own backend keys. Your
subscription pays only for the plan and review phases — the thinking.
Is the worker really Claude Code?
Yes — same agentic harness, tools, and editing loop. Only the model behind/v1/messages changes. That's why it can be driven with -p, --resume,
and --output-format json like any other Claude Code instance.
What if the worker produces garbage?
That's the point of the review phase. Claude trusts nothing: it reads the
diff and runs the tests itself. Garbage gets caught, fed back, and fixed —
or Claude takes over.
Can I use a different proxy?
Anything that speaks the Anthropic Messages API works. Edit bin/stunt and
point ANTHROPIC_BASE_URL wherever you like.
Do I need the proxy at all?
Not with the opencode backend (STUNTMAN_WORKER=opencode) — opencode talks
to DeepSeek, Groq, Ollama, and 75+ providers natively. The proxy route's
advantage is that the worker is a full headless Claude Code instance (same
tools and editing loop as the orchestrator).
Does the window probe cost tokens or eat my limit?
No. It reads the same OAuth usage endpoint /usage reads, with the credentials
Claude Code already stores. No /v1/messages call — so it's free to poll, even
while you're rate-limited.
Credits
- free-claude-code by
Alishahryar1 — the proxy that makes the trick possible. - Built with (and by) Claude Code.
License
MIT
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found

