🚦 shipcheck

A senior-QA review skill for Claude Code that runs two QA engineers in parallel and gives you one ship / no-ship verdict.

Before you merge, shipcheck dispatches a regression hunter ("does this break anything that already worked?") and a change reviewer ("does this actually do what it claims — including the edge cases?") at the same time, then consolidates everything into a single, evidence-backed verdict.

SIGN OFF · SIGN OFF WITH NITS · DO NOT SHIP

Install · Try it · How it works · Usage · Why it's different · FAQ

What is this?

shipcheck is a single Claude Code Skill. You install it once and then, in any repo, you say:

"shipcheck this branch before I merge"

Claude acts as a QA lead: it scopes and sizes your change, then fans out the right amount of review (two specialist reviewers in parallel for a real diff, a lighter single pass for a trivial one) and hands you back one consolidated verdict with file:line evidence for every claim. It is the review a careful senior QA engineer would give you.

It is 100% read-only. It never edits, commits, runs migrations, or touches external services. It reads code and runs read-only checks (typecheck, lint, your test suite, git diff).

Why it's different

Most "review my code" prompts are one model doing a shallow lint. shipcheck is built like a real QA function:

Two independent reviewers, in parallel, not one generalist. A regression specialist and a change-correctness specialist see different bugs. Running them concurrently gives you that extra breadth in roughly the wall-clock time of a single pass (the token cost is still real; see Cost & modes).
Adversarial, evidence-first — each reviewer must prove a bug with a file:line and a concrete trigger scenario. No file:line + trigger ⇒ it's a guess and gets dropped, not shipped to you as noise.
Regressions are first-class — it spends most of its effort on the code you didn't change that depends on what you did. That's where production breakage actually comes from.
New-vs-baseline aware — pre-existing lint/test failures aren't blamed on your diff.
Context-isolated + cost-aware — the heavy methodology lives in reference files that the subagents read on demand, so a deep review never bloats your main Claude session. It sizes the diff first: cheap model + a single pass for small changes, parallel reviewers only when the change warrants it. (See Cost & modes.)
Honest about coverage — every report includes "verified safe" (what it actually confirmed OK) and "residual risks / manual tests" (what a human still needs to click), so a PASS it couldn't confirm is labeled NEEDS-MANUAL-CHECK, never faked.
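
To make the "new-vs-baseline" idea concrete, here is a minimal sketch of how failures on the base ref could be subtracted from failures on your branch. It is an illustration, not the skill's actual mechanism: the function name new_failures and the one-failure-ID-per-line file format are assumptions made for this example.

```shell
# new_failures BASE_FILE HEAD_FILE
# Prints failure IDs that appear on the head run but not on the base run.
# Assumed (hypothetical) input format: one failure ID per line,
# e.g. "lint:src/a.js:12" or "test:cart applies discount".
new_failures() {
  base_sorted=$(mktemp)
  head_sorted=$(mktemp)
  sort -u "$1" > "$base_sorted"
  sort -u "$2" > "$head_sorted"
  # comm -13 hides lines unique to the first file and lines common to both,
  # leaving only the lines unique to the second file (the head run).
  comm -13 "$base_sorted" "$head_sorted"
  rm -f "$base_sorted" "$head_sorted"
}
```

Anything this prints would be attributable to the diff; failures that already existed on the base ref drop out.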

Install

One-liner (user scope — available in every project)

git clone https://github.com/lohani-mohit/shipcheck.git && ./shipcheck/install.sh

Manual

Copy the skill into your Claude Code skills directory:

# user scope (all projects)
cp -r shipcheck/skills/shipcheck ~/.claude/skills/

# or project scope (this repo only)
cp -r shipcheck/skills/shipcheck .claude/skills/

That's it — Claude Code auto-discovers skills in those folders. Restart your Claude Code session (or run /skills) and shipcheck is available.
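
If you want to confirm the copy landed where you expect, a small check along these lines may help. It only tests for the three files listed in the skill layout under How it works; the function name check_shipcheck_install is made up for this sketch and is not part of the repo.

```shell
# check_shipcheck_install SKILLS_DIR
# Returns 0 when SKILL.md and both reference playbooks are present under
# SKILLS_DIR/shipcheck, and 1 (listing what is missing) otherwise.
check_shipcheck_install() {
  skill_dir="$1/shipcheck"
  missing=0
  for f in SKILL.md references/regression-hunt.md references/change-review.md; do
    if [ ! -f "$skill_dir/$f" ]; then
      echo "missing: $skill_dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (user scope):    check_shipcheck_install "$HOME/.claude/skills"
# Usage (project scope): check_shipcheck_install .claude/skills
```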

Requirements: Claude Code with Skills enabled. Works in any git repo. gh CLI optional (used only when you point it at a PR number).

Try it in 60 seconds

Don't have a risky diff handy? The repo ships with a runnable example whose
feature branch hides two real defects — a dropped-429 regression and an
unclamped discount that can go negative:

./examples/demo-shop/setup.sh
cd demo-shop-sandbox
claude            # then ask:  shipcheck this branch against main

shipcheck should come back with DO NOT SHIP and point at src/http/client.js:3
(the regression) and src/cart/discount.js:5 (the unclamped percent). Full
walkthrough in examples/README.md.

Usage

Just ask, in natural language — Claude triggers the skill automatically:

shipcheck my working changes before I commit
shipcheck this branch against main
is this safe to ship?
QA the diff and give me a sign-off
shipcheck PR #482

Or point it at a specific base / files:

shipcheck the staged changes against origin/develop
shipcheck src/auth/ — focus on the token-refresh change
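
For orientation, these requests boil down to different git diff scopes. The sketch below shows plain git commands that would list the changed files for each scope. It assumes the skill resolves scopes roughly this way, which the README does not spell out; the helper name changed_files and its argument order are invented for this example.

```shell
# changed_files SCOPE [BASE] [PATH]
#   SCOPE: working | staged | branch
#   BASE defaults to main, PATH defaults to the current directory.
changed_files() {
  scope="$1"
  base="${2:-main}"
  path="${3:-.}"
  case "$scope" in
    # Uncommitted changes to tracked files, compared with the last commit
    # (BASE is ignored for this scope).
    working) git diff --name-only HEAD -- "$path" ;;
    # What is in the index, compared with BASE.
    staged)  git diff --name-only --cached "$base" -- "$path" ;;
    # Everything on this branch since it forked from BASE (merge-base diff).
    branch)  git diff --name-only "$base"...HEAD -- "$path" ;;
    *) echo "unknown scope: $scope" >&2; return 2 ;;
  esac
}

# Usage: changed_files staged origin/develop
#        changed_files branch main src/auth
```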

What you get back

Verdict: DO NOT SHIP — a refactor silently drops the retry on 429s.

Blockers (Critical/High)
  • src/http/client.ts:88 — retryOn() no longer includes 429. Any caller
    relying on rate-limit retries (src/sync/poller.ts:210) now fails hard on
    the first 429 instead of backing off.  Fix: re-add 429 to RETRYABLE.

Findings (Medium/Low)
  • src/ui/Toast.tsx:34 — new error toast has no auto-dismiss; trace shows it
    persists until manual close. Likely unintended.

Verified safe
  • formatPrice() byte-identical to base across all branches.
  • All 7 callers of parseConfig() updated to pass the new `strict` arg.

Residual risks / recommended manual tests
  • Could not exercise the OAuth redirect end-to-end — manually confirm login
    still completes against the staging IdP.
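
The report structure above suggests a simple gate, sketched here under stated assumptions. The pipe-separated finding format, the function names keep_evidenced and verdict, and the mapping of Medium/Low findings to SIGN OFF WITH NITS are all guesses made for illustration; the README only confirms the three verdict labels, the two severity groups, and the rule that a finding without file:line and a trigger is dropped.

```shell
# keep_evidenced: reads findings on stdin, one per line, in the assumed format
#   SEVERITY|FILE:LINE|TRIGGER|MESSAGE
# and drops any finding that lacks a file:line or a trigger scenario.
keep_evidenced() {
  awk -F'|' '$2 ~ /^[^:]+:[0-9]+$/ && $3 != ""'
}

# verdict: reads evidenced findings on stdin and prints exactly one verdict.
# Assumption: Critical/High block the ship, Medium/Low count as nits.
verdict() {
  awk -F'|' '
    $1 == "Critical" || $1 == "High" { blockers++ }
    $1 == "Medium"   || $1 == "Low"  { nits++ }
    END {
      if (blockers > 0)  print "DO NOT SHIP"
      else if (nits > 0) print "SIGN OFF WITH NITS"
      else               print "SIGN OFF"
    }'
}

# Usage: keep_evidenced < findings.txt | verdict
```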

How it works

            you: "shipcheck this branch"
                        │
              ┌─────────▼─────────┐
              │ QA lead (Claude,  │   1. resolves base ref + changed files
              │ main thread)      │      (lightweight, no deep reading)
              └─────────┬─────────┘
                        │  launches both IN PARALLEL (one message)
            ┌───────────┴────────────┐
            ▼                        ▼
  ┌───────────────────┐   ┌─────────────────────┐
  │ Reviewer A        │   │ Reviewer B          │
  │ Regression hunter │   │ Change reviewer     │
  │ "what did this    │   │ "does it do what it │
  │  break?"          │   │  claims? edge cases"│
  │ reads             │   │ reads               │
  │ regression-hunt.md│   │ change-review.md    │
  └─────────┬─────────┘   └──────────┬──────────┘
            └───────────┬────────────┘
                        ▼
              ┌───────────────────┐
              │  consolidate:     │   merge, dedupe, reconcile,
              │  ONE verdict      │   rank → SIGN OFF / DO NOT SHIP
              └───────────────────┘
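
The fan-out / fan-in shape in the diagram is the same pattern as two background jobs and a wait. The sketch below is only an analogy in shell: the real reviewers are Claude subagents, and regression_hunt, change_review, run_reviewers and the sample finding lines are placeholders invented here.

```shell
# Placeholder reviewers (hypothetical). Each prints its findings, one per line.
regression_hunt() { echo "High|lib/a.js:10|caller passes null|crash in caller"; }
change_review()   { echo "Low|lib/b.js:4|empty input|unclear error message"; }

# run_reviewers: start both reviewers at once, wait for both, merge the results.
run_reviewers() {
  out_dir=$(mktemp -d)
  regression_hunt > "$out_dir/a.txt" &   # Reviewer A runs in the background
  pid_a=$!
  change_review > "$out_dir/b.txt" &     # Reviewer B runs in the background
  pid_b=$!
  wait "$pid_a" "$pid_b"                 # fan-in: block until both have finished
  # Merge and drop exact duplicate findings reported by both reviewers.
  sort -u "$out_dir/a.txt" "$out_dir/b.txt"
  rm -rf "$out_dir"
}
```

Because both jobs start before either is waited on, total wall-clock time is roughly that of the slower reviewer rather than the sum of the two.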

The skill is structured for progressive disclosure: SKILL.md is a short
orchestrator; the deep, opinionated QA methodology lives in
references/regression-hunt.md and references/change-review.md, which only the
subagents load. Your main session never pays the token cost of the full
playbook.

skills/shipcheck/
├── SKILL.md                      # lean orchestrator (the only thing your main context loads)
└── references/
    ├── regression-hunt.md        # Reviewer A's full playbook (loaded by the subagent)
    └── change-review.md          # Reviewer B's full playbook (loaded by the subagent)

Cost & modes

Straight talk: a thorough senior-QA review reads code and traces callers, and
that costs tokens — a real review of a normal diff is not "free." shipcheck's
job is to spend the right amount for the risk, and to keep that spend out of
your main session. It does three things:

Sizes the diff first and picks a mode:

Mode   When                                                           Reviewers        Model
Lite   trivial / low-risk (≲50 lines, no core/auth/migration files)   1, single pass   Haiku / Sonnet
Full   normal changes                                                 2 in parallel    Sonnet
Deep   large / high-risk, or you flag it                              2 in parallel    Opus

Defaults to Sonnet rather than Opus for the reviewers (the single biggest cost lever). Opus is reserved for Deep mode: ask for a "deep review" to escalate.
Disciplined reading — reviewers work from git diff, read spans not whole files, skip lockfiles/node_modules, and cap themselves at ~12–15 tool calls so a review can't run away.
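
As a rough illustration of the sizing step, the sketch below picks a mode from git diff --numstat output. Only the 50-line Lite ceiling and the core/auth/migration exclusion come from the table; the 1000-line Deep threshold, the path pattern, and the names pick_mode, LITE_MAX and DEEP_MIN are assumptions for this example, since the README does not define what counts as "large".

```shell
# pick_mode: reads `git diff --numstat` output on stdin, prints lite, full, or deep.
# LITE_MAX follows the table (about 50 changed lines).
# DEEP_MIN is a made-up threshold for a "large" change.
pick_mode() {
  awk -F'\t' -v lite_max="${LITE_MAX:-50}" -v deep_min="${DEEP_MIN:-1000}" '
    {
      # numstat columns: added, deleted, path. Binary files show "-" counts.
      if ($1 != "-") lines += $1 + $2
      # Assumed pattern for core/auth/migration files: a matching path segment.
      if ($3 ~ /(^|\/)(core|auth|migrations?)(\/|$)/) risky = 1
    }
    END {
      if (lines >= deep_min)                print "deep"
      else if (lines <= lite_max && !risky) print "lite"
      else                                  print "full"
    }'
}

# Usage: git diff --numstat main...HEAD | pick_mode
```

In this sketch a small change that touches a risky path is bumped to full rather than deep; whether the skill itself treats such a change as high-risk is not stated.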

The context isolation is the honest win: the heavy playbooks load inside the
subagents, so a deep review never bloats your main Claude session. You pay for
depth when you want depth, and you can dial it down for small changes.

FAQ

Will it change my code? No. It is strictly read-only — reads, git diff/show, typecheck, lint, and your test suite. It never edits, commits, or hits write APIs.

Does it work without gh? Yes. gh is only used if you point it at a PR number. Branch/working-tree/staged reviews use plain git.

Is it expensive? A thorough review costs tokens, because reading code and tracing callers is real work. shipcheck keeps the cost proportional: it sizes the diff and runs a cheap single pass for small changes, parallel reviewers (on Sonnet by default, not Opus) for real ones, with disciplined reading and a tool-call cap. And because reviewers run in isolated subagents, that spend never bloats your main session. See Cost & modes. Want maximum depth? Ask for a "deep review" to escalate to Opus.

How is this different from the built-in /review or a code-review skill? Those are typically a single generalist pass. shipcheck is two adversarial specialists in parallel with an explicit regression focus and a structured ship/no-ship gate — modeled on how a real senior QA engineer signs off a change.

Can I run just one reviewer? Yes — ask for "just the regression check" or "just verify the change" and Claude will launch the relevant one.

Contributing

Issues and PRs welcome — especially additional reviewer playbooks (security, performance, accessibility) you'd want fanned out in parallel. Keep SKILL.md lean; put depth in references/.

License

If shipcheck caught a bug before it reached prod, consider giving it a ⭐.