shipcheck
Health Pass
- License: MIT
- Description: Repository has a description
- Active repo: Last push 0 days ago
- Community trust: 10 GitHub stars
Code Fail
- rm -rf: Recursive force deletion command in examples/demo-shop/setup.sh
- network request: Outbound network request in examples/demo-shop/setup.sh
- rm -rf: Recursive force deletion command in install.sh
Permissions Pass
- Permissions: No dangerous permissions requested
🚦 Senior-QA review skill for Claude Code: runs a regression hunter + a change reviewer in parallel, then gives one read-only SIGN OFF / DO NOT SHIP verdict with file:line evidence.
🚦 shipcheck
A senior-QA review skill for Claude Code that runs two QA engineers in parallel and gives you one ship / no-ship verdict.
Before you merge, shipcheck dispatches a regression hunter ("does this break anything that already worked?") and a change reviewer ("does this actually do what it claims, including the edge cases?") at the same time, then consolidates everything into a single, evidence-backed verdict.
SIGN OFF · SIGN OFF WITH NITS · DO NOT SHIP
Install · Try it · How it works · Usage · Why it's different · FAQ
What is this?
shipcheck is a single Claude Code Skill. You install it once and then, in any repo, you say:
"shipcheck this branch before I merge"
Claude acts as a QA lead: it scopes and sizes your change, then fans out the right amount of review (two specialist reviewers in parallel for a real diff, a lighter single pass for a trivial one) and hands you back one consolidated verdict with file:line evidence for every claim. It is the review a careful senior QA engineer would give you.
It is 100% read-only. It never edits, commits, runs migrations, or touches external services. It reads code and runs read-only checks (typecheck, lint, your test suite, git diff).
Why it's different
Most "review my code" prompts are one model doing a shallow lint. shipcheck is built like a real QA function:
- Two independent reviewers, in parallel: not one generalist. A regression specialist and a change-correctness specialist see different bugs. Running them concurrently adds breadth in about the same wall-clock time (≈ one pass).
- Adversarial, evidence-first: each reviewer must prove a bug with a file:line and a concrete trigger scenario. No file:line + trigger means it's a guess and gets dropped, not shipped to you as noise.
- Regressions are first-class: it spends most of its effort on the code you didn't change that depends on what you did. That's where production breakage actually comes from.
- New-vs-baseline aware: pre-existing lint/test failures aren't blamed on your diff.
- Context-isolated + cost-aware: the heavy methodology lives in reference files that the subagents read on demand, so a deep review never bloats your main Claude session. It sizes the diff first: cheap model + a single pass for small changes, parallel reviewers only when the change warrants it. (See Cost & modes.)
- Honest about coverage: every report includes "verified safe" (what it actually confirmed OK) and "residual risks / manual tests" (what a human still needs to click), so a PASS it couldn't confirm is labeled NEEDS-MANUAL-CHECK, never faked.
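To make the evidence-first rule above concrete, here is a minimal Python sketch of the filtering idea. It is not shipcheck's implementation (the skill is a set of prompts and playbooks, not Python); the `Finding` class, its field names, and the `file:line` pattern are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


# Hypothetical shape for a reviewer finding; field names are illustrative only.
@dataclass(frozen=True)
class Finding:
    location: str  # expected form: "path/to/file.ext:LINE"
    trigger: str   # concrete scenario that reproduces the bug
    summary: str


# Assumed pattern: a path without spaces or colons, then ":" and a line number.
_FILE_LINE = re.compile(r"^[^\s:]+:\d+$")


def has_evidence(finding: Finding) -> bool:
    """True only when both a file:line and a non-empty trigger are present."""
    return bool(_FILE_LINE.match(finding.location)) and bool(finding.trigger.strip())


def keep_proven(findings):
    """Drop findings that are guesses (missing file:line or trigger)."""
    return [f for f in findings if has_evidence(f)]


candidates = [
    Finding("src/http/client.ts:88", "server answers 429 to the poller", "429 no longer retried"),
    Finding("src/ui/Toast.tsx", "", "toast might look odd"),  # no line, no trigger
]
proven = keep_proven(candidates)
```

Only the first candidate survives, because the second has neither a line number nor a trigger scenario.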
Install
One-liner (user scope: available in every project)
git clone https://github.com/lohani-mohit/shipcheck.git && ./shipcheck/install.sh
Manual
Copy the skill into your Claude Code skills directory:
# user scope (all projects)
cp -r shipcheck/skills/shipcheck ~/.claude/skills/
# or project scope (this repo only)
cp -r shipcheck/skills/shipcheck .claude/skills/
That's it. Claude Code auto-discovers skills in those folders. Restart your Claude Code session (or run /skills) and shipcheck is available.
Requirements: Claude Code with Skills enabled. Works in any git repo. The gh CLI is optional (used only when you point it at a PR number).
Try it in 60 seconds
Don't have a risky diff handy? The repo ships with a runnable example whose
feature branch hides two real defects: a dropped-429 regression and an
unclamped discount that can go negative:
./examples/demo-shop/setup.sh
cd demo-shop-sandbox
claude # then ask: shipcheck this branch against main
shipcheck should come back with DO NOT SHIP and point at src/http/client.js:3
(the regression) and src/cart/discount.js:5 (the unclamped percent). Full
walkthrough in examples/README.md.
Usage
Just ask, in natural language. Claude triggers the skill automatically:
shipcheck my working changes before I commit
shipcheck this branch against main
is this safe to ship?
QA the diff and give me a sign-off
shipcheck PR #482
Or point it at a specific base / files:
shipcheck the staged changes against origin/develop
shipcheck src/auth/ and focus on the token-refresh change
What you get back
Verdict: DO NOT SHIP - a refactor silently drops the retry on 429s.
Blockers (Critical/High)
• src/http/client.ts:88 - retryOn() no longer includes 429. Any caller
  relying on rate-limit retries (src/sync/poller.ts:210) now fails hard on
  the first 429 instead of backing off. Fix: re-add 429 to RETRYABLE.
Findings (Medium/Low)
• src/ui/Toast.tsx:34 - new error toast has no auto-dismiss; trace shows it
  persists until manual close. Likely unintended.
Verified safe
• formatPrice() byte-identical to base across all branches.
• All 7 callers of parseConfig() updated to pass the new `strict` arg.
Residual risks / recommended manual tests
• Could not exercise the OAuth redirect end-to-end - manually confirm login
  still completes against the staging IdP.
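One plausible reading of how the report's severity buckets map to the three verdicts is sketched below in Python. The mapping is an assumption: the README only shows that Critical/High findings are listed as blockers under DO NOT SHIP, so treating Medium/Low findings as "nits" is a guess, and the `Severity` enum and `verdict` function are hypothetical names.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def verdict(severities):
    """Collapse the severities of all surviving findings into one verdict.

    Assumed rule: any Critical/High finding blocks the ship; anything
    else that remains is a nit; no findings at all is a clean sign-off.
    """
    severities = list(severities)
    if any(s >= Severity.HIGH for s in severities):
        return "DO NOT SHIP"
    if severities:
        return "SIGN OFF WITH NITS"
    return "SIGN OFF"


# The sample report above has one High blocker and one Medium finding.
sample = verdict([Severity.HIGH, Severity.MEDIUM])
```

Under this assumed rule the sample report resolves to DO NOT SHIP, matching the verdict line it shows.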
How it works
           you: "shipcheck this branch"
                         │
               ┌─────────▼─────────┐
               │ QA lead (you in   │  1. resolves base ref + changed files
               │ the main thread)  │     (lightweight, no deep reading)
               └─────────┬─────────┘
                         │  launches both IN PARALLEL (one message)
            ┌────────────┴────────────┐
            ▼                         ▼
  ┌────────────────────┐   ┌──────────────────────┐
  │ Reviewer A         │   │ Reviewer B           │
  │ Regression hunter  │   │ Change reviewer      │
  │ "what did this     │   │ "does it do what it  │
  │  break?"           │   │  claims? edge cases" │
  │ reads              │   │ reads                │
  │ regression-hunt.md │   │ change-review.md     │
  └─────────┬──────────┘   └──────────┬───────────┘
            └────────────┬────────────┘
                         ▼
               ┌───────────────────┐
               │ consolidate:      │  merge, dedupe, reconcile,
               │ ONE verdict       │  rank → SIGN OFF / DO NOT SHIP
               └───────────────────┘
The skill is structured for progressive disclosure: SKILL.md is a short
orchestrator; the deep, opinionated QA methodology lives in
references/regression-hunt.md and references/change-review.md, which only the
subagents load. Your main session never pays the token cost of the full
playbook.
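The fan-out and consolidation flow in this section can be pictured with a small Python analogy. In shipcheck the reviewers are Claude subagents, not Python functions, so everything here is a stand-in: the two reviewer functions, their hard-coded results, the tuple layout, and the choice to dedupe on location alone are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Lower number = more severe, so sorting by rank puts blockers first.
RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


# Stand-in reviewers returning (location, severity, summary) tuples.
def regression_hunter(diff):
    return [("src/http/client.ts:88", "high", "429 dropped from retry list")]


def change_reviewer(diff):
    return [
        ("src/http/client.ts:88", "high", "429 dropped from retry list"),
        ("src/ui/Toast.tsx:34", "medium", "error toast never auto-dismisses"),
    ]


def consolidate(*reports):
    """Merge reports, dedupe on location, and rank most severe first."""
    merged = {}
    for report in reports:
        for location, severity, summary in report:
            current = merged.get(location)
            # When both reviewers flag one location, keep the more severe entry.
            if current is None or RANK[severity] < RANK[current[1]]:
                merged[location] = (location, severity, summary)
    return sorted(merged.values(), key=lambda finding: RANK[finding[1]])


def run_review(diff):
    # Both reviewers start together, so wall-clock time is about one pass.
    with ThreadPoolExecutor(max_workers=2) as pool:
        a = pool.submit(regression_hunter, diff)
        b = pool.submit(change_reviewer, diff)
        return consolidate(a.result(), b.result())


result = run_review("<diff text>")
```

The duplicate finding at src/http/client.ts:88 collapses into one entry, and the High finding sorts ahead of the Medium one.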
skills/shipcheck/
├── SKILL.md                  # lean orchestrator (the only thing your main context loads)
└── references/
    ├── regression-hunt.md    # Reviewer A's full playbook (loaded by the subagent)
    └── change-review.md      # Reviewer B's full playbook (loaded by the subagent)
Cost & modes
Straight talk: a thorough senior-QA review reads code and traces callers, and
that costs tokens. A real review of a normal diff is not "free." shipcheck's
job is to spend the right amount for the risk, and to keep that spend out of
your main session. It does three things:
- Sizes the diff first and picks a mode:

| Mode | When | Reviewers | Model |
| --- | --- | --- | --- |
| Lite | trivial / low-risk (≲50 lines, no core/auth/migration files) | 1, single pass | Haiku / Sonnet |
| Full | normal changes | 2 in parallel | Sonnet |
| Deep | large / high-risk, or you flag it | 2 in parallel | Opus |

- Defaults to Sonnet, never Opus, for the reviewers (the single biggest cost lever). Ask for "deep review" to escalate.
- Disciplined reading: reviewers work from git diff, read spans not whole files, skip lockfiles/node_modules, and cap themselves at ~12–15 tool calls so a review can't run away.
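The mode table can be read as a small decision function. The Python sketch below is one interpretation, not the skill's actual logic: the README does not define "large" or "high-risk", so `DEEP_MIN_LINES` is an invented threshold, the substring check on paths is a crude stand-in for "core/auth/migration files", and `Plan` and `pick_mode` are hypothetical names.

```python
from dataclasses import dataclass

SENSITIVE_PARTS = ("core", "auth", "migration")  # from the Lite row of the table
LITE_MAX_LINES = 50      # the table's "roughly 50 lines or fewer"
DEEP_MIN_LINES = 1000    # ASSUMPTION: the README never says what counts as "large"


@dataclass(frozen=True)
class Plan:
    mode: str
    reviewers: int
    model: str


def touches_sensitive(paths):
    """Crude substring check; a real check would likely be more careful."""
    return any(part in path.lower() for path in paths for part in SENSITIVE_PARTS)


def pick_mode(changed_lines, paths, deep_requested=False):
    """Choose Lite, Full, or Deep from diff size, touched paths, and a user flag."""
    if deep_requested or changed_lines >= DEEP_MIN_LINES:
        return Plan("deep", 2, "opus")
    if changed_lines <= LITE_MAX_LINES and not touches_sensitive(paths):
        return Plan("lite", 1, "haiku/sonnet")
    return Plan("full", 2, "sonnet")


small_docs_change = pick_mode(12, ["docs/README.md"])
small_auth_change = pick_mode(12, ["src/auth/token.py"])
```

A 12-line docs change lands in Lite, while the same size change under src/auth/ is bumped to Full because it touches a sensitive path.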
The context isolation is the honest win: the heavy playbooks load inside the
subagents, so a deep review never bloats your main Claude session. You pay for
depth when you want depth, and you can dial it down for small changes.
FAQ
Will it change my code? No. It is strictly read-only: reads, git diff/show, typecheck, lint, and your test suite. It never edits, commits, or hits write APIs.
Does it work without gh? Yes. gh is only used if you point it at a PR number. Branch/working-tree/staged reviews use plain git.
Is it expensive? A thorough review costs tokens; that's real work, not spin. shipcheck keeps it proportional: it sizes the diff and runs a cheap single pass for small changes, parallel reviewers (on Sonnet by default, not Opus) for real ones, with disciplined reading and a tool-call cap. And because reviewers run in isolated subagents, that spend never bloats your main session. See Cost & modes. Want maximum depth? Ask for a "deep review" to escalate to Opus.
How is this different from the built-in /review or a code-review skill? Those are typically a single generalist pass. shipcheck is two adversarial specialists in parallel with an explicit regression focus and a structured ship/no-ship gate, modeled on how a real senior QA engineer signs off a change.
Can I run just one reviewer? Yes. Ask for "just the regression check" or "just verify the change" and Claude will launch the relevant one.
Contributing
Issues and PRs welcome, especially additional reviewer playbooks (security, performance, accessibility) you'd want fanned out in parallel. Keep SKILL.md lean; put depth in references/.
License
MIT © Mohit Lohani