shipcheck

agent
Security Audit: Failed
Health: Passed
  • License: MIT
  • Description: Repository has a description
  • Active repo: Last push 0 days ago
  • Community trust: 10 GitHub stars
Code: Failed
  • rm -rf: Recursive force deletion command in examples/demo-shop/setup.sh
  • network request: Outbound network request in examples/demo-shop/setup.sh
  • rm -rf: Recursive force deletion command in install.sh
Permissions: Passed
  • Permissions: No dangerous permissions requested

SUMMARY

🚦 Senior-QA review skill for Claude Code: runs a regression hunter and a change reviewer in parallel, then gives one read-only SIGN OFF / DO NOT SHIP verdict with file:line evidence.

README.md

🚦 shipcheck

A senior-QA review skill for Claude Code that runs two QA engineers in parallel and gives you one ship / no-ship verdict.

Before you merge, shipcheck dispatches a regression hunter ("does this break anything that already worked?") and a change reviewer ("does this actually do what it claims, including the edge cases?") at the same time, then consolidates everything into a single, evidence-backed verdict.

SIGN OFF · SIGN OFF WITH NITS · DO NOT SHIP

Install · Try it · How it works · Usage · Why it's different · FAQ

What is this?

shipcheck is a single Claude Code Skill. You install it once and then, in any repo, you say:

"shipcheck this branch before I merge"

Claude acts as a QA lead: it scopes and sizes your change, then fans out the right amount of review (two specialist reviewers in parallel for a real diff, a lighter single pass for a trivial one) and hands you back one consolidated verdict with file:line evidence for every claim. It is the review a careful senior QA engineer would give you.

It is 100% read-only. It never edits, commits, runs migrations, or touches external services. It reads code and runs read-only checks (typecheck, lint, your test suite, git diff).

Why it's different

Most "review my code" prompts are one model doing a shallow lint. shipcheck is built like a real QA function:

  • Two independent reviewers, in parallel, not one generalist. A regression specialist and a change-correctness specialist see different bugs. Running them concurrently adds that breadth in roughly the wall-clock time of one pass.
  • Adversarial, evidence-first: each reviewer must prove a bug with a file:line and a concrete trigger scenario. No file:line + trigger ⇒ it's a guess and gets dropped, not shipped to you as noise.
  • Regressions are first-class: it spends most of its effort on the code you didn't change that depends on what you did. That is where production breakage tends to come from.
  • New-vs-baseline aware: pre-existing lint/test failures aren't blamed on your diff.
  • Context-isolated + cost-aware: the heavy methodology lives in reference files that the subagents read on demand, so a deep review never bloats your main Claude session. It sizes the diff first: cheap model + a single pass for small changes, parallel reviewers only when the change warrants it. (See Cost & modes.)
  • Honest about coverage: every report includes "verified safe" (what it actually confirmed OK) and "residual risks / manual tests" (what a human still needs to click), so a PASS it couldn't confirm is labeled NEEDS-MANUAL-CHECK, never faked.
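
The evidence rule and the new-vs-baseline rule above can be sketched in a few lines of Python. This is an illustrative sketch, not shipcheck's implementation: the skill states these rules as instructions to the reviewers, and the `Finding` type, its field names, and the failure identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical shape of one reviewer claim (not shipcheck's real schema)."""
    summary: str
    location: Optional[str] = None  # "path:line", e.g. "src/http/client.ts:88"
    trigger: Optional[str] = None   # concrete scenario that reproduces the bug


def has_evidence(f: Finding) -> bool:
    """A claim counts only if it names a file:line AND a concrete trigger."""
    if not f.location or not f.trigger or not f.trigger.strip():
        return False
    path, sep, line = f.location.rpartition(":")
    return bool(path) and sep == ":" and line.isdigit()


def drop_guesses(findings):
    """Keep evidence-backed findings; everything else is treated as a guess."""
    return [f for f in findings if has_evidence(f)]


def new_failures(base_failures, head_failures):
    """Failures present on the change but not on the base ref.

    Pre-existing failures are subtracted so they are not blamed on the diff.
    """
    return sorted(set(head_failures) - set(base_failures))
```

A finding with a path but no line number, or with no trigger, is filtered out by `drop_guesses`; a test that already failed on the base ref never appears in `new_failures`.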

Install

One-liner (user scope: available in every project)

git clone https://github.com/lohani-mohit/shipcheck.git && ./shipcheck/install.sh

Manual

Copy the skill into your Claude Code skills directory:

# user scope (all projects)
cp -r shipcheck/skills/shipcheck ~/.claude/skills/

# or project scope (this repo only)
cp -r shipcheck/skills/shipcheck .claude/skills/

That's it. Claude Code auto-discovers skills in those folders. Restart your Claude Code session (or run /skills) and shipcheck is available.

Requirements: Claude Code with Skills enabled. Works in any git repo. gh CLI optional (used only when you point it at a PR number).

Try it in 60 seconds

Don't have a risky diff handy? The repo ships with a runnable example whose
feature branch hides two real defects, a dropped-429 regression and an
unclamped discount that can go negative:

./examples/demo-shop/setup.sh
cd demo-shop-sandbox
claude            # then ask:  shipcheck this branch against main

shipcheck should come back with DO NOT SHIP and point at src/http/client.js:3
(the regression) and src/cart/discount.js:5 (the unclamped percent). Full
walkthrough in examples/README.md.
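
To make the second defect concrete: an unclamped percentage lets any value above 100 push the total below zero. The sketch below is written in Python rather than the demo's JavaScript, and both function names are hypothetical; it illustrates the class of bug and is not the code in src/cart/discount.js.

```python
def apply_discount_unclamped(total_cents: int, percent: float) -> int:
    """Buggy variant: trusts `percent`, so 150 yields a negative total."""
    return round(total_cents * (1 - percent / 100))


def apply_discount(total_cents: int, percent: float) -> int:
    """Clamp the percentage to [0, 100] before applying it."""
    clamped = max(0.0, min(100.0, percent))
    return round(total_cents * (1 - clamped / 100))
```

With a 1000-cent cart and a 150% coupon, the unclamped variant returns a negative total, while the clamped one bottoms out at zero.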

Usage

Just ask, in natural language. Claude triggers the skill automatically:

shipcheck my working changes before I commit
shipcheck this branch against main
is this safe to ship?
QA the diff and give me a sign-off
shipcheck PR #482

Or point it at a specific base / files:

shipcheck the staged changes against origin/develop
shipcheck src/auth/ - focus on the token-refresh change

What you get back

Verdict: DO NOT SHIP - a refactor silently drops the retry on 429s.

Blockers (Critical/High)
  • src/http/client.ts:88 - retryOn() no longer includes 429. Any caller
    relying on rate-limit retries (src/sync/poller.ts:210) now fails hard on
    the first 429 instead of backing off.  Fix: re-add 429 to RETRYABLE.

Findings (Medium/Low)
  • src/ui/Toast.tsx:34 - new error toast has no auto-dismiss; trace shows it
    persists until manual close. Likely unintended.

Verified safe
  • formatPrice() byte-identical to base across all branches.
  • All 7 callers of parseConfig() updated to pass the new `strict` arg.

Residual risks / recommended manual tests
  • Could not exercise the OAuth redirect end-to-end - manually confirm login
    still completes against the staging IdP.
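
The blocker in the sample report is easier to picture with code. Below is a minimal Python sketch of a retry loop driven by a `RETRYABLE` set, the name used in the report. The helper `request_with_retry`, its parameters, and the other status codes in the set are assumptions for illustration, not taken from any real project. Removing 429 from the set makes the loop return the first 429 immediately, which is the regression described.

```python
import time

# Status codes worth retrying (the codes other than 429 are assumed here).
# The sample blocker is what happens when 429 is dropped from this set.
RETRYABLE = frozenset({429, 502, 503, 504})


def request_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status.

    `send` is any zero-argument callable returning an HTTP status code.
    Waits base_delay * 2**attempt between tries (exponential backoff)
    and gives up after `max_attempts` calls.
    """
    status = send()
    for attempt in range(max_attempts - 1):
        if status not in RETRYABLE:
            break
        sleep(base_delay * (2 ** attempt))
        status = send()
    return status
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting.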

How it works

            you: "shipcheck this branch"
                        │
              ┌─────────▼──────────┐
              │   QA lead (you in  │   1. resolves base ref + changed files
              │   the main thread) │      (lightweight, no deep reading)
              └─────────┬──────────┘
                        │  launches both IN PARALLEL (one message)
            ┌───────────┴─────────────┐
            ▼                         ▼
  ┌────────────────────┐   ┌─────────────────────┐
  │ Reviewer A         │   │ Reviewer B          │
  │ Regression hunter  │   │ Change reviewer     │
  │ "what did this     │   │ "does it do what it │
  │  break?"           │   │  claims? edge cases"│
  │ reads              │   │ reads               │
  │ regression-hunt.md │   │ change-review.md    │
  └─────────┬──────────┘   └──────────┬──────────┘
            └───────────┬─────────────┘
                        ▼
              ┌────────────────────┐
              │  consolidate:      │   merge, dedupe, reconcile,
              │  ONE verdict       │   rank → SIGN OFF / DO NOT SHIP
              └────────────────────┘
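
The flow in the diagram (dispatch both reviewers at once, then merge, dedupe, rank, and decide) can be sketched in Python. This is only an analogy: in Claude Code the reviewers are subagents, not threads, and the finding tuples, the dedupe-by-location rule, and the severity-to-verdict mapping below are assumptions inferred from the report layout, not documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def consolidate(*reports):
    """Merge reviewer reports, dedupe by location, rank, and pick a verdict.

    Each finding is a (severity, location, summary) tuple. When both
    reviewers flag the same location, the more severe entry is kept.
    """
    by_location = {}
    for report in reports:
        for finding in report:
            severity, location, _ = finding
            kept = by_location.get(location)
            if kept is None or SEVERITY_RANK[severity] < SEVERITY_RANK[kept[0]]:
                by_location[location] = finding
    ranked = sorted(by_location.values(), key=lambda f: (SEVERITY_RANK[f[0]], f[1]))

    # Assumed mapping: any Critical/High blocks the ship; anything else is a nit.
    if any(f[0] in ("critical", "high") for f in ranked):
        verdict = "DO NOT SHIP"
    elif ranked:
        verdict = "SIGN OFF WITH NITS"
    else:
        verdict = "SIGN OFF"
    return verdict, ranked


def shipcheck(diff, regression_hunter, change_reviewer):
    """Run both reviewers concurrently on the same diff, then consolidate."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        a = pool.submit(regression_hunter, diff)
        b = pool.submit(change_reviewer, diff)
        return consolidate(a.result(), b.result())
```

Here `regression_hunter` and `change_reviewer` are any callables that take the diff and return a list of findings, so stubs are enough to exercise the consolidation step.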

The skill is structured for progressive disclosure: SKILL.md is a short
orchestrator; the deep, opinionated QA methodology lives in
references/regression-hunt.md and references/change-review.md, which only the
subagents load. Your main session never pays the token cost of the full
playbook.

skills/shipcheck/
├── SKILL.md                      # lean orchestrator (the only thing your main context loads)
└── references/
    ├── regression-hunt.md        # Reviewer A's full playbook (loaded by the subagent)
    └── change-review.md          # Reviewer B's full playbook (loaded by the subagent)

Cost & modes

Straight talk: a thorough senior-QA review reads code and traces callers, and
that costs tokens. A real review of a normal diff is not "free." shipcheck's
job is to spend the right amount for the risk, and to keep that spend out of
your main session. It does three things:

  • Sizes the diff first and picks a mode:

    | Mode | When                                                          | Reviewers      | Model          |
    |------|---------------------------------------------------------------|----------------|----------------|
    | Lite | trivial / low-risk (≲50 lines, no core/auth/migration files)  | 1, single pass | Haiku / Sonnet |
    | Full | normal changes                                                | 2 in parallel  | Sonnet         |
    | Deep | large / high-risk, or you flag it                             | 2 in parallel  | Opus           |

  • Defaults to Sonnet for the reviewers, never Opus (the single biggest cost lever). Ask for "deep review" to escalate.
  • Disciplined reading: reviewers work from git diff, read spans not whole files, skip lockfiles/node_modules, and cap themselves at ~12–15 tool calls so a review can't run away.
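
The sizing step can be sketched as a small function over `git diff --numstat` output, which prints one `added<TAB>deleted<TAB>path` line per file. This is a sketch under stated assumptions, not the skill's actual logic: the 50-line Lite cutoff comes from the mode table, but the `large_threshold` value, the path markers for sensitive files, and the skip list are hypothetical.

```python
SENSITIVE_MARKERS = ("core/", "auth/", "migration")  # assumed path hints
SKIPPED = ("node_modules/", "package-lock.json", "yarn.lock", "pnpm-lock.yaml")

MODES = {
    "Lite": {"reviewers": 1, "model": "Haiku / Sonnet"},
    "Full": {"reviewers": 2, "model": "Sonnet"},
    "Deep": {"reviewers": 2, "model": "Opus"},
}


def size_diff(numstat: str):
    """Sum added + deleted lines from `git diff --numstat` output.

    Binary files report "-" for both counts and contribute no lines.
    Lockfiles and node_modules are skipped entirely.
    """
    total, paths = 0, []
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        added, deleted, path = parts
        if any(s in path for s in SKIPPED):
            continue
        paths.append(path)
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total, paths


def pick_mode(numstat: str, deep_requested: bool = False, large_threshold: int = 400):
    """Map diff size and touched paths to Lite, Full, or Deep."""
    total, paths = size_diff(numstat)
    sensitive = any(m in p for p in paths for m in SENSITIVE_MARKERS)
    if deep_requested or total > large_threshold:
        return "Deep"
    if total <= 50 and not sensitive:
        return "Lite"
    return "Full"
```

For example, a four-line change to a UI file selects Lite, the same four lines under `src/auth/` select Full, and asking for a deep review selects Deep regardless of size.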

The context isolation is the honest win: the heavy playbooks load inside the
subagents, so a deep review never bloats your main Claude session. You pay for
depth when you want depth, and you can dial it down for small changes.

FAQ

Will it change my code? No. It is strictly read-only: reads, git diff/show, typecheck, lint, and your test suite. It never edits, commits, or hits write APIs.

Does it work without gh? Yes. gh is only used if you point it at a PR number. Branch/working-tree/staged reviews use plain git.

Is it expensive? A thorough review costs tokens; that's real work, not spin. shipcheck keeps it proportional: it sizes the diff and runs a cheap single pass for small changes, parallel reviewers (on Sonnet by default, not Opus) for real ones, with disciplined reading and a tool-call cap. And because reviewers run in isolated subagents, that spend never bloats your main session. See Cost & modes. Want maximum depth? Ask for a "deep review" to escalate to Opus.

How is this different from the built-in /review or a code-review skill? Those are typically a single generalist pass. shipcheck is two adversarial specialists in parallel with an explicit regression focus and a structured ship/no-ship gate, modeled on how a real senior QA engineer signs off a change.

Can I run just one reviewer? Yes. Ask for "just the regression check" or "just verify the change" and Claude will launch the relevant one.

Contributing

Issues and PRs welcome, especially additional reviewer playbooks (security, performance, accessibility) you'd want fanned out in parallel. Keep SKILL.md lean; put depth in references/.

License

MIT © Mohit Lohani

If shipcheck caught a bug before it reached prod, consider giving it a ⭐.
