agent-done-or-not
Health Uyari
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Basarisiz
- child_process — Shell command execution capability in bin/agent-done-or-not.js
- spawnSync — Synchronous process spawning in bin/agent-done-or-not.js
- process.env — Environment variable access in bin/agent-done-or-not.js
- fs module — File system access in bin/agent-done-or-not.js
Permissions Gecti
- Permissions — No dangerous permissions requested
Bu listing icin henuz AI raporu yok.
Catch your AI agent lying about "done" A one-file proof-gate for Claude Code, Cursor & Codex. it must verify before it can finish
agent-done-or-not
Your AI agent just said "Done ✅" — but did it verify? This one-file gate
forces it to record fresh evidence before it can declare success. Works
with Claude Code, Cursor, and Codex. Copy. Paste. Ship.
agent: All tests pass — task complete! ✅
stop-gate: BLOCKED — no NEW passing check since your last completion — re-verify this change
agent: $ bash done-gate.sh capture --label test -- npm test
✗ 1 test failed (exit=1)
stop-gate: BLOCKED — your most recent check FAILED (exit=1) — fix it, don't ship it
That's the whole point: "it works" stops being a confidence claim and becomes
a receipt.
▶ Watch the terminal cast: docs/demo.cast — play locally withasciinema play docs/demo.cast.
Why
AI coding agents routinely announce a task is "done" without running anything to
back it up. It's the loudest complaint in the agent ecosystem
(claude-code#42796),
and the fix is simple: don't let the agent say done — make it show done.
agent-done-or-not records every check as a tamper-evident receipt (command +
exit code + SHA-256 of the output) and blocks the agent from finishing until the
most recent check is a fresh, passing one. Because the capture step exits
with the command's own code, a failing check can't be dressed up as success.
Why not just put "always run the tests" in CLAUDE.md?
a rule in CLAUDE.md |
agent-done-or-not | |
|---|---|---|
| Agent can ignore it | yes — it's a suggestion | no — the Stop hook blocks the turn |
| Proof it actually ran | none | hashed receipt (proof.json) |
| Failing check caught | only if the agent admits it | always — exit code is recorded |
| Works in CI / pre-commit | no | yes — done-gate.sh assert |
| Cross-tool | per-tool prose | one engine for Claude, Cursor, Codex |
| Dependencies | — | none (bash + git + sha) |
Install (60 seconds)
From the repo you want to protect:
npx agent-done-or-not init --yes
Prefer to inspect first? Use npx agent-done-or-not init --dry-run, or use the
manual two-file install in examples/install.md.
Then wire the rule + hook for your tool — see examples/install.md.
- Claude Code → drop in
CLAUDE.md+ theStophook = hard enforcement.init --claude-hook(alias--claude) also copies the gate scripts into the
repo, so the generated hook resolves instead of pointing at files that don't
exist. - Cursor → drop in
.cursorrules. - Codex / others → drop in
AGENTS.md.
Agent Skill
npx skills add mohamedzhioua/agent-done-or-not
Installs the proof-of-done rule as a skill for Claude Code / Codex / other
agents. A skill-only install gives the agent instructions, not repo-root gate
scripts; use npx agent-done-or-not capture ... from the skill, or runagent-done-or-not init / the installer first to add local done-gate.*
scripts and hook config.
npm / npx
Run the gate without cloning — handy in a CI step, an npm script, or a
skill-only install:
npx agent-done-or-not capture --label test -- npm test
npx agent-done-or-not assert --label test --ttl 3600
The npm wrapper uses the bundled Bash engine when Bash is available, and falls
back to the bundled PowerShell engine on Windows.
You can also wire the Stop hook itself through npx, with no vendored scripts
needed:
npx agent-done-or-not stop-gate
Point your harness's Stop/finish hook at that command instead of a localstop-gate.sh / stop-gate.ps1 file — it pipes the hook payload on stdin and
keeps the engine (and any policy file) beside it.
Fast local smoke check for the npm wrapper:
npm run smoke
Homebrew / Scoop
Install a global agent-done-or-not launcher:
# macOS / Linux
brew install mohamedzhioua/tap/agent-done-or-not
# Windows (native PowerShell launcher)
scoop bucket add agent-done-or-not https://github.com/mohamedzhioua/scoop-bucket
scoop install agent-done-or-not
The pinned formula and manifest live in packaging/; seepackaging/README.md for publishing them to a tap/bucket.
Windows (native PowerShell — no bash needed)
done-gate.ps1 is a native port of the engine with identical behavior and an
identical receipt format. It runs on Windows PowerShell 5.1 and PowerShell 7+
with built-ins only — no bash required:
pwsh done-gate.ps1 capture --label test -- your-test-command
pwsh done-gate.ps1 assert --label test --ttl 3600
Receipts written by done-gate.ps1 and done-gate.sh are interchangeable.
GitHub Action
Use the composite action to gate a workflow job on receipts created earlier in
the same checkout:
- uses: actions/checkout@v4
- uses: mohamedzhioua/agent-done-or-not@v0
with:
mode: assert
labels: "test build"
ttl: "3600"
pr-comment: "true" # optional: upsert a sticky proof comment on the PR
Run actions/checkout first, then produce receipts earlier in the job withbash done-gate.sh capture before the action asserts them. Withpr-comment: "true" on a pull request, the action upserts a single sticky
proof comment (✅/❌ status + the gate output); it never changes the job's
pass/fail — assert still decides that.
Claude Code plugin
Install the thin Claude Code plugin wrapper:
claude plugin marketplace add mohamedzhioua/agent-done-or-not
claude plugin install agent-done-or-not
The plugin keeps the bash core canonical at the repo root and only wiresstop-gate.sh as Claude Code's Stop hook for hard enforcement. You still drop
the CLAUDE.md rule into the protected repo for the agent-facing instruction.
Use
# Run any check through the gate — it exits with the command's own code:
bash done-gate.sh capture --label test -- npm test
# Inspect the receipts:
bash done-gate.sh show
# Render a compact proof summary:
npx agent-done-or-not report --format markdown
{"label":"test","command":"npm test","exit_code":0,"sha256":"9f2c…","log":".agent-proof/…/test.log","at":"2026-06-21T19:44:00Z","epoch":1781034240,"session":""}
See examples/proof.jsonl for a full ledger sample.
Pre-commit hook
Add this to your .pre-commit-config.yaml:
repos:
- repo: https://github.com/mohamedzhioua/agent-done-or-not
rev: v0.8.0
hooks:
- id: agent-done-assert
This runs done-gate.sh assert before every commit and blocks unless a fresh
passing proof-of-done receipt exists in .agent-proof/. Pass extra options viaargs:, for example:
- id: agent-done-assert
args: [--label, test, --ttl, "3600"]
Gate your CI / pre-commit too
assert checks the ledger without running anything — perfect for a CI step or a
pre-commit hook that refuses to proceed unless the right checks passed:
# Require BOTH a passing test and build receipt, no older than 1h,
# and make sure the "test" receipt really came from your test runner:
bash done-gate.sh assert --label test --label build \
--allow-command-regex '(npm|pnpm) test' --ttl 3600
# Machine-readable for Actions / tooling:
bash done-gate.sh assert --json --label test
# {"ok":true,"run":"…","ttl":3600,"checks":[{"label":"test","ok":true,…}]}
Every command supports --json for stable, dependency-free output.
Required checks (policy)
A rule in CLAUDE.md makes the agent run something. A policy makes it run
the right things. Drop an agent-done.json at your repo root:
{
"required": [
{ "label": "test", "command_regex": "(npm|pnpm|yarn) (run )?test|pytest" },
{ "label": "build", "command_regex": "(npm|pnpm) run build" }
],
"ttl": 3600
}
Now assert (with no --label) requires a fresh, passing receipt for every
listed label and checks that each was produced by a command matching itscommand_regex — so true or echo ok can't satisfy a test requirement:
bash done-gate.sh assert # reads agent-done.json automatically
bash done-gate.sh assert --json # adds a "policy" field
Resolution order is explicit --label → policy file → most-recent receipt;
pass --no-policy to force the legacy path, or --policy <file> to point
elsewhere. Receipts captured in separate runs still count (policy mode searches
all runs per label). Scaffold one from your detected stack withnpx agent-done-or-not init --policy. The format is documented inpolicy.schema.json.
Wrong-check warning. Labels carry a strength taxonomy (strong: test,build, typecheck, e2e, smoke…; weak: lint, format, manual…). If the
only passing evidence is a weak check, assert and report print an advisorylatest proof is lint-only — this may not verify the requested behavior. It's a
nudge, never a blocker — the exit code is unchanged.
Share the proof
Turn a ledger into something pasteable. report leads with a human card; thepr format is a sticky, marker-wrapped comment for pull requests:
npx agent-done-or-not report # card + table
npx agent-done-or-not report --format pr # paste into a PR / issue
### ✅ Proof of Done
| | |
|---|---|
| **Status** | PASS |
| **Latest** | `npm test` · exit 0 · 2m ago |
**Checks**
- ✅ `test` — `npm test` — exit `0` — 2m ago — `sha256:9f2c…`
In CI, the GitHub Action can post this as a sticky PR comment —
set pr-comment: "true".
How it works
done-gate.sh captureruns your check, streams its output, and appends a
receipt to.agent-proof/<run>/ledger.jsonl— including the gitcommit
(full HEAD SHA),tree, anddirtystate at capture time (empty/false
outside a git repo). It exits with the command's own exit code.stop-gate.shis a Stop-event hook. It blocks the agent from ending its
turn unless the most recent receipt is:- passing (a red check can never mean "done"),
- fresh — judged by the epoch recorded inside the receipt (not file
mtime, whichtouchcould forge); older thanAGENT_DONE_TTL(default 1h)
is rejected, so it never honors yesterday's ledger, - not already used to clear a previous stop (every completion needs its
own proof), and - policy-satisfying — if
agent-done.jsonexists, the gate requires a
fresh passing receipt for every required label, not just the latest
receipt of any label (parity withassert). It fails closed if the
policy can't be evaluated.
- State drift is flagged, not silently ignored.
assert,report, and
the Stop gate compare the receipt's recorded commit/tree/dirty state to the
current one. A mismatch — the classic "green receipt from
before your last edit" or stale-CI-cache gap — prints an advisory warning by
default; setAGENT_DONE_BIND_STATE=1to turn that warning into a hard
failure/block.
It fails closed. Once a stop is being gated, any missing, empty, unparseable,
or stale proof state blocks. The only ways past are a verified passing receipt,
the escape hatch, or an anti-infinite-loop safety valve (it gives up and warns
loudly after AGENT_DONE_MAX_RETRIES consecutive blocks so the agent can never
get permanently stuck).
Dependency-free: portable bash + git + one of sha256sum/shasum/python.
No network, no LLM, no config file. The receipt format is documented inproof.schema.json. Hardened with an independent
cross-model (Codex) security review — see CONTRIBUTING.md.
How it can — and can't — be fooled (threat model)
agent-done-or-not is built to resist an agent that wants to look done. It is
a forcing function, not a sandbox; here's the honest boundary.
Precisely: the SHA-256 is proof of exactly what the recorded command printed.
The receipt as a whole is a verification receipt — evidence that the
configured check ran and passed against a specific commit — not a proof of
semantic correctness or that the task is actually finished. You still have to
pick a command that verifies what you claim.
It stops these:
- Claiming "done" with no check run → blocked (no receipt).
- A failing check dressed up as success → blocked (
capturerecords and exits
with the real non-zero code). - Re-using an old green run → blocked (receipts are consumed; freshness uses the
epoch inside the receipt, sotouchwon't refresh it). - Malformed/empty proof state to slip through → blocked (the gate fails
closed). - An infinite block loop → bounded safety valve (
AGENT_DONE_MAX_RETRIES). - Running the wrong class of check → constrain it with an
agent-done.json
policy (per-labelcommand_regex) orassert --allow-command-regex. A policy
that is present but unparseable fails closed — it never silently degrades
to the most-recent receipt.
It does NOT claim to stop these (out of scope by design):
- Choosing a weak check (an empty test suite "passes"). You pick the command;
pair it with--allow-command-regexand real tests. - An agent that rewrites the ledger files by hand. If your agent can freely edit
.agent-proof/it can forge anything — treat the ledger as you would any
workspace file. (verify --shalets a second party confirm a specific hash.) - Hard enforcement on harnesses without a stop hook (Cursor): there the rule is
advisory, but every receipt is still recorded for you and CI to audit. - Async / fire-and-forget work. A receipt captures the command's exit state
at the moment it exits — not background work that finishes later. If the
real check is asynchronous, use a command that blocks until it's done (poll,
wait, or a synchronous health check); otherwise don't claim done from it. - A stale green from before your last edit. This is why receipts are now
bound to the git commit/tree at capture time (see How it
works) — a receipt captured against old code, including a
cached CI green, no longer looks identical to a fresh one. SetAGENT_DONE_BIND_STATE=1to make that drift a hard failure instead of a
warning.
Report a bypass — see SECURITY.md. It's the most valuable issue
you can file.
Escape hatch
export AGENT_DONE_OFF=1 # disable the gate
FAQ
Does it need Node / Python / jq? No. Just bash, git, and one ofsha256sum / shasum / python for hashing.
Windows? Fully supported natively. done-gate.ps1 and stop-gate.ps1
are native PowerShell ports (PS 5.1 + PS 7+, no bash required). Receipts
are interchangeable between the bash and PowerShell engines. CI (test.yml)
runs the PowerShell parity suite on both Windows PowerShell 5.1 and
PowerShell 7+ on every push.
What does AGENT_DONE_BIND_STATE=1 do? Turns the advisory git-state-drift
warning (receipt commit/tree/dirty doesn't match the current one) into a hard
failure for assert and a hard block for the Stop gate, and requires the receipt
to carry a commit binding at all. Off by default so it doesn't break off-VCS or
detached-HEAD usage. It defends against honest staleness — a stale CI cache
or a green from before your last edit — not a tampered ledger: an agent that can
write the ledger can also write a matching commit, which is the same trust
boundary as forging exit_code:0 (see the threat model).
Won't it get my agent stuck? No — after AGENT_DONE_MAX_RETRIES consecutive
blocks it fails open with a loud warning, and AGENT_DONE_OFF=1 disables it.
Is the receipt private? Yes — .agent-proof/ is local and gitignored; it's
never committed. Keeping it ignored also matters for the git-state check above:
if .agent-proof/ weren't ignored, writing a receipt would dirty the tree it's
supposed to be describing.
What's the difference between capture and assert? capture runs a
check and records proof (use it in the agent loop). assert checks the ledger
without running anything (use it in CI / pre-commit).
Credits
Extracted and sharpened from the evidence/verify-gate layer of
zhioua-os, an AI engineering OS for coding
agents ("replace trust with evidence"). MIT licensed — fork it, ship it.
Yorumlar (0)
Yorum birakmak icin giris yap.
Yorum birakSonuc bulunamadi