claude-execution-harness
Health Uyari
- License — License: NOASSERTION
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Gecti
- Code scan — Scanned 5 files during light audit, no dangerous patterns found
Permissions Gecti
- Permissions — No dangerous permissions requested
Bu listing icin henuz AI raporu yok.
Claude Code autopilot: one command → build → test → localhost ready. No mid-run interruptions.
claude-execution-harness
Claude Code autopilot: give it a plan, come back to a working localhost.
Most Claude Code sessions look like this:
You: implement feature X
Claude: done — review security logic? [y/N]
You: y
Claude: done — approve this approach? [y/N]
You: y
Claude: one more thing — is this the right pattern? [y/N]
...
This skill eliminates every mid-run interruption. One command, it builds, tests,
runs security review in the background, and hands you a localhost URL to verify.
You review once, at the end.
/execution-harness
Before / After
| Before | After |
|---|---|
| Interrupted every security task | Security review deferred — review-ledger.md at end |
| No enforcement on file size | 300-LOC hook blocks oversize writes in master + subagents |
| Context lost after compaction | plan.dag.json resumes deterministically |
Ends at git commit |
Ends at localhost:PORT — click to verify |
| Re-discovers same bugs each run | agentdb remembers gotchas across runs |
Quick demo
$ /execution-harness
[harness] Reading plan: docs/plans/auth-refactor.md
[harness] Classified 5 tasks → plan.dag.json written
[harness] Recalled 2 gotchas from previous runs
[task-001] security-core: add tenant isolation to /api/users — spawning Opus...
[task-001] done. Adversarial verifier: NEEDS_REVIEW (1 finding → review-ledger.md)
[task-002] business: add export endpoint — spawning Sonnet...
[task-002] done. Tests: 4 passed.
[task-003] mechanical-fan: add error return types (12 files) — spawning Haiku...
[task-003] done. 12 files patched.
[task-004] bugfix: fix TestHandleGetItem_Found — spawning Sonnet...
[task-004] done. Tests: all green.
[task-005] FE-ops: update status page — spawning Sonnet...
[task-005] done.
[harness] Simplify pass: no over-engineering detected.
[harness] Starting local preview...
[preview] BE ready at http://localhost:54321
[preview] FE ready at http://localhost:54322
==============================
PREVIEW READY
FE: http://localhost:54322 ← verify here
BE: http://localhost:54321
Smoke: PASS
==============================
[harness] run-report-2026-06-17.md written.
[harness] review-ledger.md: 1 item needs human review before merge.
Install
Prerequisites:
# 1. ECC — typed subagent specialists (the "muscle")
claude /plugin install affaan-m/ECC
# 2. Superpowers — skill routing
claude /plugin install obra/superpowers
Install the harness:
git clone https://github.com/freshdigital-it/claude-execution-harness
cd claude-execution-harness
bash setup.sh
Wire the 300-LOC hook (once per project, in .claude/settings.json):
{
"hooks": {
"PreToolUse": [{
"matcher": "Write|Edit",
"hooks": [{ "type": "command",
"command": "~/.claude/skills/execution-harness/scripts/hooks/pretooluse-filesize.sh"
}]
}]
}
}
How it works
Tasks are classified once at plan-time — no mid-run decisions:
| Class | Model | TDD | Review |
|---|---|---|---|
security-core |
Opus | test-first | adversarial verifier → review-ledger.md |
business / bugfix |
Sonnet | test-first | auto gate |
mechanical-fan |
Haiku | none | pipeline (bulk) |
FE-ops / refactor |
Sonnet | none | auto gate |
Each task runs in an ephemeral subagent. The master holds only the DAG and
checkpoint log — subagents return bounded summaries, never raw output.
Security without interruption: security-core tasks get an independent adversarial
verifier (different model, mandatory negative tests). If it finds an issue, it's logged
to review-ledger.md — you read the ledger once at the end, not once per task.
Resume after any interruption: plan.dag.json tracks every task's status.
Restart the session, run /execution-harness again — it skips done tasks automatically.
Key features
300-LOC enforcement — PreToolUse hook blocks oversized writes in the master session
and in all spawned subagents. Verified empirically.Local preview isolation — each run gets its own port and throwaway database.
Two parallel runs never collide.Cross-run memory —
agentdb_pattern_storerecords gotchas at run-end.
Next run's plan-time recalls them so the same mistake isn't made twice.Decision-ledger reconciliation — plan-time checks for conflicts with past
architectural decisions (viacode-review-graphimpact radius + keyword match).
Conflicts are surfaced before the first task, not mid-run.Simplify pass — after the loop, one agent checks for over-engineering in the
diff (dead code, single-use abstractions). Safe removals applied automatically.
What's in this repo
skills/execution-harness/
SKILL.md — invoked by /execution-harness
reference/
lifecycle.md — full phase sequence + local-preview isolation
autonomy.md — budget ceiling, failure-breaker, adversarial verifier
standing-constraints.md — 300-LOC, 30-line methods, TDD-by-class, memory
schemas.md — plan.dag.json, review-ledger, decision-ledger schemas
verification.md — empirical test procedures + acceptance test results
scripts/
local-preview.sh — isolated BE+FE local instance (auto port, throwaway DB)
check_file_sizes.sh — 300-LOC gate for CI / pre-commit
deploy-lock.sh — explicit deploy gate (never called automatically)
hooks/pretooluse-filesize.sh — PreToolUse hook
rules/
clean-architecture.md — file/method size, single responsibility, type safety
behavioral.md — think-before-coding, simplicity-first, surgical changes
docs/design-decisions.md — full reasoning behind every architectural choice
CLAUDE.md.template — copy to your project root and customize
setup.sh — one-command install
Built on the shoulders of
This project synthesizes ideas from several open-source projects and research.
No code was copied — only patterns were adapted. Full credits: ATTRIBUTION.md.
| Project | What was borrowed |
|---|---|
| obra/superpowers | Skill routing pattern |
| affaan-m/ECC | Typed subagent catalog ("the muscle") |
| nousresearch/hermes-agent | Episodic memory + trajectory capture |
| DietrichGebert/ponytail | YAGNI decision ladder |
| SWE-agent, OpenHands, Voyager, Reflexion | ACI output design, sandbox isolation, skill library, reflect-on-fail |
License
MIT — see LICENSE.
ECC and Superpowers have their own licenses; this repo contains only the harness layer.
Yorumlar (0)
Yorum birakmak icin giris yap.
Yorum birakSonuc bulunamadi