keel-harness

24-hour Claude Code sessions that don't drift, forget, or fake-pass tests.

Same fresh Claude Code session, with vs without keel — Act 1 Without (lost context, leaked secret, fake test pass) → Act 2 With (handoff carried, secret blocked, claim verified)

Without keel: new session, blank slate. The agent forgets last sprint, re-decides settled questions, commits a stray API key, and tells you the tests pass without running them.

With keel: new session, same momentum. 9 hooks guard session start, every prompt, every tool call, and session stop. A 7-field handoff carries context across the restart. Secrets get blocked at commit time. "Done" requires command output as proof.

# After installing superpowers + PUA (see Required Dependencies below):
git clone https://github.com/mr-shaper/keel-harness && cd keel-harness && bash install.sh

A hook framework + handoff schema + audit gate. Apache-2.0. macOS + Linux. Built on top of superpowers and PUA.

30-Second Elevator Pitch

Long Claude Code sessions decay. The agent forgets last sprint's decisions,
re-debates settled questions, leaks the occasional secret into commits, and
claims tests pass without running them. keel-harness is the infrastructure
underneath: a 7-field handoff schema that survives session restarts; 9 hooks
wired into session start, prompt submit, tool use, and session stop; a
6-dimension audit gate; and a 4-layer agent
topology that keeps the tech-lead AI from quietly slipping into IC mode. Same
model, same prompts — bounded by hooks, anchored to evidence. Apache-2.0,
macOS + Linux.

The 4 Gaps It Fills

4 Gaps + Solutions concept map

Most teams hit these walls within weeks of using Claude Code for serious engineering:

Gap 1 — 24h session memory loss: In every new session, the AI forgets what it decided last
sprint. harness fixes this with an immutable 7-field handoff schema, written at session stop,
read at session start, and enforced by hooks, so settled decisions survive the restart instead of drifting.
Gap 2 — Paper victory (Romeo audit blind spots): The AI claims "done" on paper, for example when
hooks are registered but never fire. The Romeo 6-dimensional audit framework (Honesty / Ownership /
TechDepth / PatternReplay / Density / Candidates) scores 6 independent dimensions and enforces a
hardcore ≥0.99 bar on their average, instead of a single-axis pass/fail.
Gap 3 — P9 role drift: Your tech lead AI starts writing code instead of writing prompts.
The P10-9-8-7 topology with 8 iron rules hard-separates strategy (P10), task-prompt writing
(P9), implementation (P8), and sub-tasks (P7) — and enforces it via pre-tool hooks that block
role violations before they happen.
Gap 4 — Silent dead hooks: Hooks appear registered in settings.json but never trigger
because the Layer 0 contract (CLAUDE.md + settings.json) is incomplete or inconsistent. harness
ships a Layer 0 enforcement spec — 5 elements that must all be present or the system silently
dies — plus templates you can fill in and ship.
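Part of Gap 4 can be caught with a plain cross-check between the hook scripts on disk and settings.json: a script that never appears in settings.json, or that is not executable, cannot fire. A minimal sketch, assuming all hook scripts sit in one directory and settings.json references each one by filename; `find_dead_hooks` is a hypothetical helper that uses simple substring matching, and it is not the shipped session-start-layer0-health.sh.

```bash
#!/usr/bin/env bash
# find_dead_hooks HOOKS_DIR SETTINGS_JSON  (hypothetical helper)
# Prints one line per hook script that cannot fire; returns 1 if any were found.
# Assumption: settings.json mentions each hook by filename (e.g. inside a "command" string).
find_dead_hooks() {
  local hooks_dir=$1 settings=$2 dead=0 hook name
  for hook in "$hooks_dir"/*.sh; do
    [ -e "$hook" ] || continue                 # glob matched nothing
    name=$(basename "$hook")
    if ! grep -qF "$name" "$settings"; then    # substring match on the filename
      echo "UNREGISTERED: $name"
      dead=1
    elif [ ! -x "$hook" ]; then                # registered, but the shell cannot run it
      echo "NOT EXECUTABLE: $name"
      dead=1
    fi
  done
  return "$dead"
}
```

Usage: `find_dead_hooks ~/.claude/hooks ~/.claude/settings.json` (the directory path here is only an example).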

Required Dependencies (install BEFORE running install.sh)

keel-harness is the kernel + workflow MDs. The runtime protocols its
workflow MDs reference live in two upstream OSS plugins. Both are MIT-licensed
and Apache-2.0 compatible. harness will not function without these.

install.sh Phase 0.5 detects both and ABORTS if either is missing.

1. superpowers (MIT, Jesse Vincent / @obra)

Provides: writing-plans, dispatching-parallel-agents, test-driven-development,
verification-before-completion, brainstorming, executing-plans,
subagent-driven-development.

claude plugin marketplace add obra/superpowers-marketplace
claude plugin install superpowers@superpowers-marketplace

Repo: https://github.com/obra/superpowers · License: MIT · Version tested: 5.0.7

2. PUA (MIT, TanWei Security Lab / @tanweai)

Provides: P10/P9/P8/P7 role protocols, red-line enforcement, Romeo evaluator,
parallel agent topology, performance pressure escalation.

git clone https://github.com/tanweai/pua ~/.claude/plugins/pua

Repo: https://github.com/tanweai/pua · License: MIT · Version tested: 3.0.0

Verification

ls ~/.claude/plugins/pua/plugin.json   # PUA installed
ls -d ~/.claude/plugins/cache/superpowers-marketplace 2>/dev/null \
  || ls -d ~/.claude/plugins/marketplaces/superpowers-marketplace   # superpowers installed (either path)

If either path is missing, bash install.sh will exit 2 with install
instructions printed inline. Use --skip-deps-check ONLY for
development/dogfood.
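The dependency check described above can be approximated with the same two paths as the Verification commands. A minimal sketch, assuming exit code 2 on a missing dependency as documented; `check_required_deps` is a hypothetical name and this is not the actual install.sh Phase 0.5 code.

```bash
#!/usr/bin/env bash
# check_required_deps [--skip-deps-check]  (hypothetical helper)
# Returns 0 when both required plugins are present, 2 otherwise (the documented abort code).
check_required_deps() {
  if [ "${1:-}" = "--skip-deps-check" ]; then
    echo "WARN: dependency check skipped (development/dogfood only)" >&2
    return 0
  fi
  local plugins="$HOME/.claude/plugins" missing=0
  if [ ! -e "$plugins/pua/plugin.json" ]; then
    echo "MISSING: PUA -> git clone https://github.com/tanweai/pua $plugins/pua" >&2
    missing=1
  fi
  if [ ! -d "$plugins/cache/superpowers-marketplace" ] &&
     [ ! -d "$plugins/marketplaces/superpowers-marketplace" ]; then
    echo "MISSING: superpowers -> claude plugin marketplace add obra/superpowers-marketplace" >&2
    missing=1
  fi
  [ "$missing" -eq 0 ] || return 2
}
```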

Bundled Plugins (auto-installed by install.sh)

These ship inside plugins/ and get copied to ~/.claude/plugins/<name>/
during Phase 1.5. No separate download.

Plugin	License	What it provides
OODC v1.4.0	Apache-2.0 (by mr-shaper)	Cognitive loop: Observe → Orient → Decide → Create. 4 reference protocols. Used by `workflows/oodc-superpower-harness-orchestration.md`
compound-selfcheck v0.1.0	Apache-2.0 (by mr-shaper)	PostToolUse soft-prompt: when a Write/Edit produces > 100 LOC or > 5KB, emits a stderr reminder to ingest the change into a knowledge base (Compound Engineering, not one-shot). Audit-logs `[COMPOUND-CHECK]` entries to `.harness/hook-trace.log` for "real-trigger vs performance" detection. Soft-prompt only — never blocks.
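The compound-selfcheck trigger (more than 100 LOC or more than 5KB) reduces to a line and byte count on the written file. A minimal sketch, assuming 5KB means 5 × 1024 bytes and that the caller already knows the written file's path; `compound_check` and its log line format are hypothetical, not the plugin's source.

```bash
#!/usr/bin/env bash
# compound_check FILE [LOG]  (hypothetical helper)
# Soft prompt only: prints a reminder on stderr and appends an audit line, but always returns 0.
compound_check() {
  local file=$1 log=${2:-.harness/hook-trace.log} lines bytes
  [ -f "$file" ] || return 0
  lines=$(wc -l < "$file" | tr -d ' ')   # tr strips the padding BSD wc adds on macOS
  bytes=$(wc -c < "$file" | tr -d ' ')
  if [ "$lines" -gt 100 ] || [ "$bytes" -gt 5120 ]; then
    echo "[compound-selfcheck] $file: $lines lines / $bytes bytes. Consider ingesting this change into the KB." >&2
    mkdir -p "$(dirname "$log")"
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) [COMPOUND-CHECK] $file lines=$lines bytes=$bytes" >> "$log"
  fi
  return 0
}
```

Wired as a PostToolUse hook, the path would come from the tool input on stdin, e.g. `compound_check "$(jq -r '.tool_input.file_path')"`, assuming the Write/Edit input carries `file_path`.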

Quickstart (5 min)

Prerequisites: superpowers + PUA installed (see Required Dependencies above).

curl -fsSL https://raw.githubusercontent.com/mr-shaper/keel-harness/main/install.sh | bash

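If you'd rather not pipe a remote script into bash, bootstrap manually. Note that these steps skip install.sh's Phase 0.5 dependency check and Phase 1.5 bundled-plugin copy: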

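# Step 1: Clone the kernel
git clone https://github.com/mr-shaper/keel-harness.git ~/.claude/plugins/keel-harness-mp

# Step 2: Apply Layer 0 contract templates (back up any existing global contract first)
if [ -f ~/.claude/CLAUDE.md ]; then cp ~/.claude/CLAUDE.md ~/.claude/CLAUDE.md.bak; fi
cp ~/.claude/plugins/keel-harness-mp/templates/CLAUDE.md.global.template ~/.claude/CLAUDE.md
# Edit ~/.claude/CLAUDE.md: fill in the <PLACEHOLDER> fields for your context

# Step 3: Merge hooks into settings.json (requires jq)
# jq's `*` merges objects recursively but replaces arrays, so an existing hook list
# for the same event is overwritten by the template's; compare against the backup.
if [ -f ~/.claude/settings.json ]; then cp ~/.claude/settings.json ~/.claude/settings.json.bak; else echo '{}' > ~/.claude/settings.json; fi
jq -s '.[0] * .[1]' \
  ~/.claude/settings.json \
  ~/.claude/plugins/keel-harness-mp/templates/settings.json.template \
  > /tmp/settings-merged.json && mv /tmp/settings-merged.json ~/.claude/settings.json
# Restart Claude Code: harness hooks are now active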

After install, start your first harnessed session:

1. Read .harness/handoff-S<N-1>-to-S<N>.md   — previous session's authoritative next_action
2. Answer 5 self-checks (Q1 project / Q2 next_action / Q3 clarity / Q4 handoff name / Q5 week)
3. Work — Stop hook writes the next handoff automatically
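Self-check Q4 (the handoff name) follows the `handoff-S<N-1>-to-S<N>.md` convention in step 1. A minimal sketch of deriving the expected file for session N and of finding the newest handoff on disk; both function names are hypothetical helpers, not part of the shipped hooks. The numeric sort matters because a plain lexical sort would put `S10` before `S9`.

```bash
#!/usr/bin/env bash
# expected_handoff N -> path of the handoff session N must read first  (hypothetical helper)
expected_handoff() {
  local n=$1
  echo ".harness/handoff-S$((n - 1))-to-S${n}.md"
}

# latest_handoff [DIR] -> handoff with the highest target session number, if any  (hypothetical helper)
latest_handoff() {
  local dir=${1:-.harness}
  ls "$dir"/handoff-S*-to-S*.md 2>/dev/null |
    sed -E 's/.*-to-S([0-9]+)\.md$/\1 &/' |   # prefix each path with its target session number
    sort -n | tail -n 1 | cut -d' ' -f2-
}
```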

Standard Plan-Authoring Prompt (copy-paste this for any non-trivial task)

When you give the agent a task that has 3+ steps, multi-file changes, or
crosses module boundaries, paste this prompt verbatim. It binds the agent
to the harness execution contract and prevents the most common failure
mode (skipping workflow reads, which causes the UserPromptSubmit hook to
emit warnings mid-conversation).

For this task, design and execute the plan under harness's four-layer
nested parallel topology:

  Harness  ⊃  OODC  ⊃  Superpower Pipeline  ⊃  PUA P10-9-8-7

The plan itself must read as a guide that future executing agents follow
under the same topology — annotate each Wave / Phase with which layer
drives it (which OODC step, which role tier, which Pipeline phase).

Before drafting the plan, READ these five workflow MDs (skipping them
triggers a UserPromptSubmit warning that interrupts the conversation):

  1. workflows/pua-topology.md
  2. workflows/oodc-superpower-harness-orchestration.md
  3. workflows/superpower-pipeline.md
  4. workflows/skill-loading-sop.md
  5. workflows/kb-ingestion-sop.md

Then verify any Skill you intend to invoke is REALLY loaded
(skill-loading-sop §5 — five dimensions: tool-call, references body Read,
protocol applied, sub-agent injection, evidence-aligned self-eval).
A Skill listed in inventory is not the same as a Skill actually loaded.

Wave / Phase tracking is mandatory: every Wave and every Phase in the
plan MUST have a TaskCreate entry. The TaskCreate list IS the Superpower
Pipeline stage tracker — update statuses as you progress
(pending → in_progress → completed). No Wave without a task entry.

Once the plan is drafted with topology annotations, ratification gates,
and TaskCreate entries, present it for approval before execution.

The full version of this contract — failure modes, role definitions,
skill verification protocol — lives in
docs/agent-execution-standard.md.

For more user-facing prompts (project bootstrap, sprint kickoff, sprint
close, etc.), see docs/quickstart-prompts.md.
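The Wave / Phase tracking rule in the plan-authoring prompt ("No Wave without a task entry", with statuses moving pending → in_progress → completed) can be checked mechanically. A minimal sketch, assuming the plan uses headings such as `## Wave 2` or `### Phase 3` and that task subjects have been exported one per line starting with the same label; both function names are hypothetical and nothing here calls the real TaskCreate tool.

```bash
#!/usr/bin/env bash
# waves_missing_tasks PLAN TASKS  (hypothetical helper)
# Prints every "Wave N" / "Phase N" heading in PLAN with no task subject starting with that label.
waves_missing_tasks() {
  grep -oE '^#+ (Wave|Phase) [0-9]+' "$1" | sed -E 's/^#+ //' |
    while read -r label; do
      # "Wave 1" must not be satisfied by "Wave 10", hence the non-digit check
      grep -qE "^$label([^0-9]|$)" "$2" || echo "$label"
    done
}

# valid_transition FROM TO  (hypothetical helper)
# Returns 0 only for a one-step forward move: pending -> in_progress -> completed.
valid_transition() {
  case "$1:$2" in
    pending:in_progress|in_progress:completed) return 0 ;;
    *) echo "illegal status change: $1 -> $2" >&2; return 1 ;;
  esac
}
```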

What's Inside (Kernel Scope)

The kernel is the minimum viable harness — no private configuration, no personal plugins,
no company-specific logic. Everything that ships is universally applicable to any Claude Code
power user.

Workflow Documentation (5 files)

File	What it encodes
`workflows/pua-topology.md`	P10-9-8-7 nested parallel topology + 8 iron rules
`workflows/oodc-superpower-harness-orchestration.md`	OODC loop (Observe → Orient → Decide → Create) orchestration across Harness + Superpower + PUA layers
`workflows/superpower-pipeline.md`	Phase 0-4 engineering pipeline (kickoff → parallel explore → decision convergence → dev → close)
`workflows/skill-loading-sop.md`	Skill discovery + loading SOP — prevents hallucinated tool calls
`workflows/kb-ingestion-sop.md`	Knowledge base ingestion pipeline — Compound Engineering, not one-shot generation
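The UserPromptSubmit trigger gate routes trigger words to the workflow MD that should be read first. A minimal sketch of that routing as a case-insensitive substring lookup; `route_prompt` is hypothetical, and the word-to-file mapping below is an assumption inferred from the file names in the table above, not the hook's actual routing table.

```bash
#!/usr/bin/env bash
# route_prompt "PROMPT TEXT"  (hypothetical helper)
# Prints each workflow MD whose (assumed) trigger word appears in the prompt, case-insensitively.
route_prompt() {
  local text
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in *pua*) echo "workflows/pua-topology.md" ;; esac
  case "$text" in *oodc*|*harness*) echo "workflows/oodc-superpower-harness-orchestration.md" ;; esac
  case "$text" in *superpower*) echo "workflows/superpower-pipeline.md" ;; esac
  return 0
}
```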

Hooks (8 enforce-core hooks; the 9th registered hook comes from the bundled compound-selfcheck plugin)

Hook	Type	What it enforces
`stop-handoff-writer.sh`	Stop	Writes 7-field handoff at every session end
`pre-tool-handoff-read-gate.sh`	PreToolUse	Blocks file writes until handoff is read (sticky flag)
`pre-tool-handoff-semantic-gate.sh`	PreToolUse	Semantic check — prevents writing wrong session's handoff
`user-prompt-l42-workflow-trigger-gate.sh`	UserPromptSubmit	Routes trigger words (harness/OODC/PUA/Superpower) to the correct workflow MD
`pre-tool-doc-sync-sop-enforce.sh`	PreToolUse	Enforces doc-sync routing before any knowledge base write
`post-tool-chmod-ci-gate.sh`	PostToolUse	chmod guard — prevents CI scripts from losing execute bit silently
`session-start-layer0-health.sh`	SessionStart	Layer 0 health check — verifies all 5 contract elements are present
`pre-tool-plan-quality-gate.sh`	PreToolUse	Blocks low-quality plan writes (Romeo ≥0.99 gate)
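The sticky-flag behavior of pre-tool-handoff-read-gate.sh can be sketched as two small functions: one marks the handoff as read, the other blocks write-type tools until that mark exists. A minimal sketch, not the shipped hook: the flag path `.harness/.handoff-read` and both function names are hypothetical, and a real hook would first pull `tool_name` (and `tool_input.file_path`) out of the JSON Claude Code passes on stdin. The one convention taken from Claude Code itself is that a PreToolUse hook exiting with code 2 blocks the tool call and shows its stderr to the model.

```bash
#!/usr/bin/env bash
# handoff_read_gate TOOL_NAME [FLAG]  (hypothetical helper)
# Returns 2 (block) when a write-type tool runs before the handoff was read; 0 otherwise.
handoff_read_gate() {
  local tool=$1 flag=${2:-.harness/.handoff-read}
  case "$tool" in
    Write|Edit|MultiEdit|NotebookEdit) ;;
    *) return 0 ;;                         # reads, searches, etc. are never blocked
  esac
  if [ ! -e "$flag" ]; then
    echo "BLOCKED: read the latest .harness/handoff-*.md before writing files." >&2
    return 2
  fi
  return 0
}

# mark_handoff_read FILE_PATH [FLAG]  (hypothetical helper)
# Sets the sticky flag once a handoff file is Read; it stays set until something deletes it.
mark_handoff_read() {
  local path=$1 flag=${2:-.harness/.handoff-read}
  case "$path" in
    */.harness/handoff-*.md|.harness/handoff-*.md) mkdir -p "$(dirname "$flag")" && touch "$flag" ;;
  esac
  return 0
}
```

As a hook body this might look like `handoff_read_gate "$(jq -r .tool_name)"; exit $?`.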

Templates

templates/handoff-template.md — 7-field handoff schema (sprint / next_action / blockers / decisions / files_changed / self_check / romeo_score)
templates/cat-h-rule-template.md — Category H canonical law template (for adding new ratified rules)
templates/CLAUDE.md.global.template — Generic global Claude Code contract (~180 LOC, scrubbed of personal config)
templates/CLAUDE.md.project.template — Generic project contract (~50 LOC, 5-must-reads + 5-self-checks + bible principles placeholder)
templates/settings.json.template — Generic settings.json with 9 hooks registered (~125 LOC, 8 enforce-core + 1 compound-selfcheck plugin hook)
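Because a missing field leaves the next session flying blind, a handoff can be checked mechanically before it is accepted. A minimal sketch, assuming each of the 7 fields appears at the start of a line as `field:` (optionally as a list item or in bold); the real handoff-template.md layout may differ, and `missing_handoff_fields` is a hypothetical helper.

```bash
#!/usr/bin/env bash
HANDOFF_FIELDS="sprint next_action blockers decisions files_changed self_check romeo_score"

# missing_handoff_fields FILE  (hypothetical helper)
# Prints each required field absent from FILE; returns 1 if any are missing.
missing_handoff_fields() {
  local file=$1 field rc=0
  for field in $HANDOFF_FIELDS; do
    # Accept "field:", "- field:" or "**field**:" at the start of a line.
    if ! grep -qE "^[-*[:space:]]*${field}[*]*:" "$file"; then
      echo "$field"
      rc=1
    fi
  done
  return "$rc"
}
```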

Audit Framework

audit/romeo-6-dim-framework.md — Romeo 6-dimensional audit spec (Honesty / Ownership / TechDepth / PatternReplay / Density / Candidates), ≥0.99 hardcore gate, evidence-alignment rules
docs/sprint-kickoff-checklist.md — Five-layer GATE self-check (Layer A entity / B content / C gate / D config / E behavior fire). Mandatory at every sprint kickoff to prevent score inflation
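The final arithmetic of the Romeo gate (six scores in 0-1.00, average ≥0.99) fits in a few lines of awk. A minimal sketch, assuming a plain arithmetic mean as the vocabulary table later in this README describes; `romeo_gate` is hypothetical, and it covers only the arithmetic, not the judgment that produces each score.

```bash
#!/usr/bin/env bash
# romeo_gate H O T P D C  (hypothetical helper)
# Order: Honesty Ownership TechDepth PatternReplay Density Candidates.
# Prints the average to 4 decimals; returns 0 if it is >= 0.99, 1 if below, 2 on bad input.
romeo_gate() {
  [ "$#" -eq 6 ] || { echo "usage: romeo_gate H O T P D C" >&2; return 2; }
  awk 'BEGIN {
    bar = 0.99
    for (i = 1; i < ARGC; i++) {
      s = ARGV[i] + 0
      if (s < 0 || s > 1) { print "score out of range: " ARGV[i] > "/dev/stderr"; exit 2 }
      sum += s
    }
    avg = sum / (ARGC - 1)
    printf "%.4f\n", avg
    exit (avg >= bar ? 0 : 1)
  }' "$@"
}
```

Because every score is capped at 1.00, an average of 0.99 leaves a combined shortfall budget of only 0.06 across all six dimensions, so any single dimension below 0.94 fails the gate on its own.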

Tooling

sync.sh — 5-command sync (init / export / import / diff / release) with 5-layer privacy protection
scripts/sync-self-check.sh — Cross-platform 5-layer evidence dump. Read-only by design: the maintainer reads the dump and self-evaluates the sprint outcome; the script never decides the outcome itself (the P9-doesn't-decide-L4 pattern)
manifest.json — Kernel file whitelist + private blacklist keywords (what stays in, what never ships)
install.sh — One-line bootstrap (ships W3)
CHANGELOG.md — Keep a Changelog format, semver tags
LICENSE — Apache-2.0
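One privacy check implied by manifest.json's blacklist is refusing an export when any outgoing file contains a private keyword. A minimal sketch, assuming the keywords have already been extracted from manifest.json into a plain-text file (one per line); `scan_for_private` is hypothetical and is not sync.sh. Blank lines are filtered out first because an empty pattern would match every line.

```bash
#!/usr/bin/env bash
# scan_for_private KEYWORDS_FILE FILE...  (hypothetical helper)
# Prints "file:line:text" for every case-insensitive keyword hit; returns 1 if anything would leak.
scan_for_private() {
  local keywords=$1 patterns rc=0
  shift
  patterns=$(mktemp) || return 2
  grep -v '^[[:space:]]*$' "$keywords" > "$patterns"   # drop blank lines (they match everything)
  if [ -s "$patterns" ] && grep -H -n -i -F -f "$patterns" -- "$@"; then
    echo "export blocked: private keyword found" >&2
    rc=1
  fi
  rm -f "$patterns"
  return "$rc"
}
```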

Demos (asciinema → agg-rendered GIFs)

Three additional reproducible demos cover the gaps in motion:

#	Demo	Length	Gap
1	24h Cross-Session Continuity	3 min	Gap 1 — AI memory loss
2	4-Layer Nested Parallel — 7 P8 → 7× speedup	2 min	Gap 3 — P9 role drift
3	Canonical Honesty Hooks — 5-layer defense	2.5 min	Gap 2 — paper victory

Reproduce locally: brew install asciinema agg && bash demo/record.sh all

Architecture: 4-Layer Nested Parallel Topology

4-Layer Nested Parallel Topology infographic

═══════════════════════════════════════════════════════════════════
Harness (cross-session, weeks to months)
   │
   └─ OODC (Observe → Orient → Decide → Create, 1 major goal = 1 loop)
        │
        └─ Superpower Pipeline (Phase 0 → 1 → 2 → 3 → 4)
             │   Phase 0  kickoff (load skills + create tasks + manifest draft)
             │   Phase 1  parallel exploration (brainstorm, retro, compete scan)
             │   Phase 2  decision convergence (P10 ratifies, no more options)
             │   Phase 3  development  (N waves of true parallel P8 agents)
             │   Phase 4  close (launch / retrospective / handoff)
             │
     CEO (the human user) — ultimate authority, sits above all AI roles
       │  ratifies / overrides P10 ; final trump card on every strategic decision
       ↓
             └─ PUA P10 / P9 / P8 / P7  (all AI roles)
                  P10  = CTO (AI strategy layer) — ratifies under CEO, dispatches to P9, never writes code
                  P9   = Tech Lead — writes Task Prompts, never writes code
                  P8   = Senior Eng — same-message true parallel, owns a file domain
                  P7   = P8-spawned sub-agent — granular sub-tasks
═══════════════════════════════════════════════════════════════════

The 8 P9 Iron Rules (never violate)

P9 dispatches multiple P8s in a single message — true parallel, not sequential
P8 spawns P7 internally — P9 never manages P7 directly
P10 never writes Task Prompts, never manages P8
P9 never writes code — writing code = role drift = automatic PUA 3.5 penalty
CEO (the human user) always overrides P10, the AI CTO; the CEO is the ultimate authority above the entire AI hierarchy
File domain isolation — grep-verify no overlap before dispatch
Same-message multi-Agent = true parallel (not loop-sequential)
P9 runs verification commands and pastes output — no empty claims
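The file domain isolation rule ("grep-verify no overlap before dispatch") amounts to checking that no path is claimed by two P8 domains. A minimal sketch, assuming each P8's domain has been written to its own text file, one path per line; `domain_overlaps` is hypothetical, and it only detects identical paths, not a directory in one domain that contains a file in another.

```bash
#!/usr/bin/env bash
# domain_overlaps DOMAIN_FILE...  (hypothetical helper)
# Each DOMAIN_FILE lists the paths one P8 agent will own, one per line.
# Prints every path claimed by more than one domain; returns 1 if any overlap exists.
domain_overlaps() {
  local dupes
  # sort -u per file first, so a path repeated inside one domain is not a false positive
  dupes=$(for f in "$@"; do sort -u "$f"; done | sort | uniq -d)
  [ -z "$dupes" ] && return 0
  echo "$dupes"
  return 1
}
```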

The 5 Words We Want in the Agent Engineering Vocabulary

harness-engineering introduces 5 precise concepts that fill gaps in the current agent
engineering lexicon:

Term	Definition
Thin Watering Principle	Apply harness constraints as a thin, universal layer — never couple enforcement to private personal config. The harness should work for anyone without modification.
7-Field Handoff Schema	The minimum viable handoff: `sprint / next_action / blockers / decisions / files_changed / self_check / romeo_score`. Missing any field = the next session is flying blind.
Romeo 6-Dim Audit	Six independent dimensions — Honesty, Ownership, TechDepth, PatternReplay, Density, Candidates — each scored 0-1.00. Overall bar: average ≥0.99 hardcore. Not a checklist, a judgment framework.
Canonical Honesty Rule	Every claim requires evidence paste. "It works" without command output = 0 points. The hook system enforces this at the PreToolUse layer, before the AI can write a completion.
4-Layer Nested Parallel	Harness ⊃ OODC ⊃ Superpower Phase 0-4 ⊃ PUA P10-9-8-7. Concurrency at every layer. Not just "run agents in parallel" — structured parallelism with role separation and file domain isolation.

Documentation

Workflow MDs ship as part of the kernel. English versions land in W2:

workflows/pua-topology.md — P10-9-8-7 topology + 8 iron rules
workflows/oodc-superpower-harness-orchestration.md — OODC loop orchestration
workflows/superpower-pipeline.md — Phase 0-4 engineering pipeline
workflows/skill-loading-sop.md — Skill loading SOP
workflows/kb-ingestion-sop.md — KB ingestion + Compound Engineering

Optional Integrations (advanced)

The two REQUIRED plugins (superpowers + PUA) and the two BUNDLED plugins (OODC +
compound-selfcheck) are covered above. Below are three additional plugins referenced
indirectly by harness workflow MDs. They are neither bundled nor auto-installed.
Review each project's license before use — Apache-2.0 compatibility is your
responsibility.

claude-mem (--with-claude-mem): Persistent semantic memory across sessions. AGPL-3.0 — strong copyleft, your responsibility to comply.
tacit-kb (--with-tacit-kb): MIT, public (github.com/mr-shaper/tacit-kb). The Compound Engineering KB that powers the harness's compound flywheel (decisions / exemplars / analogies / evolution). Works key-free with local BM25 search; an optional embedding key upgrades it to hybrid semantic search.
doc-sync (--with-docsync): Document synchronization + knowledge base ingestion routing. License unclear — verify before use.

These plugins were built for a specific engineering context. They work best
when you already understand the harness topology. Start with the
required + bundled, add these only when you feel the gap.

Compatibility

Platform	Status
macOS Sonoma 14+ (Apple Silicon + Intel)	Tested
macOS Monterey 12 / Ventura 13	Should work (bash 3.2+)
Ubuntu 22.04 LTS (x86_64)	Tested (W6 cross-platform verify)
Ubuntu 20.04 LTS	Should work
Windows (WSL2)	Untested, community welcome

Requirements: bash 3.2+ · jq · git · Claude Code CLI

Install jq if missing:

# macOS
brew install jq

# Ubuntu / Debian
sudo apt-get install -y jq
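The requirements line can be checked before running install.sh. A minimal sketch, assuming `claude` is the CLI's command name on PATH; `preflight` is a hypothetical helper. It parses `$BASH_VERSION` with plain parameter expansion so the check itself also runs on the bash 3.2 that ships with macOS.

```bash
#!/usr/bin/env bash
# preflight  (hypothetical helper)
# Checks for bash 3.2+ plus jq, git and the claude CLI; returns 1 if anything is missing.
preflight() {
  local ok=0 tool major minor rest
  if [ -z "${BASH_VERSION:-}" ]; then
    echo "not running under bash" >&2
    ok=1
  else
    major=${BASH_VERSION%%.*}      # "5.2.15(1)-release" -> "5"
    rest=${BASH_VERSION#*.}
    minor=${rest%%.*}              # -> "2"
    if [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 2 ]; }; then
      echo "bash $BASH_VERSION is too old (need 3.2+)" >&2
      ok=1
    fi
  fi
  for tool in jq git claude; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      ok=1
    fi
  done
  return "$ok"
}
```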

License

Apache-2.0

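You are free to use, modify, and distribute this software for any purpose, provided you follow the
license's conditions (for example, keeping the license text and attribution notices). The Apache-2.0
license includes a patent grant, which suits infrastructure frameworks. See LICENSE for full terms.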

Credits

Mitchell Hashimoto — "harness engineering" naming. The concept of a thin harness layer
that constrains and shapes a more powerful underlying system without replacing it.
Andrej Karpathy — Agentic engineering vision. The Why behind structured AI agent
engineering: systems that are reliable, auditable, and production-grade. harness is the How.
Jesse Vincent — superpowers plugin architecture. The Skill system that makes harness
workflow MDs composable and discoverable.

Contributing

Issues and PRs welcome. Before opening a PR:

Read workflows/pua-topology.md — understand the P8 file domain isolation rule
Every claim in the PR description needs evidence (command output, test results)
New hooks: must pass Layer 0 health check + add a test in tests/
New workflow MDs: must follow the 7-field handoff schema and Romeo audit format

If you find a use case the kernel doesn't cover, open an issue before building — the kernel
scope is intentionally narrow. Scope creep is the enemy of a reusable harness.

"The goal is not to make AI smarter. The goal is to make AI reliable."