Forge your Claude Code projects — pass them through, they come out faster.
A practitioner's meta-harness: it raises each project's floor (harness-ify the setup)
and ceiling (accelerate the work), then compounds the gains across your whole portfolio.

Quality is the lever; speed is the result. Every change earns its way through the gates —
adversarial · phantom · regression — and that is what makes the next change faster.

Fork it. Rename it. Make it yours.

FORK · ADAPT · COLLABORATE · EMPOWER

The principles · Why it exists · The evidence · How to use it

If you're here because…	forge-harness solves it
Context disappears when a session ends	Persistent `tracks/` — resumable from anywhere
You repeat the same setup across projects	Connect once to the hub, share across all projects
Team AI know-how lives only in people's heads	Codify it so everyone shares it
You want AI to get better as work accumulates	Skills and patterns compound session over session
You need a governance layer for AI-generated code	`fh-gate` wraps any coding agent as a post-generation gate

This document is for humans. AI operating rules → CLAUDE.md · Command reference → CHEATSHEET.md

Get started in 2 minutes

Prerequisite: Claude Code CLI — verify with claude --version

# 1. Install the plugin
claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git
claude plugin install -s user fh-meta@forge-harness

# 2. Clone the hub
git clone https://github.com/chrono-meta/forge-harness.git ~/forge-harness
cd ~/forge-harness

# 3. Start a session
claude

✅ Claude reads CLAUDE.md and asks what project to connect or what task to start.
Say "Connect a project" → hub scans ../, finds .git directories, creates tracks/{project}/.

Plugin only (no clone):

claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git  # once
claude plugin install -s user fh-meta@forge-harness
cd ~/projects/{your-project} && claude

⚠️ Plugin-only is partial synergy. You get the skills and agents, but not Layer 1 — the
CLAUDE.md governance (active onboarding, the 4-axis gate, mode branching) and the compounding
context (tracks/ memory accumulation, harvest-loop learning). Each skill runs the same in
isolation; what's missing is the orchestration that makes them compound across sessions. Clone the
hub (above) when you want the full set, not just the tools.

🚪 New here / just want the skills? Start with the opinionated front door —
templates/starter_profile.md: one install command, a curated
first-five skills, and a zero-install governance gate (npx … fh-gate). The other 28 skills wait
until you need them.

What it is

forge-harness is structured as two distinct layers:

Layer	Contents	AI compatibility
Methodology layer	`tracks/`, `knowledge/`, `SKILL.md` docs, session protocols	Any AI model
Automation layer	`.claude/agents/`, hooks, slash commands, `CLAUDE.md` rules	Claude Code only

The methodology layer is the portable core — persistent hub, accumulating learnings, curating cross-project knowledge. The automation layer makes it frictionless when running Claude Code.

Where this sits (2026): "harness engineering" is now a public paradigm — and basic agent
orchestration is rapidly commoditizing into standard infrastructure. FH deliberately stakes nothing on
that plumbing. Its durable layer is what does not commoditize: the governance gates (adversarial ·
phantom · regression), drift control, and the cross-project compounding loop. Routing and dispatch are
means; the gate and the loop are the asset.

forge-harness/   ← the hub (persistent brain)
├── knowledge/   → shared across all projects
└── tracks/      → work records per project

Project A  ──→  connect hub in CLAUDE.md
Project B  ──→  connect hub in CLAUDE.md

Why it works

After a long co-authoring session with your AI, you and it share the same context — and the same blind spots. The reviewer worth having is the one who never saw your reasoning. You can get that by hand: paste the work into a fresh, empty chat. FH just turns that chore into one routine command.

sidecar / agent dispatch → a reviewer with none of your session's context
steel-quench · phantom-quench → that cold pass, on demand

It's model-agnostic: co-build with one AI, run the cold pass with any other. Whoever was absent from the original session is your cold reviewer — this is not a ranking of models.

What FH does not claim: the cold pass is your base model's own ability, not a detection engine FH adds — a plain prompt to a fresh instance does much of the same. FH's value is narrower and honest: it takes a method drawn from real practice and makes running that independent pass routine, instead of a chore you skip. The methodology is copyable; what FH packages is the workflow, not a secret sauce.

Governance layer for AI-generated code

FH wraps any coding agent (OpenCode, Codex, etc.) as a post-generation governance gate.

npx --package @chrono-meta/fh-gate fh-gate                    # default: Claude backend
FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-gate   # Codex backend
FH_BACKEND=auto npx --package @chrono-meta/fh-gate fh-gate "src/foo.ts" full
# → FH_GATE_VERDICT: PASS | PENDING | BLOCKED | ESCALATE

fh-gate uses the same FH governance prompt for both runtimes. FH_BACKEND=claude runs claude --print; FH_BACKEND=codex runs codex exec; FH_BACKEND=auto prefers Codex when both CLIs are present.

For direct skill or agent execution outside Claude Code, use fh-run:

FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-run --skill phantom-quench --file docs/foo.md
FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-run --agent fh-commons:quench-challenger --file plugins/fh-meta/skills/foo/SKILL.md

For Codex-primary work, keep using Codex's native goal/session features when available. fh-goal is only a portable wrapper for one-off non-interactive runs that should be followed by FH governance:

FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-goal --prompt "Implement X and update tests" --gate quick

The broader FH automation layer still depends on Claude Code for sub-agents, hooks, and slash commands. The portable path is shared documents plus runtime adapters, not separate Codex and Claude forks.

Recommended posture — Claude Code as orchestrator, others as sidecars. FH's automation layer (auto-firing hooks, sub-agent dispatch, onboarding, memory) is Claude-Code-native, so the fullest experience runs Claude Code as the main orchestrator with Gemini, Codex, or Antigravity (agy) as actively-used sidecars. You can also run a non-CC runtime as your main agent — you keep the full methodology layer and M1 skills through fh-gate/fh-run, but you do not get the autopilot layer: hooks don't auto-fire, M2 agent-dispatch steps need the adapter (or interactive approval), and M3 skills are reference-only. This is a deliberate two-layer boundary, not a gap to be closed. Per-runtime detail: docs/codex-compat.md (tier-by-tier) and multi_model_sidecar_strategy.md (sidecar engines, including the Gemini→agy succession at the 2026-06-18 EOL).

Empirical result (2026-05-31): Applied to OpenCode's AI-generated permission/arity.ts (163 lines, CI green). Current gate semantics classify this as BLOCKED: 2 A-grade findings CI didn't catch (short-token overflow in allowlist, executor tools absent from arity table).

Full spec: fh_integration_contract.md

The forge

forge-harness treats a project like steel — and the metaphor is literal, not decoration. Work is shaped,
hardened by attack, and only then does it ship faster, for having survived.

Movement	What happens	The commands
Forge	shape the raw project into a harness — raise its floor	`install-wizard`, "harness-ify this project"
Quench	harden it by attack — the cold pass leaves standing only what is sound	`steel-quench` · `phantom-quench`
Temper	take the brittleness back out of the hardened asset	`steel-quench` Wave-T · `templates/temper_check.sh`
→ Accelerate	a blade that survived the forge cuts faster	`goal-quench` — Pass → Accelerate

All four movements ship. Temper was named before it was built — deliberately (see
ETHOS.md) — and shipped once measurement runs validated it. Around the forge,
two more signatures keep it running: harvest-loop (each session's lessons become permanent skills) and
agent-composer (orchestrate the dispatch). The other skills wait until you need them — full list below.

37 skills · 8 agents

Full asset activation check

Asset	Role	Triggers
`steel-quench`	Full-spectrum adversarial verification	"Run the quench", "Attack from the root"
`phantom-quench`	Phantom claim detection + source back-tracing	"Verify the source", "Grounding audit"
`harvest-loop`	End-of-session learning → evolution pipeline	"Harvest the session"
`agent-composer`	Plans optimal agent dispatch	"Run in parallel", "Which agents?"
`sim-conductor`	Meta-simulation orchestrator	"External user perspective"
`context-doctor`	Token efficiency + `.claudeignore`	"Session is slow", "Clean up context"
`harness-doctor`	Harness structure diagnosis	"Check my Claude setup"
`pipeline-conductor`	4-axis quality gate (backward/adversarial/forward/record)	"Run the quality gate"
`field-harvest`	Back-propagate field patterns to hub	"I could reuse this"
`frontier-digest`	HN + arXiv → actionable insights	"AI trend digest"
`hub-cc-pr-reviewer`	Automated PR review	"Review this PR"
`verify-bidirectional`	Reverse-verify decisions	"Is that right?", "Double-check"
`deep-clarify`	Socratic requirements clarification	"I'm not sure what to build"
`install-wizard`	Initial onboarding	"First-time setup"
`plugin-recommender`	Plugin recommendations	"Is there a good tool for this?"
`apex-review`	Executive-perspective quality review	"Will this hold up?"
`meta-prompt-builder`	Meta prompt design	"Write a prompt for the agent"
`asset-placement-gate`	Hub vs project asset routing	"Should this be shared?"
`cross-ecosystem-synergy-detection`	Cross-tool synergy finder	"Are my tools working together?"
`corpus-grounding-expander`	Multi-version public-domain corpus → verified-axiom grounding store	"Broaden the grounded corpus"
`persona-roster-expander`	Persona seed → tiered, judgment-mapped cast	"Broaden these personas"
`convergence-loop` (fh-commons)	N-round convergence loops	"Single-pass seems suspicious"
`token-budget-gate` (fh-commons)	Pre-task token cost estimate	"How expensive is this?"
`mcp-circuit-breaker` (fh-commons)	MCP tool failure pattern detection	"MCP keeps failing"
`quench-challenger` (fh-commons)	Adversarial pressure-test agent	"Challenge this with a devil"
(+ additional assets)	marketplace-gate · contention-layer · edit-manifest · fact-checker · goal-quench · hub-persona-auditor · install-doctor · memory-hygiene · persona-innovator · prompt-regression · public-surface-audit · skill-splitter

Active count	Diagnosis
28+	Advanced — chain agent-composer + sim-conductor + steel-quench + pipeline-conductor
10–27	Activation stage — gradually enable unchecked assets
0–9	Early stage — start with `install-wizard`

Find a skill by what you're trying to do:

Cluster	Skills
Verification	`steel-quench` · `phantom-quench` · `convergence-loop` · `prompt-regression` · `return-path-gate`
Orchestration	`agent-composer` · `pipeline-conductor` · `goal-quench` · `deliberation`
Diagnosis	`harness-doctor` · `context-doctor` · `install-doctor` · `mcp-circuit-breaker`
Harvesting / Learning	`harvest-loop` · `field-harvest` · `edit-manifest` · `memory-hygiene`
Gate / Guard	`token-budget-gate` · `asset-placement-gate` · `marketplace-gate`
Discovery	`plugin-recommender` · `cross-ecosystem-synergy-detection` · `frontier-digest` · `verify-bidirectional`
Content / Simulation	`sim-conductor` · `apex-review` · `meta-prompt-builder` · `deep-clarify`
Setup	`install-wizard` · `hub-cc-pr-reviewer` · `skill-splitter`

Full phrasebook — every skill + agent with its one-line definition and the plain-language phrase
that triggers it: CHEATSHEET.md §12.

Model setup

Claude Code does not auto-select models by task complexity — you configure this once.

/model sonnet   # recommended default — FH dispatches stronger models itself where they matter

Command	Who runs what	Best for
`/model sonnet`	Sonnet session; FH dispatches higher-tier sub-agents on declared floors	FH default — operation + routine dev
`/model opus`	Opus handles everything	Harness-editing sessions (Mode D) · maximum depth on every turn
`/model opusplan`	Opus plans · Sonnet executes (when Opus engages)	Cost-conscious routine coding — see caveat

Why default Sonnet now works: measured (see §Model setup evidence note below), operating FH is
nearly model-flat — the rules in context do most of the work. What still needs a stronger model is a
small set of depth-sensitive turns, and FH handles those itself: some skills and agents declare a
model-tier floor (e.g. quench-challenger floors at opus) and are dispatched as sub-agents at the
floor tier when your environment can reach it — your session model stays untouched. FH never switches
your session model: a default you set by hand is followed; floors apply only to FH's own sub-agent
dispatches. If your environment tops out below a floor (e.g. Sonnet-only API routing), the floored
asset still runs at the best available tier with an explicit below-floor flag in its output — degraded
delivery is visible, never silent (tier-floor resolution: knowledge/shared/harness-core/multi_model_sidecar_strategy.md §Tier-floor).

opusplan caveat (measured): its Opus engagement is not guaranteed — in a measured 10-turn run
it used Opus on 0 turns (CC classifies few turns as "plan-mode"). If you want Opus on every turn,
pin /model opus (22/22 turns Opus in the follow-up run). Sub-agent dispatch model is set by the
dispatch's own model parameter; the session model/plan-mode does not propagate to sub-agents.

By role: running FH (field projects, gates, routine dev) → /model sonnet + let the floors
escalate. Editing the harness itself (Mode D) → pin the strongest model you have — harness
self-development is where tier depth measurably pays (design-increment finding), while operation
does not. Sub-agent token costs are CC-visible in the session jsonl under message.model.

Measured, not asserted (2026-06-10, worked example): on a 30-point blind rule-application battery,
operating FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored 100 / 97 / 94 against
a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
above-rubric design increments (developing the harness, not running it) — which is why the default is
Sonnet with tier-floored dispatch covering the depth-sensitive turns, and a pinned stronger model is
recommended only for harness-editing sessions. Details: docs/OUTPUT_EVIDENCE.md §Validation signals.

If you use external CLIs (Gemini, Codex, gh copilot) as sidecars, their costs are billed to their own quota and not visible in CC's token display.

Hardware tiers (local sidecars are optional accelerators)

FH needs no local LLM — the baseline is whatever runs Claude Code. Local models are optional, for
the canary / cheap-breadth rungs only:

Tier	Spec	Runs locally	What it buys
Minimum	anything that runs Claude Code	nothing	full methodology + gates; operating FH is ~model-flat (Opus/Sonnet/Haiku 100/97/94)
Recommended	laptop-class, ~16GB RAM	one 8B-class quantized model (e.g. an 8B / small Gemma)	a token-free floor canary (pre-screen before a metered sim) · offline triage · a cheap-breadth panel arm
Optional (heavy)	~24GB VRAM GPU	a 27–32B model	a stronger decorrelation canary

Local tiers are canaries, never the terminal verdict — measured: the floor model missed a subtle
adversarial case the frontier caught (and even a 27–32B local scored 1/4 on it). They lower the cost
of breadth; the verdict stays frontier.

Multi-Model Sidecar

Run Gemini, Codex, or gh copilot as independent reviewers alongside Claude. The point is context
isolation: a reviewer that did not co-create the work is cold to its froth — whoever sits outside the
collaboration tends to catch what the co-author, now an advocate for the shared result, glides past. It's
symmetric, not a model-ranking: when you co-build with Gemini, a fresh Claude catches its froth; when you
co-build with Claude, a fresh sidecar catches Claude's.

In one internal case study, layering reviewers surfaced progressively more issues — a single in-session
pass missed items that cross-session personas caught, and an external-CLI reviewer surfaced a few the
Claude personas shared a blind spot on. Treat it as a worked example, not a benchmark: the gain scales
with task complexity and how much you co-created the artifact, and an isolated reviewer also adds false
positives you have to triage. Whether the net is worth it on a given task is an empirical, per-use question.

Claude-side token cost does not increase when the extra reviewer is an external CLI — it bills to its own quota.

Research

FH v1.0 paper — Zenodo (DOI: 10.5281/zenodo.20397566) · arXiv in review.
Documents 2-layer design, 6-axis framework, 4-agent orchestration, and compounding loop with empirical evidence.

External convergence:

"Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems" — arXiv April 2026
"Code as Agent Harness" — arXiv May 2026
Stanford IRIS Lab: "Meta-Harness" — +7.7pts at 4× fewer tokens

Learn more

Resource	Purpose
`CLAUDE.md`	AI operating rules + sync/push protocol
`CHEATSHEET.md`	Full command reference
`AGENTS.md`	Runtime agent specs
`CATALOG.md`	Past work search index
`CONTRIBUTING.md`	How to contribute skills and patterns
`tracks/_contrib/`	Consent lane — share a de-identified work session; the repo compounds across operators, not just locally
`fh_integration_contract.md`	Governance gate spec