forge-harness

mcp
Security Audit
Fail
Health Warn
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Fail
  • child_process — Shell command execution capability in bin/fh-gate.js
  • child_process — Shell command execution capability in bin/fh-goal.js
  • child_process — Shell command execution capability in bin/fh-run.js
Permissions Pass
  • Permissions — No dangerous permissions requested

No AI report is available for this listing yet.

SUMMARY

Forge your Claude Code projects — pass them through, they come out faster.

README.md

forge-harness — Forge your projects, pass them through, faster. Quality is the lever — speed is the result.

MIT License DOI Claude Code Codex-compatible beta — help validate (issue #72) npm

Forge your Claude Code projects — pass them through, they come out faster.
A practitioner's meta-harness: it raises each project's floor (harness-ify the setup)
and ceiling (accelerate the work), then compounds the gains across your whole portfolio.

Quality is the lever; speed is the result. Every change earns its way through the gates —
adversarial · phantom · regression — and that is what makes the next change faster.

Fork it. Rename it. Make it yours.

FORK · ADAPT · COLLABORATE · EMPOWER

The principles · Why it exists · The evidence · How to use it


If you're here because… forge-harness solves it
Context disappears when a session ends Persistent tracks/ — resumable from anywhere
You repeat the same setup across projects Connect once to the hub, share across all projects
Team AI know-how lives only in people's heads Codify it so everyone shares it
You want AI to get better as work accumulates Skills and patterns compound session over session
You need a governance layer for AI-generated code fh-gate wraps any coding agent as a post-generation gate

This document is for humans. AI operating rules → CLAUDE.md · Command reference → CHEATSHEET.md


Get started in 2 minutes

Prerequisite: Claude Code CLI — verify with claude --version

# 1. Install the plugin
claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git
claude plugin install -s user fh-meta@forge-harness

# 2. Clone the hub
git clone https://github.com/chrono-meta/forge-harness.git ~/forge-harness
cd ~/forge-harness

# 3. Start a session
claude

✅ Claude reads CLAUDE.md and asks what project to connect or what task to start.
Say "Connect a project" → hub scans ../, finds .git directories, creates tracks/{project}/.

Plugin only (no clone):

claude plugin marketplace add https://github.com/chrono-meta/forge-harness.git  # once
claude plugin install -s user fh-meta@forge-harness
cd ~/projects/{your-project} && claude

⚠️ Plugin-only is partial synergy. You get the skills and agents, but not Layer 1 — the
CLAUDE.md governance (active onboarding, the 4-axis gate, mode branching) and the compounding
context (tracks/ memory accumulation, harvest-loop learning). Each skill runs the same in
isolation; what's missing is the orchestration that makes them compound across sessions. Clone the
hub (above) when you want the full set, not just the tools.

🚪 New here / just want the skills? Start with the opinionated front door —
templates/starter_profile.md: one install command, a curated
first-five skills, and a zero-install governance gate (npx … fh-gate). The other 28 skills wait
until you need them.


What it is

forge-harness is structured as two distinct layers:

Layer Contents AI compatibility
Methodology layer tracks/, knowledge/, SKILL.md docs, session protocols Any AI model
Automation layer .claude/agents/, hooks, slash commands, CLAUDE.md rules Claude Code only

The methodology layer is the portable core — persistent hub, accumulating learnings, curating cross-project knowledge. The automation layer makes it frictionless when running Claude Code.

Where this sits (2026): "harness engineering" is now a public paradigm — and basic agent
orchestration is rapidly commoditizing into standard infrastructure. FH deliberately stakes nothing on
that plumbing. Its durable layer is what does not commoditize: the governance gates (adversarial ·
phantom · regression), drift control, and the cross-project compounding loop. Routing and dispatch are
means; the gate and the loop are the asset.

forge-harness/   ← the hub (persistent brain)
├── knowledge/   → shared across all projects
└── tracks/      → work records per project

Project A  ──→  connect hub in CLAUDE.md
Project B  ──→  connect hub in CLAUDE.md

Why it works

After a long co-authoring session with your AI, you and it share the same context — and the same blind spots. The reviewer worth having is the one who never saw your reasoning. You can get that by hand: paste the work into a fresh, empty chat. FH just turns that chore into one routine command.

  • sidecar / agent dispatch → a reviewer with none of your session's context
  • steel-quench · phantom-quench → that cold pass, on demand

It's model-agnostic: co-build with one AI, run the cold pass with any other. Whoever was absent from the original session is your cold reviewer — this is not a ranking of models.

What FH does not claim: the cold pass is your base model's own ability, not a detection engine FH adds — a plain prompt to a fresh instance does much of the same. FH's value is narrower and honest: it takes a method drawn from real practice and makes running that independent pass routine, instead of a chore you skip. The methodology is copyable; what FH packages is the workflow, not a secret sauce.


Governance layer for AI-generated code

FH wraps any coding agent (OpenCode, Codex, etc.) as a post-generation governance gate.

npx --package @chrono-meta/fh-gate fh-gate                    # default: Claude backend
FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-gate   # Codex backend
FH_BACKEND=auto npx --package @chrono-meta/fh-gate fh-gate "src/foo.ts" full
# → FH_GATE_VERDICT: PASS | PENDING | BLOCKED | ESCALATE

fh-gate uses the same FH governance prompt for both runtimes. FH_BACKEND=claude runs claude --print; FH_BACKEND=codex runs codex exec; FH_BACKEND=auto prefers Codex when both CLIs are present.

For direct skill or agent execution outside Claude Code, use fh-run:

FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-run --skill phantom-quench --file docs/foo.md
FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-run --agent fh-commons:quench-challenger --file plugins/fh-meta/skills/foo/SKILL.md

For Codex-primary work, keep using Codex's native goal/session features when available. fh-goal is only a portable wrapper for one-off non-interactive runs that should be followed by FH governance:

FH_BACKEND=codex npx --package @chrono-meta/fh-gate fh-goal --prompt "Implement X and update tests" --gate quick

The broader FH automation layer still depends on Claude Code for sub-agents, hooks, and slash commands. The portable path is shared documents plus runtime adapters, not separate Codex and Claude forks.

Recommended posture — Claude Code as orchestrator, others as sidecars. FH's automation layer (auto-firing hooks, sub-agent dispatch, onboarding, memory) is Claude-Code-native, so the fullest experience runs Claude Code as the main orchestrator with Gemini, Codex, or Antigravity (agy) as actively-used sidecars. You can also run a non-CC runtime as your main agent — you keep the full methodology layer and M1 skills through fh-gate/fh-run, but you do not get the autopilot layer: hooks don't auto-fire, M2 agent-dispatch steps need the adapter (or interactive approval), and M3 skills are reference-only. This is a deliberate two-layer boundary, not a gap to be closed. Per-runtime detail: docs/codex-compat.md (tier-by-tier) and multi_model_sidecar_strategy.md (sidecar engines, including the Gemini→agy succession at the 2026-06-18 EOL).

Empirical result (2026-05-31): Applied to OpenCode's AI-generated permission/arity.ts (163 lines, CI green). Current gate semantics classify this as BLOCKED: 2 A-grade findings CI didn't catch (short-token overflow in allowlist, executor tools absent from arity table).

Full spec: fh_integration_contract.md


The forge

forge-harness treats a project like steel — and the metaphor is literal, not decoration. Work is shaped,
hardened by attack, and only then does it ship faster, for having survived.

Movement What happens The commands
Forge shape the raw project into a harness — raise its floor install-wizard, "harness-ify this project"
Quench harden it by attack — the cold pass leaves standing only what is sound steel-quench · phantom-quench
Temper take the brittleness back out of the hardened asset steel-quench Wave-T · templates/temper_check.sh
Accelerate a blade that survived the forge cuts faster goal-quenchPass → Accelerate

All four movements ship. Temper was named before it was built — deliberately (see
ETHOS.md) — and shipped once measurement runs validated it. Around the forge,
two more signatures keep it running: harvest-loop (each session's lessons become permanent skills) and
agent-composer (orchestrate the dispatch). The other skills wait until you need them — full list below.

37 skills · 8 agents

Full asset activation check
Asset Role Triggers
steel-quench Full-spectrum adversarial verification "Run the quench", "Attack from the root"
phantom-quench Phantom claim detection + source back-tracing "Verify the source", "Grounding audit"
harvest-loop End-of-session learning → evolution pipeline "Harvest the session"
agent-composer Plans optimal agent dispatch "Run in parallel", "Which agents?"
sim-conductor Meta-simulation orchestrator "External user perspective"
context-doctor Token efficiency + .claudeignore "Session is slow", "Clean up context"
harness-doctor Harness structure diagnosis "Check my Claude setup"
pipeline-conductor 4-axis quality gate (backward/adversarial/forward/record) "Run the quality gate"
field-harvest Back-propagate field patterns to hub "I could reuse this"
frontier-digest HN + arXiv → actionable insights "AI trend digest"
hub-cc-pr-reviewer Automated PR review "Review this PR"
verify-bidirectional Reverse-verify decisions "Is that right?", "Double-check"
deep-clarify Socratic requirements clarification "I'm not sure what to build"
install-wizard Initial onboarding "First-time setup"
plugin-recommender Plugin recommendations "Is there a good tool for this?"
apex-review Executive-perspective quality review "Will this hold up?"
meta-prompt-builder Meta prompt design "Write a prompt for the agent"
asset-placement-gate Hub vs project asset routing "Should this be shared?"
cross-ecosystem-synergy-detection Cross-tool synergy finder "Are my tools working together?"
corpus-grounding-expander Multi-version public-domain corpus → verified-axiom grounding store "Broaden the grounded corpus"
persona-roster-expander Persona seed → tiered, judgment-mapped cast "Broaden these personas"
convergence-loop (fh-commons) N-round convergence loops "Single-pass seems suspicious"
token-budget-gate (fh-commons) Pre-task token cost estimate "How expensive is this?"
mcp-circuit-breaker (fh-commons) MCP tool failure pattern detection "MCP keeps failing"
quench-challenger (fh-commons) Adversarial pressure-test agent "Challenge this with a devil"
(+ additional assets) marketplace-gate · contention-layer · edit-manifest · fact-checker · goal-quench · hub-persona-auditor · install-doctor · memory-hygiene · persona-innovator · prompt-regression · public-surface-audit · skill-splitter
Active count Diagnosis
28+ Advanced — chain agent-composer + sim-conductor + steel-quench + pipeline-conductor
10–27 Activation stage — gradually enable unchecked assets
0–9 Early stage — start with install-wizard

Find a skill by what you're trying to do:

Cluster Skills
Verification steel-quench · phantom-quench · convergence-loop · prompt-regression · return-path-gate
Orchestration agent-composer · pipeline-conductor · goal-quench · deliberation
Diagnosis harness-doctor · context-doctor · install-doctor · mcp-circuit-breaker
Harvesting / Learning harvest-loop · field-harvest · edit-manifest · memory-hygiene
Gate / Guard token-budget-gate · asset-placement-gate · marketplace-gate
Discovery plugin-recommender · cross-ecosystem-synergy-detection · frontier-digest · verify-bidirectional
Content / Simulation sim-conductor · apex-review · meta-prompt-builder · deep-clarify
Setup install-wizard · hub-cc-pr-reviewer · skill-splitter

Full phrasebook — every skill + agent with its one-line definition and the plain-language phrase
that triggers it: CHEATSHEET.md §12.


Model setup

Claude Code does not auto-select models by task complexity — you configure this once.

/model sonnet   # recommended default — FH dispatches stronger models itself where they matter
Command Who runs what Best for
/model sonnet Sonnet session; FH dispatches higher-tier sub-agents on declared floors FH default — operation + routine dev
/model opus Opus handles everything Harness-editing sessions (Mode D) · maximum depth on every turn
/model opusplan Opus plans · Sonnet executes (when Opus engages) Cost-conscious routine coding — see caveat

Why default Sonnet now works: measured (see §Model setup evidence note below), operating FH is
nearly model-flat — the rules in context do most of the work. What still needs a stronger model is a
small set of depth-sensitive turns, and FH handles those itself: some skills and agents declare a
model-tier floor
(e.g. quench-challenger floors at opus) and are dispatched as sub-agents at the
floor tier when your environment can reach it — your session model stays untouched. FH never switches
your session model
: a default you set by hand is followed; floors apply only to FH's own sub-agent
dispatches. If your environment tops out below a floor (e.g. Sonnet-only API routing), the floored
asset still runs at the best available tier with an explicit below-floor flag in its output — degraded
delivery is visible, never silent (tier-floor resolution: knowledge/shared/harness-core/multi_model_sidecar_strategy.md §Tier-floor).

opusplan caveat (measured): its Opus engagement is not guaranteed — in a measured 10-turn run
it used Opus on 0 turns (CC classifies few turns as "plan-mode"). If you want Opus on every turn,
pin /model opus (22/22 turns Opus in the follow-up run). Sub-agent dispatch model is set by the
dispatch's own model parameter; the session model/plan-mode does not propagate to sub-agents.

By role: running FH (field projects, gates, routine dev) → /model sonnet + let the floors
escalate. Editing the harness itself (Mode D) → pin the strongest model you have — harness
self-development is where tier depth measurably pays (design-increment finding), while operation
does not. Sub-agent token costs are CC-visible in the session jsonl under message.model.

Measured, not asserted (2026-06-10, worked example): on a 30-point blind rule-application battery,
operating FH was nearly model-flat — Opus 4.8 / Sonnet 4.6 / Haiku 4.5 scored 100 / 97 / 94 against
a top-tier anchor at 100, with the rules in context doing most of the work. The tiers separated only on
above-rubric design increments (developing the harness, not running it) — which is why the default is
Sonnet with tier-floored dispatch covering the depth-sensitive turns, and a pinned stronger model is
recommended only for harness-editing sessions. Details: docs/OUTPUT_EVIDENCE.md §Validation signals.

If you use external CLIs (Gemini, Codex, gh copilot) as sidecars, their costs are billed to their own quota and not visible in CC's token display.

Hardware tiers (local sidecars are optional accelerators)

FH needs no local LLM — the baseline is whatever runs Claude Code. Local models are optional, for
the canary / cheap-breadth rungs only:

Tier Spec Runs locally What it buys
Minimum anything that runs Claude Code nothing full methodology + gates; operating FH is ~model-flat (Opus/Sonnet/Haiku 100/97/94)
Recommended laptop-class, ~16GB RAM one 8B-class quantized model (e.g. an 8B / small Gemma) a token-free floor canary (pre-screen before a metered sim) · offline triage · a cheap-breadth panel arm
Optional (heavy) ~24GB VRAM GPU a 27–32B model a stronger decorrelation canary

Local tiers are canaries, never the terminal verdict — measured: the floor model missed a subtle
adversarial case the frontier caught (and even a 27–32B local scored 1/4 on it). They lower the cost
of breadth
; the verdict stays frontier.


Multi-Model Sidecar

Run Gemini, Codex, or gh copilot as independent reviewers alongside Claude. The point is context
isolation
: a reviewer that did not co-create the work is cold to its froth — whoever sits outside the
collaboration tends to catch what the co-author, now an advocate for the shared result, glides past. It's
symmetric, not a model-ranking: when you co-build with Gemini, a fresh Claude catches its froth; when you
co-build with Claude, a fresh sidecar catches Claude's.

In one internal case study, layering reviewers surfaced progressively more issues — a single in-session
pass missed items that cross-session personas caught, and an external-CLI reviewer surfaced a few the
Claude personas shared a blind spot on. Treat it as a worked example, not a benchmark: the gain scales
with task complexity and how much you co-created the artifact, and an isolated reviewer also adds false
positives you have to triage. Whether the net is worth it on a given task is an empirical, per-use question.

Claude-side token cost does not increase when the extra reviewer is an external CLI — it bills to its own quota.


Research

FH v1.0 paperZenodo (DOI: 10.5281/zenodo.20397566) · arXiv in review.
Documents 2-layer design, 6-axis framework, 4-agent orchestration, and compounding loop with empirical evidence.

External convergence:


Learn more

Resource Purpose
CLAUDE.md AI operating rules + sync/push protocol
CHEATSHEET.md Full command reference
AGENTS.md Runtime agent specs
CATALOG.md Past work search index
CONTRIBUTING.md How to contribute skills and patterns
tracks/_contrib/ Consent lane — share a de-identified work session; the repo compounds across operators, not just locally
fh_integration_contract.md Governance gate spec

Reviews (0)

No results found