Forge

Product Development Framework — From fuzzy ideas to shippable products, with full AI-assisted guidance.

A complete product development methodology for AI coding assistants: Claude Code, Cursor, OpenCode.

vs OpenSpec? OpenSpec excels at one brownfield change at a time (CLI + changes/ + slash commands). Forge covers the full product pipeline (idea → Spec → Plan → TDD build → review → release) and adds hooks, memory, evolution, and multi-client adapters. Brownfield deltas use /change-manager (OpenSpec-aligned). Full comparison →

No npm install required to use the framework — copy adapter files into your project and open your AI client. Node.js + pnpm are only needed if you contribute to this repo or run scripts/.

Architecture at a glance

flowchart LR
  subgraph inputs [You]
    Idea[Idea / change request]
  end

  subgraph forge [Forge Harness]
    Spec[product-spec-builder]
    Chg[change-manager]
    Plan[dev-planner]
    Build[dev-builder]
    Rev[code-review / bug-fixer]
    Rel[release-builder]
    Hooks[8 hooks + evolution]
    Mem[memory/ 3-tier]
  end

  subgraph clients [AI clients]
    CC[Claude Code]
    CU[Cursor]
    OC[OpenCode]
  end

  Idea --> Spec
  Spec --> Plan
  Idea --> Chg
  Chg --> Plan
  Plan --> Build
  Build --> Rev
  Rev --> Rel
  Hooks -.-> Build
  Mem -.-> Build
  forge --> clients

Section	Description
Installation & Usage	Clone, copy adapters, hooks, first run
Workflow	Spec → Plan → Build → Release (brownfield: `/change-manager`)
Framework Development	Tests, sync, dependency graph (contributors)

What's New

v1.20.6 — 2026-05-23

Discoverability: OpenSpec diff + architecture diagram at README top; pnpm script to sync GitHub About/topics from .github/repo-metadata.json.

v1.20.5 — 2026-05-23

memory-guard: PostToolUse bundles context-compaction + check-handoff (8 default hooks).

v1.20.4 — 2026-05-23

SKILL slimming: dev-builder and product-spec-builder detail moved to references/; main SKILL files stay under 500 lines.

v1.20.3 — 2026-05-23

All Skill commands thinned: Every commands/*.md is now an index to SKILL.md (no duplicated phase prose).
auto-push off by default: Removed from adapter settings.json and loadouts; enable manually if you want push-after-commit.

v1.20.2 — 2026-05-23

Spec / change-manager split: Iteration mode no longer creates changes/ — use /change-manager for scoped features; major edits stay in Product-Spec.md.
Review default: Parallel 4-agent review only when complexity is moderate/complex; default is quick pass (change_complexity=simple).
Commands thinned: Key slash commands point to SKILL.md sections instead of duplicating workflows.

v1.20.1 — 2026-05-23

Audit fixes: CLAUDE.md routes active changes/ to /change-manager; Mission includes brownfield step; change-verify-template.md added.
CHANGELOG: Documents openhuman-comparison.md (shipped in prior commit).
Loadouts: cli-tool / minimal omit change-manager by design — use full or web-app for brownfield.

v1.20.0 — 2026-05-23

change-manager Skill: For projects that already have Product-Spec.md — one feature per changes/<name>/ folder with propose → apply → verify → archive (OpenSpec-aligned). Templates + /change-manager command; implementation still delegates to /dev-planner and /dev-builder.
openspec-comparison.md: When to use Forge vs OpenSpec CLI, artifact mapping, and workflow diagram — core/docs/openspec-comparison.md.
openhuman-comparison.md: Forge vs OpenHuman (memory, context compression, what not to copy) — core/docs/openhuman-comparison.md.
12 Skills: change-manager wired into full / web-app loadouts and all adapter bundles via pnpm sync.

v1.19.1 — 2026-05-23

Hallucination Gate wired: All adapter settings.json register PreToolUse → hallucination-gate; hook reads tool_name from stdin JSON; Windows .bat uses Node parsing.
Parallel review docs aligned: code-review, dev-builder, bug-fixer SKILLs and README workflow unified to parallel 4-agent + aggregation; removed stale Stage 1/2 language; confidence thresholds ≥0.6 / 0.3.
Commands layer complete: Added commands/*.md for design-brief-builder, design-maker, evolution-engine, feedback-writer (all 11 skills with slash commands now have command files).
Loadout cleanup: Removed ReqForge-only check-sync from user-facing loadouts.
Cross-platform tooling: pnpm validate-skill defaults to scripts/validate-skill.mjs; added pnpm apply-loadout <name> <client>.
Docs & version: package.json, DEV-PLAN, Product-Spec, core/docs synced to v1.19.1; Sub-Agent count corrected to 10.

v1.19 — 2026-05-23

Loadout mechanism: Reusable bundles of skills, agents, hooks, and MCP servers for different project types. 4 built-in loadouts: full, web-app, cli-tool, minimal. Validated by loadout.schema.json. Synced to all adapters via pnpm sync.
loadout.schema.json: JSON Schema v7 validation for loadout definitions (required fields: name, version, description, skills, agents, hooks).

v1.18 — 2026-05-23

skill.json metadata: All 11 skills now ship with machine-readable skill.json (name, version, triggers, prerequisites, agents, hooks). Validated by validate-skill.sh via Node/Python. JSON Schema at core/skills/skill.schema.json.
Commands layer: All 11 skills with slash commands now have commands/<name>.md (v1.19.1 completed the remaining 4). YAML frontmatter + phased workflows. pnpm validate-skill uses cross-platform validate-skill.mjs by default.
Parallel agent code review: 4 specialized review agents (design, bug, security, types) run concurrently, each returning structured findings with confidence scores (0.0-1.0). Aggregator applies thresholding (≥0.6 confirmed, 0.3-0.6 suspected, <0.3 suppressed) with cross-agent boost. Replaces the old serial two-stage review.
Hallucination Gate: PreToolUse hook verifies Write/Edit target directories exist (v1.19.1: registered in all adapter settings).
Project state injection: check-evolution.sh now detects Product-Spec/DEV-PLAN/Code presence on session start and injects routing guidance as additionalContext.
validate-skill.sh — skill.json validation: Added existence check + required field validation (name, version, description, triggers.auto/manual/command).
sub-agent-orchestration.md: Documented parallel review pattern with all 4 specialist agents and aggregation rules.
Propagated to all 3 adapters (claude-code, cursor, opencode) via pnpm sync.

v1.17 — 2026-05-22

Decidable Activation — [Not For] section: All 11 skills now include a [Not For] section specifying when NOT to use the skill and what to use instead. Added as a required section in validate-skill.sh. Updated skill-template.md.
Three-Layer Diagnostic Model: bug-fixer now goes beyond root cause to ask: Symptom → Design Flaw → Principle Violation. Every fix report includes all three layers to prevent recurrence, not just patch the symptom.
Numeric Quality Rubric: skill-builder gets a 16-item, 32-point scoring system. Ship threshold ≥ 24 with no critical item at 0. Run pnpm validate-skill:bash --score to compute (bash script only).
create-skill.sh scaffold: CLI tool to generate a new Skill directory from a name. Supports --minimal (required sections only) and --full (with recommended sections). Run pnpm create-skill <name>.

v1.16 — 2026-05-21

Harness Engineering principles: dev-builder upgraded with Tool AI-fication Priority (CLI > MCP > Skill > GUI), Substitute Don't Mock (real substitutes over mocks), Environment-First (project must run before features), Minimum Runnable Subset (each Phase delivers an end-to-end core path). Scripted Verification (complex Phases generate verify-phase-N.sh).
Machine Gates: 3-level enforceable gates added to CLAUDE.md — Hallucination Gate (fails on wrong paths/missing deps), Sloppiness Gate (blocks completion without verification evidence), Overstepping Gate (rejects scope creep). Codification principle: gates that can be linted MUST be codified.
Iron Rules: 8 baseline rules extracted as the Forge foundation (knowledge offloading, no prompt magic, real files, guardrails, etc.). Documented in Product-Spec.md and README.
llms.txt: AI-searchable project summary at repo root for LLM discoverability.
Per-directory AGENTS.md: Local operational boundaries for core/skills/, core/agents/, core/hooks/, core/templates/, core/feedback/ — each directory gets MUST/MUST NOT/SHOULD rules.
validate-skill.sh: Formal SKILL.md specification validator — checks frontmatter, required sections, kebab-case, Gotchas count, file size, placeholder markers. Runs via pnpm validate-skill.
Claude Code adapter rules migration: Per-directory rules converted from AGENTS.md (which Claude Code doesn't read) to .claude/rules/*.md with path-scoped globs frontmatter. AGENTS.md retained for OpenCode adapter.
SKILL.md structural audit: All 11 skills validated — 11 missing-section errors and 19 warnings fixed (added [Dependency Check], [File Structure], [Initialization], [Output Style], [Gotchas] sections across design-maker, evolution-engine, feedback-writer, bug-fixer, code-review, dev-builder, dev-planner).
Gotchas in every skill: [Gotchas] section added to all 11 skills capturing domain-specific failure points (vague requirements, privacy leaks, premature evolution, duplicate feedback, etc.). Each skill accumulates hard-won lessons over time.
Skill template updated: New skills automatically include a [Gotchas] section as a recommended component.
CLI best practices in CLAUDE.md: /model, /compact, /context, /sandbox usage guidance encoded as General Rules. Key rules wrapped in <important if=""> tags for better adherence.
Renamed [Anti-Rationalization Checklist] → [Gotchas: Anti-Rationalization] in dev-builder, code-review, bug-fixer for naming consistency.
Glue Code First: dev-builder's "SDK-First" upgraded to "Glue Code First" — priority chain: framework built-in → open-source library → AI prompt → custom logic only when necessary.
Generator/Optimizer recursion: evolution-engine now has explicit First Principles — the engine that evolves rules should itself be evolvable through the same feedback loop.
Cross-session audit: code-review added principle that complex reviews must run in isolated sub-agent sessions to prevent self-confirmation bias.
Prompt remediation: feedback template now includes a prompt_remediation field — each failure can carry a reusable prompt fragment to prevent recurrence.

v1.14.2 — 2026-05-20

forge-install: pnpm forge-install <client> --target <dir> copies the adapter into your project; install.sh / install.ps1 wrappers included
Safe upgrade: --force merges without overwriting feedback/ or settings.local.json

v1.14.1 — 2026-05-20

Script unit tests: scripts/__tests__/ covers sync.ts and dependency-graph.ts (Vitest 4.1.6); run pnpm test to verify
Dependency graph fix: Named imports (import { x } from "./y") now resolve correctly for more accurate blast-radius
Engineering alignment: package.json at 1.14.1 with exact patch-pinned devDependencies; DEV-PLAN.md progress table added

v1.14 — 2026-05-19

Exact version pinning: Every dependency pinned to major.minor.patch — no ranges, no latest
Dedicated AGENTS.md template: OpenCode gets a constraint-focused format (tech stack, behavior boundaries, hard constraints), not a CLAUDE.md clone
Dependency graph: scripts/dependency-graph.ts — file-level import graph for blast-radius analysis. pnpm dep-graph build | affected | risk | stats. Integrated into dev-builder review loop: code-reviewer receives affected_files for focused review

v1.13 — 2026-05-19

Planner sub-agent: Dedicated agent for architecture design and Phase splitting, decoupled from implementer context
Session handoff: handoff-template.md + check-handoff hook to generate session summaries before context reset, preventing lost progress
Complexity gate: code-reviewer now skips parallel specialist agents for change_complexity="simple", matching review depth to change scope
Model version tracking: feedback-observer records model version with each feedback, enabling evolution to detect outdated rules

v1.10–1.12 — 2026-05-19

test-writer sub-agent: Vitest-based test generator for tools/scripts (v1.14.1 ships the sync / dependency-graph test suite)
check-sync hook: Detects core/ vs adapters/ divergence after edits
Self-wired settings: ReqForge's own .claude/settings.json with hook events wired; settings.local.json pruned 65→32 lines

v1.9 — 2026-05-19

AI Only for Judgment Tasks: Deterministic logic is plain code, not AI busywork
Fail Loudly: Uncertainty must be stated explicitly, never hidden
Token Budget Awareness: Check context headroom after each Task

See CHANGELOG.md for the full version history.

Overview

If you've done Vibe Coding, you know the hard part isn't getting AI to write code — it's managing the entire product development process. You tell AI "build me a writing tool," and it starts coding. Halfway through, you realize the direction is wrong and start over. Features finally work, but the UI looks terrible — no design specs, so AI pieced together default styles from training data. Fix the UI, introduce bugs. Fix bugs, introduce more bugs. Context gets long, AI forgets earlier requirements, code starts drifting.

The root cause isn't that models aren't smart enough. It's that there's no system around the model.

Forge is an Agent Harness — not about optimizing how you talk to AI, but building a complete system of constraints, guidance, and feedback. The AI knows what to do before it starts, automatically verifies results afterward, self-corrects when things go wrong, and never makes the same mistake twice.

Harness = Guides (feedforward) + Sensors (feedback) + Steering Loop (evolution)

Guides — Each Skill defines methodology, workflow, and acceptance criteria. Before the agent acts, it knows exactly "how to do it" and "what counts as done."
Sensors — Hook scripts + Code Review check every critical node after the agent acts. No reliance on the model's self-awareness.
Steering Loop — Every correction you give is recorded. When the same issue surfaces 3+ times, it's automatically promoted to a formal rule in the Skill.

Installation & Usage

Forge is copy-to-use: no package publish, no npm install in your app project. You only need a supported AI coding assistant.

Prerequisites

Required	Notes
AI client (one of)	Claude Code, Cursor, or OpenCode
Git	Clone this repo; optional for your own project
Empty or existing project folder	Forge files live at the project root alongside your code

Optional (contributors only)	Notes
Node.js 22.x LTS + pnpm 10.x	Run `pnpm test`, `pnpm sync`, `pnpm dep-graph` — see Framework Development

Step 1 — Clone Forge

git clone https://github.com/zxpmail/ReqForge.git
cd ReqForge

Keep the clone path handy — you will copy files from ReqForge/adapters/... into your app project.

Step 2 — Install into your project

Option A — One-command install (recommended)

From your Forge clone (requires Node.js for ts-node):

# Install into another project
pnpm forge-install claude-code --target /path/to/my-app

# Install into current directory
pnpm forge-install cursor .

# Merge upgrade (keeps your feedback/ and settings.local.json)
pnpm forge-install claude-code --target ../my-app --force

# Windows — or use the PowerShell wrapper from the Forge repo root
.\scripts\install.ps1 claude-code C:\path\to\my-app

# macOS / Linux wrapper
./scripts/install.sh opencode /path/to/my-app

On Windows, settings.windows.json is applied automatically. Use --windows on other platforms if needed.

Option B — Manual copy

Create or open your app directory, then copy only the adapter folder for your AI client.

Client	Copy from (inside Forge clone)	Into your project
Claude Code	`adapters/claude-code/.claude/`	`<your-project>/.claude/`
Cursor	`adapters/cursor/.cursor/`	`<your-project>/.cursor/`
OpenCode	`adapters/opencode/.opencode/`	`<your-project>/.opencode/`

Examples (replace paths with your actual locations):

# macOS / Linux — Claude Code
cp -R /path/to/ReqForge/adapters/claude-code/.claude /path/to/my-app/.claude

# macOS / Linux — Cursor
cp -R /path/to/ReqForge/adapters/cursor/.cursor /path/to/my-app/.cursor

# macOS / Linux — OpenCode
cp -R /path/to/ReqForge/adapters/opencode/.opencode /path/to/my-app/.opencode

# Windows — Claude Code (PowerShell)
Copy-Item -Recurse -Force C:\path\to\ReqForge\adapters\claude-code\.claude C:\path\to\my-app\.claude

# Windows — Cursor
Copy-Item -Recurse -Force C:\path\to\ReqForge\adapters\cursor\.cursor C:\path\to\my-app\.cursor

OpenCode uses .opencode/AGENTS.md as the control file (constraint format: tech stack, behavior boundaries, hard constraints) — not a copy of root CLAUDE.md.

Step 3 — Enable hooks (Claude Code & Cursor)

Hooks run before tool use, on commit, edit, session start, etc. Default settings.json registers 8 hooks (including PreToolUse → hallucination-gate; auto-push is optional). After copying .claude/ or .cursor/:

Platform	Action
Windows	In `.claude/` (or `.cursor/` inside rules): `copy settings.windows.json settings.json`
Linux / Mac	Default `settings.json` uses `.sh` hooks — no change needed
OpenCode	No `settings.json`; `.sh` / `.bat` hooks work per platform

Step 3b — Loadouts (optional)

Adapters ship 4 loadout bundles under loadouts/ (full, web-app, cli-tool, minimal). Each JSON lists recommended skills, agents, and hooks for a project type.

Default install ≈ full loadout (all hooks in settings.json).
Trim hooks (contributors, from Forge clone): pnpm apply-loadout minimal claude-code merges a lighter hook set into adapter settings.json. Add --dry-run to preview.
Loadouts are reference manifests — skills/agents are already copied; use loadouts to understand what each bundle includes.
Brownfield (/change-manager): included in full and web-app only; cli-tool and minimal omit it — copy the skill from core/skills/change-manager/ or switch loadout if you need changes/ on a CLI project.

Step 4 — First run in your AI client

Open your project folder (the one that now contains .claude/, .cursor/, or .opencode/) in the AI client.
Start a new chat. Forge detects progress from files present (Product-Spec.md, DEV-PLAN.md, code, memory/).
Describe your product idea in natural language, or invoke a Skill:

Goal	Skill command (Claude Code / OpenCode style)	Output
Requirements	`/product-spec-builder`	`Product-Spec.md`
Design brief (optional)	`/design-brief-builder`	`Design-Brief.md`
Dev plan	`/dev-planner`	`DEV-PLAN.md`
Brownfield feature (existing Spec)	`/change-manager propose <name>` → apply → verify → archive	`changes/<name>/` → `changes/archive/`
Implementation	`/dev-builder`	Code + `memory/` (auto-created)
Bug fix	Describe the bug (auto-triggers `/bug-fixer`)	Fix + review loop
Release	`/release-builder`	Build / deploy checklist

Cursor: rules load from .cursor/rules/ automatically; refer to skills in chat (e.g. “run product-spec-builder”) or use your client’s skill UI if configured.

Quick Spec: one sentence like “A habit tracker with AI coaching” — the agent can generate a minimal Product-Spec.md with [待确认] markers for you to refine.

After installation — what appears in your project

my-app/
├── .claude/                    # or .cursor/ or .opencode/  ← adapter bundle
│   ├── CLAUDE.md               # control file (OpenCode: AGENTS.md)
│   ├── settings.json           # 8 hooks wired (incl. hallucination-gate; auto-push optional)
│   ├── skills/                 # 12 Skill definitions + commands/
│   ├── agents/                 # 10 Sub-agent definitions
│   ├── hooks/                  # .sh + .bat hook scripts
│   ├── loadouts/               # full | web-app | cli-tool | minimal
│   ├── feedback/               # evolution fuel (lessons learned)
│   ├── EVOLUTION.md            # evolution engine levels
│   └── rules/                  # Claude Code: .claude/rules/*.md; Cursor: .cursor/rules/*.mdc
├── Product-Spec.md             # after /product-spec-builder
├── DEV-PLAN.md                 # after /dev-planner
├── Design-Brief.md             # optional
├── changes/                    # optional — brownfield iterations (/change-manager)
│   └── archive/
├── memory/                     # auto-created on first /dev-builder
│   ├── project-memory.md
│   ├── decisions-log.md
│   └── task-history.md
└── <project-name>/ ...         # your application code (not flat in root)

Forge does not modify your package.json unless you ask the agent to add dependencies during development.

Updating Forge in an existing project

Pull the latest ReqForge clone (or download a new release).
Re-copy the adapter directory over your project’s .claude/ / .cursor/ / .opencode/ (back up local feedback/ if you customized it).
Re-apply Windows settings.windows.json → settings.json if needed.

YOLO mode (not recommended)

Forge’s value is gating — phases, reviews, and evolution proposals ask for confirmation. YOLO auto-approves them and weakens the harness.

If enabled, gates switch to async write mode (artifacts under changes/ and .claude/.yolo-pending/). 🔴 red-boundary actions still require explicit approval.

Enable (priority: project > global > env):

Copy .forge/config.example → .forge/config, set FORGE_MODE=yolo

Or ~/.forge/config / %USERPROFILE%\.forge\config

Or env FORGE_MODE=yolo

More detail: core/docs/ (behavior boundaries, memory, sub-agents). Comparisons: openspec-comparison.md · openhuman-comparison.md.

Core Architecture

┌─────────────────────────────────────────────────────────────┐
│  Control File (CLAUDE.md / .cursor/rules/reqforge.mdc)      │ ← Orchestration Layer
│  <60 lines — dispatch map only, details in core/docs/       │
│  Project state detection, flow routing, Skill dispatch       │
├─────────────────────────────────────────────────────────────┤
│  Three-Tier Memory (Context Preservation)                    │ ← Memory Layer
│  ├─ project-memory.md  Long-term: architecture, constraints │
│  ├─ decisions-log.md   Mid-term: ADRs, technical decisions  │
│  └─ task-history.md    Short-term: recent task summaries     │
├─────────────────────────────────────────────────────────────┤
│  Sub-Agents × 10 (Context Firewall)                         │ ← Execution Layer
│  ├─ implementer        Code + compile verify + self-check   │
│  ├─ code-reviewer      Parallel dispatch + confidence aggregation   │
│  ├─ code-reviewer-*  4 specialists (design, bug, security, types)│
│  ├─ feedback-observer  Capture failures + user corrections  │
│  ├─ evolution-runner   Scan feedback accumulation           │
│  ├─ test-writer        Generate tests for tools/scripts     │
│  └─ planner            Analyze Spec, split phases, plan     │
├─────────────────────────────────────────────────────────────┤
│  Skills × 12 + Loadouts × 4 (Guides / Feedforward Control)  │ ← Guidance Layer
│  Inject methodology and standards BEFORE the agent acts     │
├─────────────────────────────────────────────────────────────┤
│  Hooks + Review Loop (Sensors / Feedback Control)           │ ← Inspection Layer
│  Check results AFTER the agent acts, deterministic          │
├─────────────────────────────────────────────────────────────┤
│  feedback/ + EVOLUTION.md (Steering Loop)                   │ ← Evolution Layer
│  Each correction improves the harness. Never repeat errors  │
└─────────────────────────────────────────────────────────────┘

Memory Layer — Three-Tier Project Memory

AI amnesia is real. Every new session, the AI forgets what your project looks like, what decisions were made, and what was built last week. Forge solves this with three tiers of version-controlled memory:

Tier	File	Retention	Content
Long-term	`memory/project-memory.md`	Permanent	Architecture, tech stack, constraints, known pitfalls, dev environment
Mid-term	`memory/decisions-log.md`	Permanent	ADR-format decision records (context → options → decision → impact)
Short-term	`memory/task-history.md`	Last 30 entries	Task summaries (date, phase, type, changed files, notes)

How it works:

Session start: AI reads all three memory files before any task — mandatory context loading
Task completion: AI appends to task-history.md (always), decisions-log.md (if a decision was made), project-memory.md (if architecture facts changed)
Initialization: memory/ directory is created automatically on first /dev-builder invocation, populated from templates using Product-Spec.md and DEV-PLAN.md info

Memory files are plain markdown committed to your project repo — shared across sessions, across team members, and across AI tools.

Behavior Boundaries — Traffic Light System

Not all AI actions should have the same level of autonomy. Forge classifies every action into three levels:

Level	Rule	Examples
🟢 Green	Execute without confirmation	Variable naming, code style, tests, bug fixes (obvious), docs, dev deps
🟡 Yellow	Confirm before proceeding	External deps, DB schema, core business logic, project config, new routes
🔴 Red	Always require explicit approval	Deleting data, production config, force push, releases, auth changes

YOLO mode: In YOLO mode, 🟢 and 🟡 actions proceed automatically. 🔴 Red actions always require confirmation, even in YOLO mode. There is no override for red boundaries.

Quick Start Mode

Don't want the full interview? Just describe your project in one sentence:

You: "A habit tracker app with AI coaching"
Forge: ⚡ Quick Spec generated! Items marked [待确认] are my best guesses.

AI infers everything — product type, target users, core features, tech stack, layout. Uncertain items default to the simpler option and are marked for your review. Switch to deep-dive mode anytime with /product-spec-builder.

Guidance Layer — 12 Skills

Each Skill is an independent methodology module — composable, extensensible, pluggable. Every skill includes a [Gotchas] section documenting common failure points and lessons learned:

Skill	Responsibility
product-spec-builder	Requirements gathering. AI interviews you through multi-round questioning to turn vague ideas into structured specs. Supports iterative mode.
change-manager	Brownfield changes. One feature per `changes/<name>/` folder: propose → apply → verify → archive (OpenSpec-aligned; see openspec-comparison).
design-brief-builder	Design language. Quantifies vague descriptions ("dark theme, minimal") into concrete direction: color palette, interaction style, information density.
design-maker	Design prototyping. Generates full page mockups through Pencil or Figma MCP.
dev-planner	Development planning. Analyzes dependency relationships, splits into phases, outputs phased development plan.
dev-builder	Implementation. Breaks work into Tasks — each Task goes through "code → review → fix → commit" loop.
bug-fixer	Four-stage systematic debugging. Don't guess, don't try blindly: gather evidence → analyze patterns → hypothesize → fix.
code-review	Parallel agent review — 4 specialists (design, bug, security, types) with confidence-scored aggregation (≥0.6 confirmed, 0.3-0.6 suspected).
release-builder	Build & deploy. Built-in privacy audit and smoke testing.
feedback-writer	Records user corrections and feedback as structured files. Feeds the evolution engine with data.
evolution-engine	Scans accumulated feedback, identifies patterns (3+ occurrences), generates proposals to upgrade rules or optimize skills.
skill-builder	Creates new Skill definitions from scratch using project templates. Triggered by evolution proposals or manual invocation.

Execution Layer — Sub-Agent Isolation (Context Firewall)

Every Task gets a fresh Sub-Agent instance. No reuse, no inherited context. The orchestrator provides complete task context (spec items, deliverables, files, project structure) but NOT previous task history. This prevents error assumptions from cascading across tasks.

Sub-Agent	Skill	Responsibility
planner	dev-planner	Architecture design + Phase splitting
implementer	dev-builder	Code + compile verify + self-check
code-reviewer	code-review	Aggregate parallel review findings
code-reviewer-design	code-review	Spec compliance, UI consistency, drift
code-reviewer-bug	code-review	Bug patterns, races, resource leaks
code-reviewer-security	code-review	OWASP Top 10, credential leaks, XSS
code-reviewer-types	code-review	Type safety, nullability, edge cases
feedback-observer	feedback-writer	Record failures + user corrections
evolution-runner	evolution-engine	Scan feedback → evolution proposals
test-writer	dev-builder	Generate Vitest tests for scripts/utilities

Inspection Layer — Hook + Review Loop

Code isn't done until it's reviewed:

Feature complete → code-reviewer parallel review
  ├─ change_complexity="simple" → quick quality check
  ├─ moderate/complex → 4 agents in parallel (design, bug, security, types)
  ├─ confirmed spec gaps → re-implement → re-review
  └─ confirmed quality issues → bug-fixer fix → re-review
  └─ pass → commit (push when ready) → Task done

Eight hook scripts fire automatically in shipped adapters (plus check-sync in the ReqForge repo only — see note below):

Hook	Trigger	Action
hallucination-gate	Before tool use	Block Write/Edit to non-existent dirs
pre-commit-check	Before commit	Block commit if compilation fails
stop-gate	Before agent stops	Block stop if code hasn't been reviewed
detect-feedback-signal	On user message	Auto-detect correction signals
mark-review-needed	After file edit	Mark changes as needing review
check-evolution	On session start	Check feedback accumulation
memory-check	After file edit	Remind to update memory if code changed
memory-guard	After tool use	Archive old task-history (>30 rows) + suggest session handoff

Note: check-sync (detects core/ vs adapters/ divergence) ships only in the ReqForge repo's core/hooks/ — not in installed adapter bundles.

Optional — auto-push: Not enabled by default. To push after every commit, add to settings.json: "PostCommit": { "run": "sh .claude/hooks/auto-push.sh" } (adjust path for Cursor/OpenCode). Script remains in hooks/auto-push.sh.

Evolution Layer — Steering Loop

A harness that doesn't learn from usage is static. Forge evolves:

Level 0: Harness Foundation — Context compaction, progressive disclosure, tool-call offloading, auto-scoring on failure — prerequisites for reliable evolution
Experience accumulation — Failures and corrections are auto-recorded with inferred Skill scores (Precision/Coverage/Efficiency/Satisfaction). Scored data is the fuel for Level 2+.
Rule graduation — Same feedback appears 3+ times → proposed as formal rule in Skill or control file
Skill optimization — Skill's feedback scores consistently low → proposed adjustment
New Skill creation — Repeated operation pattern without Skill coverage → proposed new Skill

All evolution proposals require your explicit confirmation. No automatic rule changes.

Iron Rules — Non-Negotiable Baseline

Define the problem before writing code
Plan before executing
Every step must be verifiable — "looks right" is not completion
Commit frequently — every progress point should be a rollback checkpoint
Keep docs updated — context loss is the silent killer
Trust only machine evidence (reproducible commands, test output, CI status) — not AI's verbal assurance
Codify rules — if it can be lint/test/schema/hook/CI, it MUST be; natural language alone is not enforcement
Non-compliant output must fail, not rely on humans remembering to check

Control File Philosophy

CLAUDE.md is kept under 60 lines — a dispatch map, not a manual. Detailed procedures live in each Skill's SKILL.md (loaded only when that skill is active). Reference docs (behavior boundaries, memory system, sub-agent orchestration) live in core/docs/.

Every rule in CLAUDE.md must be traceable to a specific failure or feedback. Generic best-practice rules belong in SKILL.md, not the control file. This keeps the prompt lean and every rule earns its place.

Design Priority

Design tool mockups (highest) → Design-Brief.md → Product-Spec.md (functional logic)

When design mockups exist, all UI must match the design. Conflicts are resolved in favor of the design tool.

Workflow

Describe your idea — Tell AI what you want to build; product-spec-builder interviews you to clarify (or use Quick Mode for one-sentence start)
Generate spec — Outputs Product-Spec.md
Design brief (optional) — Invoke /design-brief-builder
Design mockups (optional) — Invoke /design-maker
Development plan — Invoke /dev-planner, outputs DEV-PLAN.md
Build — Invoke /dev-builder, works through each Task in each Phase
Memory auto-update — After each Task, project memory is updated automatically
Auto-review — code-reviewer parallel agent review + confidence aggregation
Auto-fix — Failed review triggers bug-fixer automatically
Commit & push — Review passes → auto commit + push
Phase verification — Cross-Task integration check + compile + functional test
Iterate — Request changes in conversation; auto-update Spec → Plan → code → review
Brownfield feature (optional, when Spec already exists) — /change-manager propose <name> → fill changes/<name>/ → apply (dev-planner/dev-builder scoped) → verify → archive
Release — Invoke /release-builder

Repository Structure

Forge/
├── core/                      # Shared core content
│   ├── skills/                # 12 skill definitions, each in its own directory
│   ├── agents/                # 10 Sub-agent definitions
│   ├── loadouts/              # Reusable skill/agent/hook bundles
│   ├── templates/             # Document templates
│   │   └── memory/            # Three-tier memory + session handoff templates
│   ├── hooks/                 # Hook scripts (.sh/.bat/.ps1)
│   ├── docs/                  # Detailed docs (behavior boundaries, memory system, etc.)
│   └── feedback/              # Feedback templates
├── adapters/
│   ├── claude-code/           # Claude Code adapter (.claude/ + .claude/rules/)
│   ├── cursor/                # Cursor adapter (.cursor/rules/)
│   └── opencode/              # OpenCode adapter (.opencode/)
├── .forge/                    # Forge project config
│   └── config.example         #     config template (copy to config to activate)
├── .claude/                   # Forge's own control files (self-wired hooks via settings.json)
├── CLAUDE.md                  # Main control file
├── llms.txt                   # AI-searchable project summary
├── scripts/
│   ├── sync.ts                # core → adapter sync script
│   ├── install.ts             # adapter → user project install
│   ├── install.sh / install.ps1 # install wrappers
│   ├── dependency-graph.ts    # File-level import graph + blast-radius
│   ├── validate-skill.mjs     # Cross-platform SKILL.md validator (default pnpm validate-skill)
│   ├── validate-skill.sh      # Full validator + --score rubric (pnpm validate-skill:bash)
│   ├── create-skill.sh        # Scaffold new Skill directory (pnpm create-skill)
│   ├── apply-loadout.ts       # Merge loadout hooks into adapter settings
│   └── __tests__/             # Vitest unit tests
├── vitest.config.ts           # Test runner config
├── changes/                   # Change artifacts (proposal/specs/design/tasks)
│   └── archive/               # Archived implemented changes
├── EVOLUTION.md               # Evolution engine definition
├── Product-Spec.md            # Forge's own Product Spec
├── Product-Spec-CHANGELOG.md  # Spec change log
├── DEV-PLAN.md                # Forge's own development plan
├── package.json               # Forge dev dependencies
├── tsconfig.json
├── LICENSE                    # MIT license
└── README.md                  # This file

Framework Development

After editing core/, sync to adapters and run tests before committing.

Requirements: Node.js 22.x LTS, pnpm 10.x

pnpm install          # Dev dependencies (TypeScript, Vitest, etc.)
pnpm test             # Unit tests (22 cases)
pnpm build            # Compile scripts/ to dist/
pnpm sync             # Sync core/ → adapters/
pnpm validate-skill   # Validate core/skills/ (cross-platform .mjs; add --strict)
pnpm apply-loadout full claude-code  # Write loadout hooks to adapter settings
pnpm dep-graph build  # Build dependency graph → .forge/graph.json
pnpm dep-graph stats  # Print graph statistics

Command	Description
`pnpm test:watch`	Run tests in watch mode
`pnpm validate-skill:bash`	Bash validate-skill.sh (requires WSL/Git Bash); add `--score` for 32-point rubric
`pnpm create-skill <name>`	Scaffold new Skill from name (`--minimal` or `--full`)
`pnpm apply-loadout <loadout> <client>`	Merge loadout (full/web-app/cli-tool/minimal) hooks into settings; `--dry-run` to preview
`pnpm set-github-metadata`	Push description + topics from `.github/repo-metadata.json` (needs `GITHUB_TOKEN`)
`pnpm dep-graph affected [files...]`	Blast-radius: list transitively affected files (git diff if no args)
`pnpm dep-graph risk [files...]`	Risk score for a set of changes
`pnpm forge-install <client> --target <dir>`	Install adapter into a user project

Always run pnpm sync after changing core/skills, core/agents, core/hooks, etc. — otherwise the check-sync hook will warn about adapter drift.

Research & comparisons

External harnesses reviewed for positioning (not dependencies):

Project	Focus	Forge doc
OpenSpec	Spec-driven `changes/` + CLI	openspec-comparison.md — absorbed via `/change-manager`
OpenHuman	Personal AI runtime, Memory Tree, integrations	openhuman-comparison.md — optional memory backends, context rules

Model Recommendation

Forge covers the full product development pipeline, which demands more from the model than single-task setups. Opus or Sonnet-level models are recommended. Start with a small project to validate output quality and workflow smoothness before committing to a larger project.

License

MIT