forge
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Warn
- fs module — File system access in hooks/progress-tracker.js
- fs module — File system access in hooks/stop-hook.sh
- fs module — File system access in hooks/tool-cache-store.js
- fs module — File system access in hooks/tool-cache.js
- fs module — File system access in scripts/bench-caveman-agents.cjs
Permissions Pass
- Permissions — No dangerous permissions requested
This tool is an automated "brainstorm-to-commit" pipeline for Claude Code. It takes a single-line feature idea, generates a specification, plans tasks, and executes them in parallel git worktrees using Test-Driven Development (TDD) to produce reviewed and committed code.
Security Assessment
Risk: Medium. The tool operates heavily on the local file system to manage git worktrees, write specs, and track progress, which is explicitly flagged in several hook scripts. Its core function is to autonomously execute code generation and testing without human intervention. While no hardcoded secrets or explicitly dangerous permissions were found, any tool designed to run autonomously with full shell access via Claude Code carries inherent risks. Unexpected behavior in the automated pipeline could modify or delete local files and codebases.
Quality Assessment
The project is licensed under the permissive MIT license and is under active development, with its most recent push occurring today. However, community visibility and trust are currently very low, as evidenced by only 5 GitHub stars. Because of this low adoption, the codebase has not been widely peer-reviewed by the broader developer community.
Verdict
Use with caution. While the project is active and cleanly licensed, the combination of low community visibility and high-level autonomous file system access means you should thoroughly inspect the scripts before letting it modify your local code repositories.
Turn a one-line idea into a branch with tested, reviewed, committed code. The brainstorm-to-commit pipeline for Claude Code.
One idea in. Tested, reviewed, committed code out.
You start a feature in Claude Code. You write the prompt. It writes the code. You review it. You re-prompt. It tries again. It loses context. You re-explain. You watch the "context: 87%" warning crawl up. You restart. You re-explain again. You're three hours in, you have half a feature, and you're the one keeping the whole thing from falling apart.
You are the project manager. You are the state machine. You are the glue.
Forge replaces you as the glue. You describe what you want in one line. Forge writes the spec, plans the tasks, runs them in parallel git worktrees with TDD, reviews the code, verifies it against the acceptance criteria, and commits atomically. You read the diffs in the morning.
Install
Requires Claude Code v1.0.33+. Zero npm install, zero build step, zero dependencies.
claude plugin marketplace add LucasDuys/forge
claude plugin install forge@forge-marketplace
Three commands to ship a feature
/forge brainstorm "add rate limiting to /api/search with per-user quotas"
/forge plan
/forge execute --autonomy full
Then walk away.
What you actually see
$ /forge brainstorm "add rate limiting to /api/search with per-user quotas"
[forge-speccer] generating spec from idea...
spec written: .forge/specs/spec-rate-limiting.md
R001 per-user quotas, configurable per tier (free / pro / enterprise)
R002 sliding window counters (1 minute, 1 hour, 1 day)
R003 429 response with Retry-After header
R004 bypass for admin tokens
R005 redis-backed counters with atomic increment
R006 structured logs for rate-limit events
R007 integration test against /api/search
$ /forge plan
[forge-planner] decomposing into task DAG...
8 tasks across 3 tiers (depth: standard)
T001 add redis client + connection pool [haiku, quick]
T002 implement sliding window counter [sonnet, standard]
T003 build rate-limit middleware [sonnet, standard]
T004 wire middleware to /api/search route [haiku, quick]
T005 add 429 response with Retry-After [haiku, quick]
T006 admin token bypass [haiku, quick]
T007 structured logging [haiku, quick]
T008 integration test [sonnet, standard]
deps: T001 T002 T003 T004 T005 T006 T007
$ /forge execute --autonomy full
[14:02:11Z] lock acquired (pid 18432)
[14:02:11Z] T001 worktree created -> .forge/worktrees/T001/
[14:02:11Z] T001 executing haiku budget 5000
[14:02:48Z] T001 PASS 4 lines 1 commit budget 1820/5000
[14:02:48Z] T002 executing sonnet budget 15000
[14:02:48Z] T003 executing sonnet budget 15000 (parallel, no file conflict)
[14:04:33Z] T002 PASS 37 lines 5 tests budget 11240/15000
[14:06:01Z] T003 PASS 62 lines 8 tests budget 13880/15000
[14:06:01Z] T004 T005 T006 T007 dispatched in parallel
[14:08:27Z] tier 2 complete squash-merged 6 worktrees
[14:08:27Z] T008 executing sonnet budget 15000
[14:14:12Z] T008 PASS 44 lines 12 tests budget 12300/15000
[14:14:12Z] forge-verifier: existence > substantive > wired > runtime
[14:14:18Z] verifier PASS all 7 requirements satisfied
[14:14:18Z] <promise>FORGE_COMPLETE</promise>
8 tasks. 12 minutes. 218 lines. 9 commits squash-merged to main.
session budget: 47200 / 500000 used. lock released.
You read the diffs. You merge the branch. You move on.
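The planner output above groups tasks into tiers, and tasks in the same tier are dispatched in parallel. A minimal sketch of how such tiers could be derived from a dependency map is shown below. This is not Forge's planner: `computeTiers`, `exampleGraph`, and the task names and edges in it are hypothetical, chosen only to illustrate the idea that a task lands one tier after its deepest dependency.

```javascript
// Hypothetical dependency map: task id -> ids it depends on.
// These names and edges are illustrative, not taken from Forge.
const exampleGraph = {
  schema: [],
  client: [],
  counter: ["client"],
  middleware: ["counter", "schema"],
  e2e: ["middleware"],
};

// Group tasks into tiers. A task's tier is one more than the highest
// tier among its dependencies; tasks with no dependencies are tier 0.
// Tasks that share a tier do not depend on each other.
function computeTiers(graph) {
  const tierOf = new Map();
  const visiting = new Set(); // tasks on the current recursion path

  function visit(id) {
    if (tierOf.has(id)) return tierOf.get(id);
    if (!Object.hasOwn(graph, id)) throw new Error(`unknown task: ${id}`);
    if (visiting.has(id)) throw new Error(`dependency cycle at ${id}`);
    visiting.add(id);
    let tier = 0;
    for (const dep of graph[id]) {
      tier = Math.max(tier, visit(dep) + 1);
    }
    visiting.delete(id);
    tierOf.set(id, tier);
    return tier;
  }

  const tiers = [];
  for (const id of Object.keys(graph)) {
    const tier = visit(id);
    if (!tiers[tier]) tiers[tier] = [];
    tiers[tier].push(id);
  }
  return tiers;
}

console.log(computeTiers(exampleGraph));
```

A real scheduler would also need to check for file conflicts before running two tasks side by side, as the "(parallel, no file conflict)" log line suggests; that check is left out of this sketch.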
Why it works
- Native Claude Code plugin. Lives in your existing session. No separate harness, no TUI to learn, no API key to manage. (architecture)
- Hard token budgets. Per-task and per-session ceilings, enforced as hard stops, not warnings. No more silent overruns at 3am. (budgets)
- Git worktree isolation. Every task runs in its own worktree. Failed tasks get discarded. Successful ones squash-merge with atomic commit messages. Your main branch only ever sees green code. (worktrees)
- Crash recovery that actually works. Lock file with heartbeat, per-step checkpoints, forensic resume from git log. If your machine reboots mid-feature, /forge resume picks up exactly where it died. (recovery)
- Headless mode for CI and cron. Proper exit codes, JSON state queries in under 5ms, zero interactive prompts. (headless)
- Goal-backward verification. The verifier checks the spec, not the tasks. Existence > substantive > wired > runtime. Catches stubs, dead code, and "looks done but isn't" before they ship. (verification)
- Backpropagation. When a bug surfaces in production, /forge backprop traces it back to the spec gap that allowed it and writes the regression test that would have caught it. (backprop)
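To make "hard stops, not warnings" concrete, here is a rough sketch of a two-level budget with a per-task ceiling and a per-session ceiling. It is an illustration under assumptions, not Forge's implementation: the `TokenBudget` class, its method names, and `BudgetExceededError` are all hypothetical, and the sketch assumes the token cost of a step is known before it is charged.

```javascript
// Hypothetical error type raised when a ceiling would be crossed.
class BudgetExceededError extends Error {}

// Hypothetical two-level budget: one ceiling per task, one per session.
class TokenBudget {
  constructor(sessionCeiling) {
    this.sessionCeiling = sessionCeiling;
    this.sessionUsed = 0;
    this.tasks = new Map(); // taskId -> { ceiling, used }
  }

  // Register a task with its own ceiling.
  open(taskId, ceiling) {
    this.tasks.set(taskId, { ceiling, used: 0 });
  }

  // Charge tokens to a task. Throws before recording anything if either
  // ceiling would be exceeded, so an overrun is never silently accepted.
  // Returns the tokens remaining in the task's budget.
  charge(taskId, tokens) {
    const task = this.tasks.get(taskId);
    if (!task) throw new Error(`unknown task: ${taskId}`);
    if (task.used + tokens > task.ceiling) {
      throw new BudgetExceededError(`${taskId} would exceed task ceiling`);
    }
    if (this.sessionUsed + tokens > this.sessionCeiling) {
      throw new BudgetExceededError(`${taskId} would exceed session ceiling`);
    }
    task.used += tokens;
    this.sessionUsed += tokens;
    return task.ceiling - task.used;
  }
}

// Usage with the figures from the sample run: a 5000-token task
// that spends 1820 tokens inside a 500000-token session.
const budget = new TokenBudget(500000);
budget.open("T001", 5000);
console.log(budget.charge("T001", 1820));
```

Throwing instead of logging is the design choice that matters here: the caller has to handle the overrun explicitly, which is what separates a hard stop from a warning.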
Receipts
- 100 tests, 0 dependencies. Full suite runs in 2.4 seconds. Pure node:assert.
- Headless state query: under 5ms. Zero LLM calls. Drop it in a Prometheus exporter.
- Caveman compression: 26.8% reduction on internal artifacts. (benchmark)
- Lock heartbeat survives crashes, reboots, OOMs, and context resets. Five minute stale threshold, never auto-deletes user work.
- Worktree isolation: failed tasks never touch your main branch. Successful ones land as one squashed commit with a structured message.
- Seven specialized agents. Speccer, planner, researcher, executor, reviewer, verifier, complexity scorer. Each routed to the cheapest model that can handle the job. (agents)
- Seven circuit breakers, including test failures, debug exhaustion, review iterations, no-progress detection, and token ceilings. Nothing runs forever. (circuit breakers)
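The lock heartbeat with its five-minute stale threshold can be pictured as a small classification step. The sketch below is an assumption about the general technique, not Forge's code: `classifyLock`, `STALE_AFTER_MS`, and the lock record's `pid` and `heartbeatMs` fields are hypothetical names.

```javascript
// Five-minute stale threshold, as stated in the list above.
const STALE_AFTER_MS = 5 * 60 * 1000;

// Classify a lock record (hypothetical shape: { pid, heartbeatMs }).
//   "free"  - no lock exists, a new run may start
//   "held"  - the heartbeat is recent, another run is still alive
//   "stale" - the heartbeat is older than the threshold, so the owner
//             probably died and a resume may take the lock over
// The function only classifies. It never deletes anything.
function classifyLock(lock, nowMs) {
  if (lock === null) return "free";
  const ageMs = nowMs - lock.heartbeatMs;
  return ageMs > STALE_AFTER_MS ? "stale" : "held";
}

// Usage: a lock whose last heartbeat was six minutes ago.
const now = Date.now();
const oldLock = { pid: 18432, heartbeatMs: now - 6 * 60 * 1000 };
console.log(classifyLock(oldLock, now));
```

Passing the current time in as an argument keeps the function pure, which makes it straightforward to test without touching the file system or the clock.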
How it compares
Forge is one of three tools in this space alongside Ralph Loop and GSD-2. They overlap but optimize for different things:
- Pick Forge if you want autonomous execution that lives inside your existing Claude Code session, with hard cost controls, adaptive depth, and crash recovery.
- Pick GSD-2 if you want a more battle-tested standalone TUI harness with more engineering hours behind it.
- Pick Ralph Loop if you have a tightly-scoped greenfield task with binary verification and want the absolute minimum infrastructure.
Full honest comparison with all the trade-offs: docs/comparison.md.
Documentation
- Architecture — three-tiered loop, self-prompting engine, execution flow
- Commands — every slash command and flag
- Configuration — .forge/config.json reference
- Token budgets — per-task and session ceilings
- Worktree isolation — how each task gets its own branch
- Crash recovery — forensic resume from checkpoints
- Headless mode — CI/cron usage and JSON schema
- Specialized agents — the seven roles and their model routing
- Verification & circuit breakers — goal-backward verification, the seven safety nets
- Backpropagation — bugs to spec gaps
- Caveman optimization — internal token compression
- Testing — running the 100-test suite
- Comparison — Forge vs Ralph Loop vs GSD-2
Credits
- Caveman skill adapted from JuliusBrussee/caveman (MIT)
- Ralph Loop pattern by Geoffrey Huntley — Forge's self-prompting loop is a smarter-state-machine variant
- Spec-driven development concepts from GSD v1 by TÂCHES
- Claude Code plugin system by Anthropic — Forge is a native extension, not a wrapper
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: node scripts/run-tests.cjs
- Open a pull request
See CONTRIBUTING.md.
License
MIT.