AgentForge

Forge production-grade AI agents.

28 battle-tested skills. 5 specialist personas. 5 reference checklists. One mission: make AI agents build software like senior engineers.

The Problem

AI coding agents are fast. They're also reckless.

They skip specs. They "forget" tests. They ship without review. They treat "it works on my machine" as a success criterion. In short, they build prototypes, not production software.

AgentForge fixes this.

We don't give agents vague suggestions. We give them structured, battle-tested workflows that encode how senior engineers actually build software — the same workflows that power teams at Google, Netflix, and Stripe. Every skill has steps, checkpoints, anti-rationalization defenses, and evidence-based verification. When an agent follows these, it ships code you can trust.

The Lifecycle

  DEFINE          PLAN           BUILD          VERIFY         REVIEW          SHIP           OPS
 ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐      ┌──────┐
 │ Idea │ ───▶ │ Spec │ ───▶ │ Code │ ───▶ │ Test │ ───▶ │  QA  │ ───▶ │  Go  │ ───▶ │ Run  │
 │Refine│      │  PRD │      │ Impl │      │Debug │      │ Gate │      │ Live │      │ Ops  │
 └──────┘      └──────┘      └──────┘      └──────┘      └──────┘      └──────┘      └──────┘
  /spec          /plan          /build        /test         /review       /ship

7 slash commands. 28 skills. 7 phases. Zero excuses.

What Makes This Different

	Other Prompt Packs	AgentForge
Structure	Vague advice	Step-by-step workflows with checkpoints
Verification	"Make sure it works"	Evidence-based exit criteria (tests, builds, runtime data)
Anti-cheating	None	Rationalization tables that call out excuses agents use to skip steps
Scope	Generic coding tips	Full lifecycle: spec → plan → build → verify → review → ship → ops
Quality gates	None	Built-in CI pipeline with 9 automated checks
Cross-reference	Silos	Every skill references related skills; no duplication

Commands

7 slash commands that map to the development lifecycle. Each one activates the right skills automatically.

What you're doing	Command	Key principle
Define what to build	`/spec`	Spec before code
Plan how to build it	`/plan`	Small, atomic tasks
Build incrementally	`/build`	One slice at a time
Prove it works	`/test`	Tests are proof
Review before merge	`/review`	Improve code health
Simplify the code	`/code-simplify`	Clarity over cleverness
Ship to production	`/ship`	Faster is safer

Want fewer manual steps once the spec exists? `/build auto` generates the plan and implements every task in a single approved pass: you approve the plan once, then it runs autonomously. It removes the human stepping between tasks, not the verification. Every task is still test-driven and committed individually, and the run pauses on failures or risky steps.
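
The control flow described above can be sketched roughly as follows. This is a hypothetical illustration, not AgentForge's implementation: `Task`, `run_plan`, and the `commit` callback are names invented here, and the actual behavior of `/build auto` is defined by the skill and command files.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    """One planned unit of work (hypothetical shape, for illustration only)."""
    name: str
    run_tests: Callable[[], bool]  # returns True when the task's tests pass
    risky: bool = False            # e.g. a migration or other destructive step


def run_plan(
    tasks: List[Task],
    commit: Callable[[str], None],
    plan_approved: bool,
) -> Tuple[List[str], Optional[str]]:
    """Run tasks in order; return (committed task names, pause reason or None)."""
    if not plan_approved:
        return [], "plan not approved"
    committed: List[str] = []
    for task in tasks:
        if task.risky:
            # Hand control back to the human before anything risky runs.
            return committed, f"paused before risky task: {task.name}"
        if not task.run_tests():
            # A failing task stops the run instead of being skipped.
            return committed, f"paused on failing tests: {task.name}"
        commit(task.name)  # one commit per task
        committed.append(task.name)
    return committed, None


if __name__ == "__main__":
    log: List[str] = []
    plan = [
        Task("add-endpoint", run_tests=lambda: True),
        Task("drop-old-table", run_tests=lambda: True, risky=True),
    ]
    print(run_plan(plan, commit=log.append, plan_approved=True))
```

The point of the sketch is that approval happens once, up front, while the per-task test and commit steps stay inside the loop.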

Skills also activate automatically based on what you're doing — designing an API triggers api-and-interface-design, building UI triggers frontend-ui-engineering, and so on.

Quick Start

Claude Code (recommended)

Marketplace install:

/plugin marketplace add borhen68/SkillEngine
/plugin install agentforge@borhen-agentforge

SSH errors? The marketplace clones repos via SSH. If you don't have SSH keys set up on GitHub, either add your SSH key or use the full HTTPS URL to force HTTPS cloning:
/plugin marketplace add https://github.com/borhen68/SkillEngine.git
/plugin install agentforge@borhen-agentforge

Local / development:

git clone https://github.com/borhen68/SkillEngine.git agentforge
claude --plugin-dir /path/to/agentforge

Cursor

Copy any SKILL.md into .cursor/rules/, or reference the full skills/ directory. See docs/cursor-setup.md.

Antigravity CLI

Install as a native plugin for skills, subagents, and slash commands. See docs/antigravity-setup.md.

Install from the repo:

agy plugin install https://github.com/borhen68/SkillEngine.git

Install from a local clone:

git clone https://github.com/borhen68/SkillEngine.git agentforge
agy plugin install ./agentforge

Gemini CLI

Install as native skills for auto-discovery, or add to GEMINI.md for persistent context. See docs/gemini-cli-setup.md.

Install from the repo:

gemini skills install https://github.com/borhen68/SkillEngine.git --path skills

Install from a local clone:

gemini skills install ./agentforge/skills/

Windsurf

Add skill contents to your Windsurf rules configuration. See docs/windsurf-setup.md.

OpenCode

Uses agent-driven skill execution via AGENTS.md and the skill tool. See docs/opencode-setup.md.

GitHub Copilot

Use agent definitions from agents/ as Copilot personas and skill content in .github/copilot-instructions.md. See docs/copilot-setup.md.

Kiro IDE & CLI

Skills for Kiro reside under .kiro/skills/ and can be stored at the project or global level. Kiro also supports AGENTS.md. See the Kiro docs at https://kiro.dev/docs/skills/.

Codex / Other Agents

Skills are plain Markdown - they work with any agent that accepts system prompts or instruction files. See docs/getting-started.md.

All 28 Skills

The commands above are entry points. The pack includes 28 skills in total: 23 lifecycle skills, 4 operations skills, and the using-agentforge meta-skill. Each skill is a structured workflow with steps, verification gates, and anti-rationalization tables. You can also reference any skill directly.

Meta - Discover which skill applies

Skill	What It Does	Use When
using-agentforge	Maps incoming work to the right skill workflow and defines shared operating rules	Starting a session or deciding which skill applies

Define - Clarify what to build

Skill	What It Does	Use When
interview-me	One-question-at-a-time interview that extracts what the user actually wants instead of what they think they should want, until ~95% confidence	The ask is underspecified, or the user invokes "interview me" / "grill me"
idea-refine	Structured divergent/convergent thinking to turn vague ideas into concrete proposals	You have a rough concept that needs exploration
spec-driven-development	Write a PRD covering objectives, commands, structure, code style, testing, and boundaries before any code	Starting a new project, feature, or significant change

Plan - Break it down

Skill	What It Does	Use When
planning-and-task-breakdown	Decompose specs into small, verifiable tasks with acceptance criteria and dependency ordering	You have a spec and need implementable units

Build - Write the code

Skill	What It Does	Use When
incremental-implementation	Thin vertical slices - implement, test, verify, commit. Feature flags, safe defaults, rollback-friendly changes	Any change touching more than one file
test-driven-development	Red-Green-Refactor, test pyramid (80/15/5), test sizes, DAMP over DRY, Beyonce Rule, browser testing	Implementing logic, fixing bugs, or changing behavior
context-engineering	Feed agents the right information at the right time - rules files, context packing, MCP integrations	Starting a session, switching tasks, or when output quality drops
source-driven-development	Ground every framework decision in official documentation - verify, cite sources, flag what's unverified	You want authoritative, source-cited code for any framework or library
doubt-driven-development	Adversarial fresh-context review of every non-trivial decision in-flight - CLAIM → EXTRACT → DOUBT → RECONCILE → STOP, with optional user-authorized cross-model escalation	Stakes are high (production, security, irreversible), working in unfamiliar code, or a confident output is cheaper to verify now than to debug later
frontend-ui-engineering	Component architecture, design systems, state management, responsive design, WCAG 2.1 AA accessibility	Building or modifying user-facing interfaces
api-and-interface-design	Contract-first design, Hyrum's Law, One-Version Rule, error semantics, boundary validation	Designing APIs, module boundaries, or public interfaces

Verify - Prove it works

Skill	What It Does	Use When
browser-testing-with-devtools	Chrome DevTools MCP for live runtime data - DOM inspection, console logs, network traces, performance profiling	Building or debugging anything that runs in a browser
debugging-and-error-recovery	Five-step triage: reproduce, localize, reduce, fix, guard. Stop-the-line rule, safe fallbacks	Tests fail, builds break, or behavior is unexpected

Review - Quality gates before merge

Skill	What It Does	Use When
code-review-and-quality	Five-axis review, change sizing (~100 lines), severity labels (Nit/Optional/FYI), review speed norms, splitting strategies	Before merging any change
code-simplification	Chesterton's Fence, Rule of 500, reduce complexity while preserving exact behavior	Code works but is harder to read or maintain than it should be
security-and-hardening	OWASP Top 10 prevention, auth patterns, secrets management, dependency auditing, three-tier boundary system	Handling user input, auth, data storage, or external integrations
performance-optimization	Measure-first approach - Core Web Vitals targets, profiling workflows, bundle analysis, anti-pattern detection	Performance requirements exist or you suspect regressions

Ship - Deploy with confidence

Skill	What It Does	Use When
git-workflow-and-versioning	Trunk-based development, atomic commits, change sizing (~100 lines), the commit-as-save-point pattern	Making any code change (always)
ci-cd-and-automation	Shift Left, Faster is Safer, feature flags, quality gate pipelines, failure feedback loops	Setting up or modifying build and deploy pipelines
deprecation-and-migration	Code-as-liability mindset, compulsory vs advisory deprecation, migration patterns, zombie code removal	Removing old systems, migrating users, or sunsetting features
documentation-and-adrs	Architecture Decision Records, API docs, inline documentation standards - document the why	Making architectural decisions, changing APIs, or shipping features
observability-and-instrumentation	Structured logging, RED metrics, OpenTelemetry tracing, symptom-based alerting - instrument as you build	Adding telemetry, or shipping anything that runs in production
shipping-and-launch	Pre-launch checklists, feature flag lifecycle, staged rollouts, rollback procedures, monitoring setup	Preparing to deploy to production

Ops - Keep systems running

Skill	What It Does	Use When
chaos-engineering	Systematic fault injection and resilience testing	Designing for high availability, verifying disaster recovery
cost-optimization	Cloud spend reduction without sacrificing reliability	Bills growing unpredictably, rightsizing resources
data-engineering	Data pipelines, ETL/ELT, schema evolution, data quality	Building data pipelines, migrating schemas
ai-ops	ML model deployment, monitoring, drift detection, retraining	Deploying models, managing inference infrastructure

Agent Personas

Pre-configured specialist personas for targeted reviews:

Agent	Role	Perspective
code-reviewer	Senior Staff Engineer	Five-axis code review with "would a staff engineer approve this?" standard
test-engineer	QA Specialist	Test strategy, coverage analysis, and the Prove-It pattern
security-auditor	Security Engineer	Vulnerability detection, threat modeling, OWASP assessment
web-performance-auditor	Web Performance Engineer	Core Web Vitals audit with Quick/Deep modes and a metric-honesty rule; run it via `/webperf`
site-reliability-engineer	Site Reliability Engineer	Availability, observability, capacity planning, and incident readiness audits

Reference Checklists

Quick-reference material that skills pull in when needed:

Reference	Covers
testing-patterns.md	Test structure, naming, mocking, React/API/E2E examples, anti-patterns
security-checklist.md	Pre-commit checks, auth, input validation, headers, CORS, OWASP Top 10
performance-checklist.md	Core Web Vitals targets, frontend/backend checklists, measurement commands
accessibility-checklist.md	Keyboard nav, screen readers, visual design, ARIA, testing tools
reliability-checklist.md	Availability, observability, capacity, incident response, data integrity

How Skills Work

Every skill follows a consistent anatomy:

┌─────────────────────────────────────────────────┐
│  SKILL.md                                       │
│                                                 │
│  ┌─ Frontmatter ─────────────────────────────┐  │
│  │ name: lowercase-hyphen-name               │  │
│  │ description: Guides agents through [task].│  │
│  │              Use when…                    │  │
│  └───────────────────────────────────────────┘  │
│  Overview         → What this skill does        │
│  When to Use      → Triggering conditions       │
│  Process          → Step-by-step workflow       │
│  Rationalizations → Excuses + rebuttals         │
│  Red Flags        → Signs something's wrong     │
│  Verification     → Evidence requirements       │
└─────────────────────────────────────────────────┘

Key design choices:

Process, not prose. Skills are workflows agents follow, not reference docs they read. Each has steps, checkpoints, and exit criteria.
Anti-rationalization. Every skill includes a table of common excuses agents use to skip steps (e.g., "I'll add tests later") with documented counter-arguments.
Verification is non-negotiable. Every skill ends with evidence requirements - tests passing, build output, runtime data. "Seems right" is never sufficient.
Progressive disclosure. The SKILL.md is the entry point. Supporting references load only when needed, keeping token usage minimal.
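
As a rough illustration of the anatomy above, the following sketch parses a SKILL.md string and reports which parts are missing. It is not the repository's validator. It assumes frontmatter is delimited by `---` lines with single-line `key: value` pairs and that sections are `## ` headings, and the names `parse_skill` and `missing_parts` are invented for this example. Section names are matched by substring, so a heading such as "Common Rationalizations" counts as "Rationalizations".

```python
from typing import Dict, List, Tuple

# Section names taken from the anatomy diagram above.
ANATOMY_SECTIONS = [
    "Overview", "When to Use", "Process",
    "Rationalizations", "Red Flags", "Verification",
]


def parse_skill(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split a SKILL.md string into (frontmatter fields, section headings)."""
    lines = text.splitlines()
    frontmatter: Dict[str, str] = {}
    body_start = 0
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body_start = i + 1  # body begins after the closing delimiter
                break
            key, sep, value = line.partition(":")
            if sep:
                frontmatter[key.strip()] = value.strip()
    headings = [l[3:].strip() for l in lines[body_start:] if l.startswith("## ")]
    return frontmatter, headings


def missing_parts(text: str) -> List[str]:
    """List the frontmatter fields and anatomy sections a skill lacks."""
    frontmatter, headings = parse_skill(text)
    missing = [
        f"frontmatter:{key}"
        for key in ("name", "description")
        if not frontmatter.get(key)
    ]
    missing += [
        f"section:{section}"
        for section in ANATOMY_SECTIONS
        if not any(section in heading for heading in headings)
    ]
    return missing


SAMPLE = """---
name: example-skill
description: Guides agents through an example task. Use when demonstrating the format.
---
## Overview
## When to Use
## Process
## Common Rationalizations
## Red Flags
"""

print(missing_parts(SAMPLE))  # ['section:Verification']
```

The sample skill omits its Verification section on purpose, which is exactly the gap the anatomy is meant to make visible.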

Project Structure

agent-skills/
├── skills/                            # 28 skills (23 lifecycle + 4 ops + 1 meta)
│   ├── interview-me/                  #   Define
│   ├── idea-refine/                   #   Define
│   ├── spec-driven-development/       #   Define
│   ├── planning-and-task-breakdown/   #   Plan
│   ├── incremental-implementation/    #   Build
│   ├── context-engineering/           #   Build
│   ├── source-driven-development/     #   Build
│   ├── doubt-driven-development/      #   Build
│   ├── frontend-ui-engineering/       #   Build
│   ├── test-driven-development/       #   Build
│   ├── api-and-interface-design/      #   Build
│   ├── browser-testing-with-devtools/ #   Verify
│   ├── debugging-and-error-recovery/  #   Verify
│   ├── code-review-and-quality/       #   Review
│   ├── code-simplification/           #   Review
│   ├── security-and-hardening/        #   Review
│   ├── performance-optimization/      #   Review
│   ├── git-workflow-and-versioning/   #   Ship
│   ├── ci-cd-and-automation/          #   Ship
│   ├── deprecation-and-migration/     #   Ship
│   ├── documentation-and-adrs/        #   Ship
│   ├── observability-and-instrumentation/ # Ship
│   ├── shipping-and-launch/           #   Ship
│   ├── chaos-engineering/             #   Ops
│   ├── cost-optimization/             #   Ops
│   ├── data-engineering/              #   Ops
│   ├── ai-ops/                        #   Ops
│   └── using-agentforge/              #   Meta: how to use this pack
├── agents/                            # 5 specialist personas
├── references/                        # 5 supplementary checklists
├── hooks/                             # Session lifecycle hooks
├── scripts/                           # Validation & build automation
├── .claude/commands/                  # 7 slash commands (Claude Code)
├── .gemini/commands/                  # 7 slash commands (Gemini CLI)
├── commands/                          # 8 slash commands (Antigravity CLI)
├── plugin.json                        # Antigravity plugin manifest
├── package.json                       # Node.js tooling & scripts
├── Makefile                           # Local development workflows
└── docs/                              # Setup guides per tool

Development

This repository includes a comprehensive validation and quality pipeline:

# Install dependencies
npm install

# Run full validation suite
npm test

# Or use Make
make ci

Available commands:

Command	What It Does
`npm run validate`	Validate all skill files for anatomy compliance
`npm run validate:strict`	Same, but warnings block CI
`npm run quality:cross-skill`	Check cross-skill consistency and references
`npm run quality:agents`	Validate agent persona files
`npm run test:hooks`	Test session lifecycle hooks
`npm run build:packages`	Build .zip packages for distribution
`npm run stats`	Show project statistics dashboard

Quality gates enforced:

YAML frontmatter validation (name, description, max length)
Required sections: Overview, When to Use, Common Rationalizations, Red Flags, Verification
Cross-skill reference integrity (no dead links)
Internal markdown link validation
Description quality (must contain both "what" and "when" signals)
Token estimation and size warnings
Code block language specifier checks
Agent persona consistency
Lifecycle coverage completeness
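
To make a few of these gates concrete, here is a minimal sketch of three of them: description quality, code block language specifiers, and token estimation. It is an approximation written for this README, not the code in scripts/. The function names, the "Use when" heuristic for the "when" signal, the 1024-character limit, and the four-characters-per-token estimate are all assumptions.

```python
import re
from typing import List

FENCE = "`" * 3                # marker that opens and closes a fenced code block
MAX_DESCRIPTION_CHARS = 1024   # assumed limit, not taken from the repository


def description_issues(description: str) -> List[str]:
    """Check length plus 'what' and 'when' signals in a skill description."""
    issues = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        issues.append("description too long")
    match = re.search(r"\buse when\b", description, flags=re.IGNORECASE)
    if match is None:
        issues.append("missing 'when' signal")
    elif not description[:match.start()].strip():
        # Nothing precedes "Use when", so the description never says what it does.
        issues.append("missing 'what' signal")
    return issues


def fences_without_language(markdown: str) -> List[int]:
    """Return 1-based line numbers of opening code fences with no language."""
    bad: List[int] = []
    inside = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not inside and stripped == FENCE:
                bad.append(number)
            inside = not inside
    return bad


def estimate_tokens(markdown: str) -> int:
    """Rough size estimate: about four characters per token (a heuristic)."""
    return len(markdown) // 4


if __name__ == "__main__":
    print(description_issues("Guides agents through a task. Use when starting work."))
    print(description_issues("Does things."))
```

Each check returns data rather than raising, so a caller could treat results as warnings by default and as failures in a strict mode.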

Why This Exists

"I watched an AI agent ship a 'working' feature that had no tests, no error handling, and a SQL injection vulnerability. It was 'done' in 20 minutes. It would have taken 2 days to fix in production."
— Every engineering lead, 2024-2025

AI agents are incredible accelerators. They're also incredible liability generators — because they optimize for speed, not correctness. They don't know what they don't know, and they don't know that they don't know it.

AgentForge is the guardrail.

Every skill in this pack encodes hard-won judgment from production engineering:

When to write a spec (always, for anything non-trivial)
What to test (behavior, not implementation; edge cases, not just happy path)
How to review (five axes, not just "does it compile")
When to ship (when rollback is faster than fix-forward)

These aren't theoretical ideals. They're the workflows that separate teams that sleep through launches from teams that don't.

Built With

This pack draws from the best engineering cultures in the world:

Google: Hyrum's Law, Beyonce Rule, test pyramid, change sizing, trunk-based development, code as liability
Netflix: Chaos engineering, circuit breakers, graceful degradation
Stripe: API design, backward compatibility, developer experience
Amazon: Two-pizza teams, service boundaries, operational readiness

Every principle is embedded directly into the step-by-step workflows agents follow — not as footnotes, but as non-negotiable steps.

Contributing

We accept contributions that make agents more reliable, not more clever.

See docs/skill-anatomy.md for the format specification and CONTRIBUTING.md for guidelines. Every PR goes through the same quality gates the skills enforce — eat your own dog food.

License

MIT — use these skills in your projects, teams, and tools. Build something great.