
cladding

Unified Governance for AI-Coupled Engineering.
AI-generated code, held to the same bar as human code.

Reference implementation of the Ironclad standard. 27 detectors and a 13-stage gate verify, on every commit, that the code your AI assistant wrote still matches the spec.

Vanilla AI coding: 2/8 traps caught (25%)	cladding: 8/8 traps caught (100%)
Same spec · same model · event-sourcing store benchmark

Why

The why fades after 3 months

The reason an AI assistant wrote code a certain way doesn't survive in the code alone.

→ spec/features/*.yaml becomes the permanent record of why.

✓ AI context survives time — six months later, the AI reconstructs intent straight from the spec (new hires get the same entry point).

AI gives a different answer each time

The same spec produces code with inconsistent patterns and structure.

→ The spec becomes the fixed reference against which every commit is checked.

✓ Enterprise-ready consistency — code style and patterns stay aligned across teams and PRs.

AI hallucination

Generated code calls APIs, functions, or options that don't exist.

→ 27 detectors and a 13-stage gate block hallucinated code on every commit.

✓ Production incidents prevented up front — CI auto-rejects hallucinated code before it merges.

What you get

How a vanilla AI coding environment and a cladding environment behave when the same situation comes up.

Situation	Vanilla AI coding	cladding
Code drifts from spec	fixed if a reviewer notices	auto-blocked on every commit
Two devs build the same feature in parallel	merge conflicts	hash-based IDs route to separate files → 0 conflicts
Who verifies AI-written code?	the AI that wrote it (risky)	a separate reviewer agent — duties split
Switching AI tools (Claude → Cursor)	reconfigure per tool	one spec → mirrored across 4 hosts
Spec authority	the AI reinterprets it each time	the sealed spec is the single source of truth

The 8/8 vs 2/8 result shown at the top is an early benchmark (details); larger-scale measurements are in progress.

How it works

Spec → Code → Tests runs as a single cycle — the spec captures the why, Iron Law verifies the implementation, and Drift Detection blocks anything that no longer matches.

Spec → Code → Tests as a single cycle — one feature's lifecycle

1. Spec — SSoT, single source of intent

The spec is where the why (what we're building and why) lives. A 4-tier (A/B/C/D) Single Source of Truth — intent on top, implementation below.

Tier	Role	Who edits	Authority
A — Spec	intent (what to build)	humans only	sealed · LLMs cannot edit
B — Design	design (how to build it)	humans freely	checked against A
C — Derived	implementation (code · tests)	LLMs and humans	regenerated by reading the code
D — Audit	audit log (what actually happened)	append-only	immutable

A outranks B — if code and spec disagree, the code is wrong. The spec is sealed because changing the why shakes everything downstream, so LLMs are kept out.
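The tier rules in the table can be sketched as a small permission check. This is a minimal illustration, not cladding's actual data model: the names `EDITORS`, `may_write`, and `side_to_fix` are hypothetical, and two points are assumptions rather than facts from this README (that LLMs do not edit tier B, and that C ranks below B).

```python
# Hypothetical model of the 4-tier SSoT; rules mirror the table above.
EDITORS = {
    "A": {"human"},          # sealed: LLMs cannot edit
    "B": {"human"},          # assumption: the table only mentions humans
    "C": {"human", "llm"},   # derived code and tests
    "D": set(),              # nobody edits the audit log in place
}
APPEND_ONLY = {"D"}

# Lower number = higher authority. Ranking C below B is an assumption.
RANK = {"A": 0, "B": 1, "C": 2}


def may_write(tier: str, actor: str, action: str = "edit") -> bool:
    """Return True if `actor` ("human" or "llm") may perform `action` on `tier`."""
    if tier in APPEND_ONLY:
        return action == "append"
    return action == "edit" and actor in EDITORS[tier]


def side_to_fix(tier_x: str, tier_y: str) -> str:
    """When two tiers disagree, the lower-authority tier is the one to change."""
    return max(tier_x, tier_y, key=RANK.__getitem__)
```

For example, `side_to_fix("A", "C")` returns `"C"`: when spec and code disagree, the code is the side that gets fixed.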

Sharded · multi-dev safe — spec/features/<slug>-<hash6>.yaml puts each feature in its own file with a 6-char hash ID (e.g. F-5f6b45). Two devs creating new features at the same time land in different files with different IDs — zero merge conflicts. Details: Hash-based feature IDs.
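A rough sketch of how a sharded file name and ID of this shape could be derived. How cladding actually computes the hash is not stated here; hashing the slug together with a random nonce, the helper name `new_feature`, and the slug rules are all assumptions made for illustration.

```python
import hashlib
import re
import secrets
from typing import Optional, Tuple


def new_feature(title: str, nonce: Optional[str] = None) -> Tuple[str, str]:
    """Return (feature_id, spec_path) for a new feature.

    The nonce stands in for whatever makes two simultaneous creations differ;
    using a random value is an assumption of this sketch.
    """
    # Lowercase the title and collapse anything non-alphanumeric into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if nonce is None:
        nonce = secrets.token_hex(8)
    # Keep the first 6 hex characters of a SHA-256 digest as the short hash.
    hash6 = hashlib.sha256(f"{slug}:{nonce}".encode("utf-8")).hexdigest()[:6]
    return f"F-{hash6}", f"spec/features/{slug}-{hash6}.yaml"
```

Because each developer's call uses a different nonce, two features created at the same time end up in different files with overwhelming probability, so neither branch touches the other's file.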

4-tier SSoT — A(Spec) → B(Design) → C(Derived) → D(Audit), A outranks B

2. Code — Iron Law (required) gate

Every change has to clear all 13 stages — typically called from CI, a git pre-push hook, or manual clad check. Each stage ships with its own unit tests.

13-stage Iron Law gate — every change must clear static(6) · test(2) · e2e(3) · evidence(2) wherever clad check runs (CI / git hook / manual)

Stage	What it checks
1.1 Type · 1.2 Lint	type errors · code style
1.3 Drift	spec ↔ code mismatches across 27 detectors
1.4 Commit · 1.5 Arch · 1.6 Secret	clean working tree · architecture invariants (forbidden imports, etc.) · leaked API keys
2.1 Unit · 2.2 Cov	unit tests pass · project coverage threshold
3.1 Smoke · 3.2 Perf · 3.3 Visual	end-to-end critical paths · performance budgets · visual regression
4.1 Audit · 4.2 UAT	every AC (acceptance criteria) has at least one piece of evidence · every `status=done` feature has at least one piece of evidence
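The gate can be pictured as an ordered list of checks that must all pass. The sketch below uses the stage numbers and names from the table, but the runner itself (`run_gate`, the `checks` mapping, and the choice to run every stage rather than stop at the first failure) is hypothetical and not taken from cladding's source.

```python
from typing import Callable, Dict, List, Tuple

# Stage IDs and names copied from the table above.
STAGES: List[Tuple[str, str]] = [
    ("1.1", "Type"), ("1.2", "Lint"), ("1.3", "Drift"),
    ("1.4", "Commit"), ("1.5", "Arch"), ("1.6", "Secret"),
    ("2.1", "Unit"), ("2.2", "Cov"),
    ("3.1", "Smoke"), ("3.2", "Perf"), ("3.3", "Visual"),
    ("4.1", "Audit"), ("4.2", "UAT"),
]

Check = Callable[[], bool]


def run_gate(checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run every stage in order and return (passed, failed_stage_ids).

    A stage with no registered check counts as failed, so a missing check
    can never silently pass the gate (a design choice of this sketch).
    """
    failed: List[str] = []
    for stage_id, _name in STAGES:
        check = checks.get(stage_id)
        if check is None or not check():
            failed.append(stage_id)
    return (not failed, failed)
```

Collecting all failures instead of stopping early gives one complete report per run; a real gate might prefer to fail fast on the cheap static stages.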

3. Tests — 27 drift detectors

Seven categories of mismatch across spec · code · test, all caught automatically. Full catalog: src/stages/detectors/README.md.

Category	What it catches	Count	Representative detectors
spec ↔ code drift	something in the spec missing from code, or in code with nothing in the spec	6	`UNMAPPED_ARTIFACT`, `MISSING_IMPLEMENTATION`, `AC_DRIFT`
code ↔ test	code without tests · coverage falling below threshold	6	`MISSING_TESTS`, `COVERAGE_DROP`, `HARDCODED_SECRET`
spec ↔ test	an AC in the spec that no test actually verifies	4	`UNTESTED_AC`, `STATUS_DRIFT`, `STALE_EVIDENCE`
spec maintenance	spec hygiene — slug collisions, ID duplicates	4	`SLUG_CONFLICT`, `ID_COLLISION`
environment integrity	build environment and meta-file integrity	3	`HARNESS_INTEGRITY`, `META_INTEGRITY`
architecture · capability	code that breaks the architecture or capability shape declared in the spec	2	`ARCHITECTURE_FROM_SPEC`, `CAPABILITIES_FEATURE_MAPPING`
governance · policy	code that breaks an `ai_hints` policy (e.g. forbidden patterns)	2	`AI_HINTS_FORBIDDEN_PATTERN`, `ABSENCE_OF_GOVERNANCE`
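To make the idea of a detector concrete, here is a minimal sketch in the spirit of `UNTESTED_AC`: it reports every acceptance criterion that no test references. The input shapes, the `AC-1` style IDs, and the finding dictionary are hypothetical; the real detectors are documented in the catalog mentioned above.

```python
from typing import Dict, List, Set


def untested_acs(spec_acs: Dict[str, List[str]], tested: Set[str]) -> List[dict]:
    """Return one finding per acceptance criterion that no test covers.

    spec_acs maps a feature ID to its AC IDs; tested is the set of AC IDs
    referenced by at least one test. Both shapes are assumptions.
    """
    findings: List[dict] = []
    for feature_id, ac_ids in sorted(spec_acs.items()):
        for ac_id in ac_ids:
            if ac_id not in tested:
                findings.append(
                    {"detector": "UNTESTED_AC", "feature": feature_id, "ac": ac_id}
                )
    return findings
```

An empty result means this detector found no drift; any finding would cause stage 1.3 to fail under the gate rules described earlier.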

4. Cycle — one feature's lifecycle

Four steps (Define → Sync → Implement → Verify) wrap Spec → Code → Tests into a single cycle: merge if drift is 0, block otherwise.

One feature's lifecycle — Define → Sync → Implement → Verify, merge if drift=0 / block otherwise
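The lifecycle reads naturally as a tiny state machine with a merge decision at the end. The step names come from the diagram caption; the `Step` enum, `next_step`, and `verdict` are illustrative names only.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    DEFINE = 1
    SYNC = 2
    IMPLEMENT = 3
    VERIFY = 4


def next_step(step: Step) -> Optional[Step]:
    """Return the step that follows `step`, or None once Verify is reached."""
    if step is Step.VERIFY:
        return None
    return Step(step.value + 1)


def verdict(drift_count: int) -> str:
    """Merge only when the Verify step reports zero drift findings."""
    if drift_count < 0:
        raise ValueError("drift_count cannot be negative")
    return "merge" if drift_count == 0 else "block"
```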

Multi-Agent Workflow

cladding is a 5-agent system. Each agent has a clear role under CQS (Command-Query Separation: the agents that act are kept apart from the agents that verify), so no agent can sign off on its own work. This separation of duties is what lets the workflow map cleanly onto compliance regimes (EU AI Act · K-AI Framework · SOX).

5 personas with CQS — orchestrator dispatches, librarian/specialist/reviewer act, observability watches metrics
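The "no agent signs off on its own work" rule can be expressed as a guard on approval. This is a sketch under stated assumptions: the `Change` record, the `approve` function, and the agent names in the test data are invented for illustration and do not describe cladding's internal API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    change_id: str
    author: str  # name of the agent that produced the change


class SelfApprovalError(Exception):
    """Raised when an agent tries to approve its own change."""


def approve(change: Change, approver: str, approver_role: str) -> dict:
    """Record an approval, enforcing the split between acting and verifying."""
    if approver_role != "reviewer":
        raise PermissionError(f"role {approver_role!r} may not approve changes")
    if approver == change.author:
        raise SelfApprovalError(f"{approver} cannot approve its own change")
    return {"change": change.change_id, "approved_by": approver}
```

Checking identity as well as role matters: a reviewer-role agent that also authored the change is still rejected.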

Ecosystem

cladding sits at the intersection of four existing categories.

How cladding differs from the neighbors

Spec Kit · OpenSpec · Tessl · Kiro help you write a good spec. cladding goes further — it verifies on every commit that the code still matches that spec.
BMAD · ChatDev · Claude Code Agent Teams are about splitting work across multiple AI agents. cladding's 5 agents take that further by tying spec, code, and audit log into the same loop.
tdd-guard forces test-first development. That's roughly what the Unit · Coverage stages do inside cladding's 13-stage gate.
OpenHands · Cline · Aider · Goose are runners — they tell the AI to write code. cladding is the governance layer that verifies and controls what those runners produce.

cladding's edge is the combination — it folds the strongest parts of all four categories into one verification loop.

Install

Two steps: install the infrastructure, then create the project spec.

Step 1 — Install the infrastructure

Pick the route that fits how you work — both land in the same place:

(a) npm — for terminal / CI users

npm install -g cladding   # install the cladding CLI
cd <project>                # go to your project
clad setup                  # connect your AI tools (Claude / Codex / Gemini)

(b) Marketplace — for AI-tool plugin users

Open the plugin marketplace inside your AI tool (Claude Code · Codex CLI · Gemini CLI)
Search for cladding and install it
No clad setup needed — the plugin manifest wires everything

Where clad setup connects (5 host channels)

Host (when detected)	Wired location	Auto-activation
Claude Code (`~/.claude/`)	`~/.claude/plugins/cladding`	`claude plugin marketplace add` + `claude plugin install claude-code@cladding`
Codex CLI skills (`~/.agents/`)	`~/.agents/skills/cladding-*`	(auto on Codex restart)
Codex CLI MCP server (`~/.codex/`)	`[mcp_servers.cladding]` in `~/.codex/config.toml`	(TOML entry itself)
Gemini CLI (`~/.gemini/`)	`~/.gemini/extensions/cladding`	`gemini extensions link`
Cursor (`~/.cursor/`)	`mcpServers.cladding` in `~/.cursor/mcp.json`	(JSON entry itself)

clad setup invokes the per-host activation commands automatically when claude / gemini binaries are on PATH. Safe to re-run after a cladding upgrade or after installing another AI tool.

About the MCP server. Every host gets cladding wired as an MCP server; only the wiring location differs. Claude Code and Gemini CLI auto-start it through the mcpServers field of the plugin/extension manifest; Codex through [mcp_servers.cladding] in ~/.codex/config.toml; Cursor through ~/.cursor/mcp.json. You never invoke MCP directly: no /mcp slash command, no manual server-connect step. The AI in each host calls cladding's tools (clad_create_feature, etc.) in response to natural-language requests; you keep typing /cladding:init plus normal chat.

Benchmark. v0.4.0 measurements show ~60% consistency improvement and ~50% LOC reduction vs unguided AI coding on a fixed task, with 100% drift detection across a 5-iteration dev cycle. Full methodology and honest caveats (some of the consistency gain is the "more-specific-prompt" effect, not exclusively cladding) in docs/benchmarks/v0.4.0-consistency-bench.md.

Step 2 — Init (create the project spec)

Inside your project, run init once from your AI tool:

[inside your AI tool] /cladding:init "B2B payment SaaS"

This creates spec.yaml and the 4-tier docs. One-time per project.

Three init scenarios

/cladding:init takes a natural-language intent and picks the right path on its own. Same command, three starting points.

Starting point	Command	What happens
An idea, nothing else	`/cladding:init "I want to build a B2B payment SaaS"`	LLM infers the domain → spec · docs · policies generated, with 2–3 follow-up questions printed
A planning doc	`/cladding:init docs/plan.md`	cladding detects the file path, loads its contents, and uses them as the intent (absolute and relative paths both work)
Adopting into an existing project	`/cladding:init "apply cladding to this project"`	scans the existing code (≥3 source files trigger it) → observed patterns are merged with the intent
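The routing between the three starting points might look roughly like the sketch below. Only the file-path detection and the "≥3 source files" threshold come from the table; the function name, the return labels, and the order in which the checks run are assumptions.

```python
import os
from typing import Callable

SOURCE_FILE_THRESHOLD = 3  # "≥3 source files trigger it"


def pick_init_path(
    argument: str,
    source_file_count: int,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> str:
    """Classify an init argument as 'planning-doc', 'existing-project', or 'idea'.

    Checking for a file path first is an assumption of this sketch.
    """
    if is_file(argument):
        return "planning-doc"
    if source_file_count >= SOURCE_FILE_THRESHOLD:
        return "existing-project"
    return "idea"
```

Passing `is_file` as a parameter keeps the function testable without touching the real filesystem.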

Init once, then carry on

cladding's goal is to be the infrastructure that prevents spec ↔ code drift. After init, you just keep coding: the AI references the spec while it writes, and clad check runs automatically in CI or as a git hook to block anything that drifts. No extra commands to remember.

Status

Metric	Value	Note
version	v0.4.0	2026-05
conformance	top tier	self-declared
tests	973/973	all pass
coverage	93.89%+	enforced
features	136	spec'd

100 test files · installable from the Claude Code · OpenAI Codex · Gemini CLI marketplaces.

Road to Ironclad 1.0 — 1.0 locks when two independent implementations pass the L4 conformance fixtures (GOVERNANCE § 1). cladding is the first one.

Docs

License

MIT. LICENSE · Related: Ironclad (the standard cladding implements) · harness-boot (the seed project).