espalier-engineering
Health Gecti
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 15 GitHub stars
Code Basarisiz
- rm -rf — Recursive force deletion command in eval/ask/run.sh
- rm -rf — Recursive force deletion command in eval/grill/run.sh
- rm -rf — Recursive force deletion command in scripts/bootstrap-espalier.sh
Permissions Gecti
- Permissions — No dangerous permissions requested
Bu listing icin henuz AI raporu yok.
Train your AI coders the way you'd train a vine — discover your codebase's actual patterns, then encode them as Claude Code rules, skills, agents, hooks, and a guided pipeline so generated code lands inside your conventions on the first try, not the fifth.
Espalier Engineering
An espalier trains a fruit tree to grow flat along a wall — pruned, wired, productive, and impossible to mistake for a wild one. Espalier does the same thing for your AI coding agents: it discovers the patterns already in your codebase, then encodes them as constraints so generated code grows along your conventions on the first try, not the fifth.
/plugin marketplace add Junhanliu-dev/espalier-engineering
/plugin install espalier-engineering@espalier-engineering
/espalier-init
v0.8.0 — requirements approval gate. Both pipelines now STOP after the requirement is written and reviewed, and wait for your explicit sign-off before any code is written (Approve / Edit / Abort). Previously Stage 1 → Stage 2 → Stage 3 chained automatically, so coding started the moment the requirement doc existed. Interactive-only — a no-TTY run auto-approves so unattended pipelines never hang. No new skill, no new stage; a refresh of the existing pipeline templates.
Existing users: run
/espalier-migrate. It auto-detects your install version and applies the needed migration chain (… v0.6→v0.7→v0.8) in order. Seedocs/migrating-v0.7-to-v0.8.md.
The Problem
AI coders write plausible-looking code that doesn't fit your codebase. They invent helpers when you already have one. They split files the team would keep together. They pick a logging library the project doesn't use. They handle errors with throw when your repo standardised on Result<T> three years ago.
It's not a model intelligence problem. It's an unwritten rules problem — patterns every experienced developer on the team knows but nobody documented. The model can't read your team's Slack history.
The Solution
Espalier reads your code, extracts those rules, and writes them down as machine-enforceable constraints. The next time the AI codes, it loads your project's rules automatically and follows them. Reviewer agent (different agent, different tool set) checks against the same rules before any commit lands.
Result: rework cycles drop from 3-5 rounds to typically 1.
What It Generates
After running /espalier-init on any repo, you get a per-project espalier/ directory wired into Claude Code:
espalier/
├── rules/ # always-loaded: engineering structure, coding standards, dev process
├── skills/ # phase-loaded: coding, review, testing, requirements, grill, /espalier, /espalier-fix, /espalier-prune, /espalier-doctor, /espalier-ask
├── agents/ # delegated: harness-coder, harness-reviewer (separate tool sets)
├── wiki/ # on-demand: architecture, data models, critical paths, external services
├── hooks/ # programmatic gates: layer boundary check, pre-push gate, post-merge drift detect + backlink
├── pipeline.md # 10-stage workflow with explicit gates and rollback rules
└── changes/ # typed audit trail: feat/, fix/, refactor/, docs/ — one dir per requirement
Plus symlinks into .claude/ so Claude Code discovers everything, plus a .claude/settings.json hook entry for the quality gates.
The Two Pipelines
After init, your repo exposes two orchestrators:
/espalier <requirement> — full 10-stage pipeline
For features, refactors, and large fixes. Drives requirement → reqs review → coding (sub-agent) → code review (different sub-agent) → tests → test review → push → CI verify → deploy verify → user confirmation. Every stage has a programmatic gate; failed gates roll back; rollback counters trigger human escalation.
Stage 1 grilling (v0.6.0): before any code is written, Stage 1 interrogates the requirement — counting concrete ambiguity signals (undefined terms, unstated actors, missing failure behaviour, unscoped edge cases) and asking only as many questions as the input's vagueness warrants. The resolved answers land in requirements.md, so later stages execute a spec they can't misread. On by default; skip an invocation with --no-grill, and auto-skipped on a non-interactive (no-TTY) run.
Requirements approval gate (v0.8.0): after the requirement is written (Stage 1) and reviewed (Stage 2), the pipeline STOPS and waits for your explicit sign-off — Approve / Edit / Abort — before Stage 3 (coding) starts. A Stage 2 PASS no longer authorizes coding on its own. Interactive-only: a no-TTY run auto-approves so unattended pipelines never hang.
Requirement-prefix routing (the <slug> is date-prefixed YYYY-MM-DD-<name> so change dirs sort chronologically):
| Prefix | Type | Output dir |
|---|---|---|
feat: (or no prefix) |
feat |
espalier/changes/feat/<slug>/ |
refactor: |
refactor |
espalier/changes/refactor/<slug>/ |
docs: |
docs |
espalier/changes/docs/<slug>/ |
fix: |
use /espalier-fix instead |
— |
/espalier-fix <bug> — 5-stage bug-fix lane
For real bug fixes. Slimmer than full pipeline but adds Stage 0 auto-link discovery — the bug-fix gets linked back to the feature change that introduced it (via git blame + reverse-lookup cache + squash-merge mapping). The causing change's pipeline-state.md gains a ## Follow-up Fixes row. Six months later, when someone wonders "why does this feature have 4 fixes against it", the audit trail is right there.
Includes scope-creep escalation gates (predictive + reactive + test-scope + reviewer-flagged) — fix-lane work that exceeds its scope gets cleanly migrated to the full pipeline mid-flight without losing the causal link. Stage 1 grilling (v0.6.0) runs here too, in diagnosis mode: it pressure-tests the bug's root cause and reproduction before any fix is coded, so the lane isn't patching a symptom on an unconfirmed theory. The v0.8.0 requirements approval gate applies here as well — after the bug requirements + diagnosis are written, the lane stops for your sign-off before coding (it fires after the Stage 1 escalation gate, so only an in-lane fix reaches it).
/espalier-ask <question> — read-only Q&A (v0.7.0)
For "how does X work", "where is Y handled", "why is Z built this way", "what changed recently in X". Instead of exploring the codebase from scratch, it reads your espalier/ docs first — the wiki for how/where, the change history (requirements.md, review-record.md) for why — then verifies every doc claim against the actual code before answering, and falls back to a codebase search when the docs come up short. Every claim is sourced (doc path and/or file:line).
It is strictly read-only and not a pipeline lane — no stages, no gates, no changes/ folder. Its only writes are two notify-only sidecars: if a doc it read contradicts the code, it flags that doc stale (the same signal /espalier-doctor produces) and points you at /espalier-prune; if the docs couldn't answer at all, it logs the question to espalier/.ask-gaps.tsv (git-tracked) as evidence of what the wiki should cover next.
Keeping the guardrails in sync
The artifacts /espalier-init generates describe your codebase on init day. As the code evolves, Espalier keeps them honest (since v0.5.0): a post-merge hook flags drifted docs into a gitignored sidecar, the reviewer reports convention shifts a file diff cannot see, and the pipeline surfaces all of it in one Stage 0 pre-flight. /espalier-ask (v0.7.0) chips in opportunistically — a stale doc it trips over while answering a question gets flagged the same way, and a question the docs can't answer is logged to espalier/.ask-gaps.tsv as a wiki-gap backlog. To refresh a flagged artifact, run /espalier-prune <path>; to scan for drift no diff caught, run /espalier-doctor. Nothing is ever auto-overwritten — every refresh is gated.
Install
Marketplace (recommended)
/plugin marketplace add Junhanliu-dev/espalier-engineering
/plugin install espalier-engineering@espalier-engineering
Then in any project:
/espalier-init
Updates later:
/plugin update espalier-engineering
Manual git clone + symlink
For users not on the plugin path, or while iterating:
git clone https://github.com/Junhanliu-dev/espalier-engineering ~/repos/espalier-engineering
ln -sfn ~/repos/espalier-engineering/skills/espalier-init ~/.claude/skills/espalier-init
Restart Claude Code. The skill is now discoverable.
Update with cd ~/repos/espalier-engineering && git pull.
Project-scoped install
Drop the skill inside a single project rather than installing globally:
mkdir -p .claude/skills
ln -sfn /path/to/espalier-engineering/skills/espalier-init .claude/skills/espalier-init
How /espalier-init Works
On a fresh repo (~150 source files, medium size), expect 10-15 minutes. It's a one-time tax — every future /espalier and /espalier-fix reuses the generated structure. You earn the time back inside the first few requirements via dropped rework rounds.
- Phase 0 (front-loaded prompts) — one
AskUserQuestioncaptures squash-merge strategy, sub-agent tool scope, and doctor-scan cadence. - Phase 1 (parallel discovery) — single message fires ~10 concurrent tool calls: bash batch (tldr / manifests / git log), 6 scouts (architecture, coding patterns, testing, CI, unwritten rules, layer specs), 1 oracle (ctx7 + WebSearch in parallel), 3 wiki scouts (data models, critical paths, external services).
- Phase 2 (parallel writes) — one Write batch produces ~14 substitution files from the in-context DISCOVERY blob.
- Phase 3 (bootstrap) —
scripts/bootstrap-espalier.shruns as one bash invocation: mkdir + copies + chmod + safe symlinks + atomic.claude/settings.jsonmerge (preserves user hooks) + squash-merge decision + post-merge dispatcher install + .gitignore + 29 validation checks.
Total: ~5-7 batched turns, ~25-35 raw tool calls.
The skill never modifies its own source — it only writes content into the project it's invoked from.
Re-running on existing installs
- Fully bootstrapped repo (
espalier/.merge-hook-decisionpresent) → bootstrap auto-runs validation only (Stage 11), exit 0. Idempotent health check. - v0.3.x repo (
harness/.merge-hook-decisionpresent, noespalier/dir) → bootstrap exits 1, suggests/espalier-migrate. - v0.1.x repo (wired symlinks, no decision file) → bootstrap exits 1, suggests
/espalier-migrate. - Partial install / corrupted state →
--forceredoes all stages.
Debug flags (not used by normal flow)
bash scripts/bootstrap-espalier.sh --copy-only # Stages 1-4 only (dirs + cp templates + hooks)
bash scripts/bootstrap-espalier.sh --wire-only # Stages 5-11 only (symlinks + wiring + validation)
bash scripts/bootstrap-espalier.sh --validate-only # Stage 11 only (29 checks, no changes)
bash scripts/bootstrap-espalier.sh --dry-run # Print actions without executing
Usage Cost
/espalier-init is a heavy one-time tax. Every subsequent /espalier <req> or /espalier-fix <bug> reuses what's generated.
Why the one-time tax pays off
The cost compares once-per-repo against every-request-forever:
| Without Espalier | With Espalier |
|---|---|
| Every request: agent re-discovers conventions from scratch by reading source files into context — burns tokens on the same exploration each time | Conventions loaded once from espalier/rules/* (~3K tokens, always cached) |
| Every request: 3-5 review rounds (agent writes code that doesn't fit, reviewer flags it, agent retries) | Typically 1 review round — code lands inside conventions on first attempt |
| Every request: human catches "this doesn't match our patterns" and explains them again | Patterns enforced by harness-reviewer agent automatically; human reviews business logic only |
| Implicit/forgotten unwritten rules cause silent drift across the codebase | Rules encoded in coding-standards.md, enforced by reviewer + pre-push hook |
Math: A typical feature request without Espalier consumes 2-4× the tokens of a single /espalier invocation (because the agent re-loads the codebase into context per round, and there are 3-5 rounds). Across ~5-10 feature requests, you earn back the /espalier-init cost. Across a quarter of work, the savings compound substantially.
Side benefits beyond cost:
- Convention consistency — codebase doesn't drift toward whatever-the-model-felt-like-today
- Audit trail — every change has
pipeline-state.mdwith stages, gates, sub-agent outputs, commit SHAs - Causal links —
/espalier-fixauto-links bugs to the feature that introduced them (catches "this feature has 4 fixes against it" patterns) - Hand-off ready — a new dev (or new AI session) reads
espalier/rules/+espalier/wiki/and is productive without re-onboarding - Programmatic gates — pre-push hook blocks pushes at wrong pipeline stage; reviewer can't approve P0 violations
When NOT to run /espalier-init:
- Throwaway / prototype project that won't see 5+ feature requests
- Solo project where you don't care about consistency (just write the code yourself)
- Project too small to have meaningful conventions (single-file script)
For everything else — anything you'll iterate on for weeks+ — the tax pays back fast.
Token volume per full /espalier-init run
| Phase | Main agent | Sub-agents (10 scouts + oracle) |
|---|---|---|
| 0 (prompts) | ~0.5K in / 0.2K out | — |
| 1 (discovery, parallel) | ~10K in / 15K out | ~300-400K in / ~15K out |
| 2 (substitution writes, parallel) | ~25K in / 25K out | ~60-100K in / ~5K out |
| 3 (bootstrap bash) | ~3K in / 5K out | — |
| Total | ~40K in / 45K out | ~400-500K in / ~20K out |
Combined wire volume: ~500K-1M tokens for a medium repo (~150 source files).
Cost (USD)
| Setup | Per /espalier-init run |
|---|---|
| Opus everywhere (no cache) | $10-20 |
| Opus main + Sonnet scouts (no cache) | $4-8 |
| Opus main + Sonnet scouts + cache hits (typical) | $2-5 |
| Sonnet everywhere + cache | $1-3 |
Prompt cache (5-min TTL) saves ~50-70% on repeated context loads — kicks in across the 5-7 batched turns.
Claude Max plan budget
| Plan | Estimated full /espalier-init runs per 5-hour window |
|---|---|
| Pro ($20/mo) | <1 (may not complete one full run) |
| Max 5x ($100/mo) | 1-3 |
| Max 20x ($200/mo) | 5-15 |
Plan limits are message-window based + opaque budget for sub-agent fan-out. Verify against current Anthropic dashboard before committing.
Per-pipeline cost (after init)
| Command | Cost class |
|---|---|
/espalier <req> (full 10-stage pipeline) |
MEDIUM — pipeline stages + coding/review sub-agents |
/espalier-fix <bug> (5-stage lane) |
LIGHT-MEDIUM |
/espalier-prune <path> (refresh a stale artifact) |
LIGHT — one scout per file + a gated diff |
/espalier-doctor (periodic drift scan) |
LIGHT-MEDIUM — re-scouts a handful of artifacts |
/espalier-ask <question> (read-only Q&A) |
LIGHT — reads a few docs + verifies against code; no sub-agents |
/espalier-migrate |
LIGHT — script-driven, low LLM usage |
Re-run /espalier-init (already bootstrapped) |
TRIVIAL — validate-only, ~few K tokens |
Scaling factors
- Small repo (~50 files): -30 to -50% cost
- Large repo (500+ files): +50 to +200% cost (scouts read more)
- Oracle WebSearch-heavy stack: +20-40%
- Per-layer specs many (5+ layers detected): +20-30%
Check actuals
After a run: /cost in Claude Code, or the caveman-stats skill (reads session log directly — no AI estimation).
Philosophy
When an agent makes an error, engineer its elimination — not with prompt tweaks, but with files, rules, automated checks, and system structure.
Five guiding principles:
- Discover, don't prescribe — read the actual code, extract patterns; never impose templates from other projects.
- Quality gates must be programmatic —
ci_status == 'success' AND tests_passed == total_tests, not "check if CI passes". - Separate execution from judgment — coder agent and reviewer agent are always different invocations with different tool sets (coder has Write/Edit; reviewer has only Read/Grep/Glob/Bash).
- Context layering — rules always loaded; skills loaded per phase; agents see only their scope; wiki on demand.
- Every rule has a reason — either reflects an observed pattern, or prevents a known failure mode.
License
MIT — see LICENSE.
Status
v0.8.0 — requirements approval gate. Both pipelines now stop after the requirement is written and reviewed and wait for explicit user sign-off (Approve / Edit / Abort) before any code is written — closing the old behaviour where Stage 1 → Stage 2 → Stage 3 chained automatically and coding began the moment the requirement doc existed. The full pipeline gates between Stage 2 (review) and Stage 3 (coding); the fix lane gates after Stage 1 (after its escalation gate). Interactive-only — a no-TTY run auto-approves so unattended pipelines never hang. No new skill or stage — a refresh of the three pipeline templates. New migrate-v0.7-to-v0.8.sh.
v0.7.0 — read-only ask lane. A new espalier-ask skill answers questions about the codebase from the espalier/ wiki, rules, and change history first — classifying the question (where/how/why/what-changed), verifying every doc claim against the actual code, and falling back to a codebase search when the docs are silent. Read-only: it never edits a doc. Two notify-only byproducts — it flags a stale wiki it caught mid-answer (the /espalier-doctor signal) and logs an unanswerable question to a git-tracked espalier/.ask-gaps.tsv wiki-gap backlog. Purely additive — no pipeline change. New migrate-v0.6-to-v0.7.sh and an eval/ask/ harness.
v0.6.0 — Stage 1 grilling. A new espalier-grill skill interrogates the Stage 1 input before any code is written — adaptive sequential questioning that scores ambiguity signals and resolves them into requirements.md, in spec mode for /espalier and diagnosis mode for /espalier-fix. On by default, --no-grill to skip, auto-skipped without a TTY. Change folders are now date-prefixed (YYYY-MM-DD-<slug>) so they sort chronologically. Additive and interactive-only — non-breaking. New migrate-v0.5-to-v0.6.sh and an eval/grill/ harness.
v0.5.0 — doc-drift detection. Generated rules/wiki/specs/hooks are kept in sync with the codebase as it evolves: a post-merge drift detector writing a gitignored sidecar, reviewer convention-drift capture, a cross-PR convention index, the gated /espalier-prune refresh skill, a periodic /espalier-doctor re-scout, a consolidated Stage 0 pre-flight, a notify-only Stage 8.5, and validation checks 25–28. Non-breaking — drift detection is additive. New migrate-v0.4-to-v0.5.sh and a three-stage /espalier-migrate chain.
v0.4.0 — the rebrand. Plugin renamed harness-engineering → espalier-engineering. Target-project directory harness/ → espalier/. Slash commands collapsed: full pipeline is now /espalier (was /harness-run), bug-fix lane is /espalier-fix (was /harness-fix), init is /espalier-init (was /harness-engineering), migration is /espalier-migrate. Sub-agent identifiers harness-coder / harness-reviewer kept for in-flight stability. New /espalier-migrate shim detects v0.1.x or v0.3.x installs and dispatches to the right migration script. GitHub repo renamed harness-engineering → espalier-engineering (GitHub maintains redirects). README now documents per-run token cost + Claude Max plan budget.
v0.3.0 — init speedup. /harness-engineering first run on a fresh repo ~30-50% faster (20+ min → 10-15 min on a medium codebase). Parallel discovery scouts, parallel substitution writes, bundled bootstrap script collapsing Phase 8 (Hooks) + Phase 10 (Wiring) + Phase 11 (Validation) into one idempotent invocation. Plus fixes for several latent v0.2.x bugs.
v0.2.0 — bug-fix lane (/harness-fix), typed harness/changes/{type}/{slug}/ layout, causal back-links between fixes and the changes that introduced them, escalation paths (including late-stage), squash-merge resilience with optional post-merge hook, and self-healing reverse-lookup cache.
See CHANGELOG.md for full release notes, docs/migrating-v0.7-to-v0.8.md for the v0.8 requirements approval gate upgrade, docs/migrating-v0.6-to-v0.7.md for the v0.7 read-only ask lane, docs/migrating-v0.5-to-v0.6.md for the v0.6 Stage 1 grill upgrade, docs/migrating-v0.4-to-v0.5.md for the v0.5 doc-drift upgrade, docs/migrating-v0.3-to-v0.4.md for the v0.4 rebrand migration, and docs/migrating-v0.1-to-v0.2.md for the older typed-changes migration.
Schema and templates may still change. Please file issues with the Espalier output you'd expect to see for your stack.
Optional dependency
The /espalier-fix slug derivation transliterates Unicode bug descriptions via the Python unidecode library. If unavailable, falls back to ASCII-strip (loses CJK/RTL meaning but never crashes). Install with pip install unidecode or uv pip install unidecode.
Yorumlar (0)
Yorum birakmak icin giris yap.
Yorum birakSonuc bulunamadi