development-skills
Makes agents produce staff-engineer-quality code
A Claude Code plugin that turns your AI agent into a disciplined software engineer.
Installation · How It Works · 19 Skills · Philosophy · Blog Post
Installation
In Claude Code, register the marketplace first:
/plugin marketplace add reidemeister94/development-skills
Then install the plugin:
/plugin install development-skills@development-skills
The plugin activates automatically on any coding task. No configuration needed.
Verify Installation
Start a new session and give Claude a development task. The plugin should activate automatically — you'll see it follow a structured workflow (research, plan, implement, verify, review) instead of jumping straight to code.
You can also invoke skills directly:
/brainstorming — Evaluate approaches before committing to one
/debugging — Systematic root-cause analysis
/create-test — Design tests that find bugs, not just exist
/distill — Compress verbose text while preserving facts
/commit — Conventional commit from staged changes
Updating
/plugin update development-skills@development-skills
Optional: Regression Testing
The skill-creator plugin is required for running regression tests (/eval-regression):
/plugin install skill-creator@claude-plugins-official
Why This Exists
AI agents are fast but undisciplined. 67% of developers report spending more time debugging AI-generated code, which contains 1.7x more major issues and 2.74x more security vulnerabilities. Agents delete failing unit tests just to make the suite pass. They generate code at 140-200 lines/min while humans comprehend at roughly 20-40 — creating cognitive debt.
development-skills makes the good-day workflow the only workflow — research before planning, plan before coding, test before shipping, review before merging. Every time, without reminding.
As Anthropic's 2026 Agentic Coding Trends Report concluded: success comes from treating agentic development as a "workflow design problem, not a tool adoption problem."
How It Works
When you give Claude Code a development task with this plugin installed, it doesn't just start writing code. Instead, it follows a mandatory gated workflow:
Each phase is a gate — the agent cannot proceed until the gate conditions are met. No skipping. No combining. No "this is trivial, I'll just code it."
The 7 Phases
| # | Phase | What Happens | Gate |
|---|---|---|---|
| 1 | Research | Explore the codebase and gather context | "RESEARCH COMPLETE" |
| 2 | Plan | Write a plan to disk, enter plan mode | User approves the plan |
| 3 | Chronicle | Document the WHY — business context, requirements, decisions | "CHRONICLE INITIATED" |
| 4 | Implement | TDD cycles with dedicated implementer subagent | "SOLUTION COMPLETE" |
| 5 | Verify | Dedicated test-verifier runs the full test suite | Evidence of passing |
| 6 | Staff Review | Two-stage code review: spec compliance, then quality | "APPROVED" |
| 7 | Finalize | Update docs, chronicle, integration options | "WORKFLOW COMPLETE" |
Small tasks get a fast track. If a change touches 3 files or fewer with a single obvious approach, the plugin collapses into lightweight mode — same quality checks, no ceremony.
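The fast-track decision can be pictured as a tiny heuristic. This is an illustrative sketch only — the function name and parameters are assumptions, not the plugin's actual internals:

```python
def use_fast_track(touched_files: list[str], approach_count: int) -> bool:
    """Collapse to lightweight mode only when the change is small
    (3 files or fewer) AND there is a single obvious approach."""
    return len(touched_files) <= 3 and approach_count == 1

# A two-file change with one clear approach qualifies for the fast track...
print(use_fast_track(["api.py", "test_api.py"], approach_count=1))  # True
# ...but any ambiguity in approach forces the full gated workflow.
print(use_fast_track(["api.py"], approach_count=2))                 # False
```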
Key Features
Brainstorming Guard — Before coding, evaluates scope, reversibility, and approach clarity. If anything is ambiguous, spawns an isolated analysis agent. The default is to analyze; burden of proof is on skipping. Anti-rationalization tables counter the model's tendency to justify shortcuts. Without this guard, the agent skips analysis ~40% of the time on tasks that need it.
Subagent Architecture — Three specialized agents: Staff Reviewer (Opus, two-stage code review), Implementer (Sonnet, TDD execution), Test Verifier (Sonnet, structured pass/fail). Mirrors Anthropic's effective sub-agent patterns. Giving agents a way to verify their own work improves quality 2-3x.
Observation Masking — Verbose tool output (80%+ of context tokens) stays on disk. Implementation logs, test output, and review criteria live in files — your main conversation stays clean for decision-making.
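The masking idea itself is simple to sketch: persist the full output, hand the conversation only a condensed view. A minimal illustration (file names and the summarization rule are assumptions, not the plugin's implementation):

```python
import tempfile
from pathlib import Path

def mask_observation(tool_output: str, log_dir: Path, max_chars: int = 200) -> str:
    """Write the full tool output to disk; return only a short summary
    plus a pointer to the file for the conversation context."""
    log_file = log_dir / "tool_output.log"
    log_file.write_text(tool_output)
    if len(tool_output) <= max_chars:
        return tool_output
    head = tool_output[:max_chars]
    return f"{head}\n... [{len(tool_output)} chars total, full output in {log_file}]"

with tempfile.TemporaryDirectory() as d:
    summary = mask_observation("PASS " * 5000, Path(d))
    # The conversation sees ~270 chars instead of 25,000.
    print(len(summary) < 300)  # True
```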
Filesystem Persistence — Plans, chronicles, and workflow state survive context compaction. The agent resumes from any phase, even after a full context clear. Projects with persistent memory show 40% fewer errors and 55% faster completion.
Smart Parallel Implementation — For 4+ independent tasks, analyzes file-touch maps and spawns parallel agents in git worktrees — but only when proven safe via dependency analysis. Naive parallelization produced 100% unusable code; single-agent is the safe default.
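The safety check reduces to a disjointness test over file-touch maps — two tasks may run in parallel only if they touch no file in common. A minimal sketch (names and data shapes are assumptions, not the plugin's actual analysis):

```python
from itertools import combinations

def safe_to_parallelize(file_touch_maps: dict[str, set[str]]) -> bool:
    """Parallel agents are safe only when every pair of tasks touches
    disjoint file sets; any overlap forces single-agent execution."""
    for (_, files_a), (_, files_b) in combinations(file_touch_maps.items(), 2):
        if files_a & files_b:
            return False
    return True

tasks = {
    "add-auth":    {"auth.py", "test_auth.py"},
    "fix-logging": {"log.py"},
    "bump-deps":   {"requirements.in"},
}
print(safe_to_parallelize(tasks))   # True: fully disjoint
tasks["fix-logging"].add("auth.py")  # now overlaps with add-auth
print(safe_to_parallelize(tasks))   # False: fall back to single agent
```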
Chronicles — The missing documentation layer. Code says WHAT, plans say HOW, chronicles capture WHY. Business context, decisions, and failed approaches — timestamped and browseable.
19 Skills, 5 Languages
Development Skills
| Skill | Trigger | What It Does |
|---|---|---|
| core-dev | Auto (any coding task) | Workflow router — detects language, enforces brainstorming guard, dispatches |
| brainstorming | /brainstorming | Critical evaluation with isolated analysis agent. Two modes: full analysis, focused evaluation |
| python-dev | /python-dev | Python patterns — Pydantic, FastAPI, asyncpg, pytest |
| java-dev | /java-dev | Java patterns — Records, Streams, Spring Boot, JPA |
| typescript-dev | /typescript-dev | TypeScript patterns — Zod, Express, Fastify, vitest (backend/CLI only) |
| frontend-dev | /frontend-dev | Auto-detects React, Next.js, Raycast, Vite. Loads framework-specific patterns |
| swift-dev | /swift-dev | Swift patterns — SwiftUI, UIKit, Vapor, SPM |
| debugging | /debugging | Systematic root-cause debugging: investigate, analyze, hypothesize, fix |
Specialized Skills
| Skill | Trigger | What It Does |
|---|---|---|
| create-test | /create-test | Risk-scored test design. Explorer mode audits your codebase for dangerous untested code; targeted mode generates boundary, property-based, and invariant tests with strong assertions |
| distill | /distill | Hybrid semantic text compression: deterministic regex pre-processing + LLM compression + deterministic post-verification. Multilingual noise removal (EN/IT/FR/ES/DE). Measures entropy via gzip |
| commit | /commit | Conventional commits from staged changes |
| chronicles | Auto | Project snapshots capturing the WHY behind changes |
| align-docs | /align-docs | Align documentation with current project state |
| eval-regression | /eval-regression | Pre-commit regression testing — compares current version against last committed version |
| update-precommit | /update-precommit | Update .pre-commit-config.yaml hooks to latest versions |
| update-reqs | /update-reqs | Update requirements.in with latest PyPI versions |
| update-reqs-dev | /update-reqs-dev | Update requirements-dev.in with latest PyPI versions |
| resolve-merge | /resolve-merge | Systematic merge conflict resolution with numbered docs renumbering support |
| best-practices | /best-practices &lt;topic&gt; | Deep web research from authoritative sources (engineering blogs, official docs, books, GitHub projects >5k stars). Produces structured state-of-the-art report with trade-offs, decision frameworks, anti-patterns, and cited sources |
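The gzip entropy signal that /distill uses can be approximated in a few lines of standard-library Python. This is an illustrative sketch of the idea, not the skill's actual implementation:

```python
import gzip
import random

def compression_ratio(text: str) -> float:
    """Ratio of gzip-compressed size to raw size: repetitive, low-entropy
    text compresses far better than dense, information-rich text."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)

random.seed(0)
boilerplate = "Please note that, as mentioned above, " * 100  # highly redundant
dense = "".join(random.choice("abcdefghijklmnopqrstuvwxyz0123456789 ")
                for _ in range(3800))

# Redundant filler compresses much better than dense text — one signal for
# how much compression headroom exists before facts start getting lost.
print(compression_ratio(boilerplate) < compression_ratio(dense))  # True
```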
Auto-Format on Save
A PostToolUse hook automatically formats files when Claude edits them:
| Language | Formatter | Fallback |
|---|---|---|
| Python | ruff (30x faster than Black) | — |
| JS/TS/CSS/JSON | biome (Rust-based, 7-100x faster) | prettier |
| Java | google-java-format | — |
| Kotlin | ktfmt | ktlint |
| Swift | swift-format | swiftformat |
| HTML/YAML | prettier | — |
Design Philosophy
Iron Rules — enforced at every phase, not suggested:
- No positive claims without fresh verification evidence
- Red/Green TDD — every implementation starts with a failing test (Kent Beck agrees)
- Comment the WHY, not the WHAT
- No commits without explicit user request
- Every gate must be explicitly passed
Model Behavior — maximum honesty (zero accommodation), always-on critical thinking, calibrated criticism (concrete and evidence-based), planning as 90% of the work, data-validated decisions, and persistent knowledge on disk.
Architecture
skills/ 19 skills (core-dev, 5 languages, brainstorming, debugging, testing, utilities)
agents/ 3 subagents (implementer, staff-reviewer, test-verifier)
hooks/ Auto-format on Edit/Write (multi-language) + session context
shared/ Workflow engine with just-in-time phase loading
commands/ Feedback production/ingestion
Context is loaded progressively following Anthropic's just-in-time pattern: workflow.md always loaded (~120 lines), phase instructions loaded per-phase (~300 words each), language patterns loaded on-demand.
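The just-in-time pattern can be sketched as a loader that keeps the router resident and pulls phase instructions from disk only when a phase starts. File names here are assumptions for illustration:

```python
import tempfile
from pathlib import Path

def load_phase_context(phase: str, shared_dir: Path) -> str:
    """Always include the small workflow router; read per-phase
    instructions from disk only when that phase actually begins."""
    workflow = (shared_dir / "workflow.md").read_text()   # always in context
    phase_file = shared_dir / "phases" / f"{phase}.md"    # loaded just-in-time
    return workflow + "\n\n" + phase_file.read_text()

with tempfile.TemporaryDirectory() as d:
    shared = Path(d)
    (shared / "phases").mkdir()
    (shared / "workflow.md").write_text("# Workflow router")
    (shared / "phases" / "research.md").write_text("# Phase 1: Research")
    ctx = load_phase_context("research", shared)
    print("Phase 1: Research" in ctx)  # True
```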
Context Engineering
Implements patterns from Anthropic's Context Engineering guide and validated by Manus across millions of production users:
- Progressive disclosure — phase instructions loaded just-in-time, not all at once
- Observation masking — verbose output on disk, condensed summaries in conversation
- Filesystem as extended context — plans, chronicles, workflow state, implementation logs
- Clean subagent windows — each agent gets only the context it needs
- Anti-rationalization tables — keep the model honest under pressure
Built from 60,000+ lines of production Python — FastAPI backends, legacy databases, shared environments. Every feature exists because its absence caused a real problem.
Further Reading
- How I Taught Agents to Follow a Process, Not Just Write Code — the full story behind this plugin
- Effective Context Engineering for AI Agents — Anthropic's guide to the patterns we implement
- Building Claude Code with Boris Cherny — how the creator thinks about agent workflows
- TDD, AI Agents and Coding with Kent Beck — why testing matters more with AI
- Agentic Engineering — Addy Osmani on structured workflows
- Context Engineering: Lessons from Manus — production-validated patterns
Regression Testing
30 evals, 98 assertions across 11 behavioral dimensions — a test suite for agent behavior. Powered by Anthropic's skill-creator plugin.
/eval-regression
Covers: brainstorming guard (7), smart isolation (6), anti-rationalization (4), performance review (3), workflow phases (3), implementer discipline (2), language detection, chronicle quality, turn boundaries, project directives, and AskUserQuestion avoidance. Each eval snapshots the committed version as baseline, runs the modified version, and produces a clear verdict: SAFE TO COMMIT or REGRESSIONS FOUND.
Contributing
Contributions welcome — especially new language skills (Rust, Go, Kotlin, Ruby, C#). See CONTRIBUTING.md.
Golden rule: no PR without a passing /eval-regression benchmark. Zero regressions = merge. Open an issue first to discuss.
License
MIT
If this plugin makes your AI agent more disciplined, consider giving it a star.
It helps others discover the project and motivates continued development.