council-review
Health Warning
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 6 GitHub stars
Code Passed
- Code scan — Scanned 1 files during light audit, no dangerous patterns found
Permissions Passed
- Permissions — No dangerous permissions requested
Claude Code skill: Run decisions, code, and plans through a council of 5 AI advisors with anonymous peer review. Based on Karpathy's LLM Council.
Council Review
[!IMPORTANT]
This skill now lives in ngmeyer/skills, a curated library of eight composable Claude Code skills. This standalone repo is archived and frozen at council-review V2.1; the maintained version ships in the monorepo. Install everything with npx skills@latest add ngmeyer/skills.
A Claude Code skill that runs decisions, code, plans, and PRs through a Diverse Multi-Agent Debate (DMAD) council of 5 advisors with distinct reasoning methods. Advisors collaborate, peer-review each other anonymously, and a chairman synthesizes a verdict.
This is the collaborative-DMAD pattern — empirically validated to outperform adversarial multi-agent debate (M3MADBench 2026; DMAD ICLR 2025). For single-critic adversarial stress-testing of a known artifact, use /adversarial-review instead.
Based on Andrej Karpathy's LLM Council and Ole Lehmann's Claude Code adaptation.
How It Works
You -> Question/Code/Plan
|
5 Advisors (parallel, independent, distinct reasoning methods)
|
5 Peer Reviews (anonymous, cross-review)
|
Chairman Synthesis (verdict + dissent preservation + verification)
|
Report + Recommendation + "What You Lose"
The 5 Advisors:
| Advisor | Angle | Reasoning Method |
|---|---|---|
| The Contrarian | What will fail? | Inversion -- assume it failed, trace backward |
| First Principles | What are we actually solving? | Decomposition -- break into atomic claims, challenge each |
| The Expansionist | What upside are we missing? | Analogy -- what adjacent domain solved this differently? |
| The Outsider | Zero context, fresh eyes | Naive questioning -- flag anything requiring insider knowledge |
| The Executor | What do you do Monday morning? | Dependency graphing -- what blocks what? |
Each advisor uses a different reasoning method, not just a different angle. This is the key insight from DMAD research: same-model councils need method diversity to avoid converging on the same reasoning patterns.
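As a rough illustration of the roster above, the five advisors can be written down as plain data, with each prompt carrying both the angle and the reasoning method. The `Advisor` class and the `build_prompt` template are hypothetical names used only for this sketch; they are not taken from the skill's own prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisor:
    name: str
    angle: str
    method: str


# Roster copied from the table above.
ADVISORS = [
    Advisor("The Contrarian", "What will fail?",
            "Inversion: assume it failed, trace backward"),
    Advisor("First Principles", "What are we actually solving?",
            "Decomposition: break into atomic claims, challenge each"),
    Advisor("The Expansionist", "What upside are we missing?",
            "Analogy: what adjacent domain solved this differently?"),
    Advisor("The Outsider", "Zero context, fresh eyes",
            "Naive questioning: flag anything requiring insider knowledge"),
    Advisor("The Executor", "What do you do Monday morning?",
            "Dependency graphing: what blocks what?"),
]


def build_prompt(advisor: Advisor, question: str) -> str:
    """Hypothetical template: states the angle AND the method, so two
    advisors never share the same reasoning instructions."""
    return (
        f"You are {advisor.name}. Angle: {advisor.angle}\n"
        f"Reasoning method: {advisor.method}\n"
        f"Question: {question}"
    )


prompts = [build_prompt(a, "Should we rewrite our auth layer in Rust?")
           for a in ADVISORS]
```

Keeping the method as its own field makes it easy to check that no two advisors share one.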
Install
This skill ships in the ngmeyer/skills library. Install it (and the rest) with the skills.sh installer — it auto-detects your agents and copies the skills into place:
npx skills@latest add ngmeyer/skills
Then use it in Claude Code:
/council-review Should we rewrite our auth layer in Rust?
/council-review docs/plans/v2-migration.md
/council-review https://github.com/org/repo/pull/42
/council-review --quick Is this naming convention worth changing?
/council-review --jury Should we adopt microservices?
Modes
| Mode | Flag | Calls | Best For |
|---|---|---|---|
| Full (default) | none | 11 | High-stakes decisions |
| Quick | --quick | 4 | Routine decisions, gut-checks |
| Adaptive | --adaptive | 6–26 | Multi-round debate with KS-statistic early stopping (up to 94.5% cost cut on convergent questions) |
| Confidence | --confidence | 11 | Confidence-weighted synthesis. Each advisor self-rates and rates peers; chairman weights by calibrated confidence rather than majority vote |
| Measure diversity | --measure-diversity | 11 | Score reasoning-footprint overlap; flag theatrical consensus when advisors converge despite different methods |
| Jury (V2) | --jury | 13+ | Replace the single chairman with a 3-judge jury, ideally across model families. For close calls and high-stakes verdicts where single-judge reliability isn't enough (jury-of-judges / PoLL) |
Quick mode runs 3 advisors (Contrarian, Executor, Outsider) + chairman. No peer review. Fast.
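The figures in the Calls column follow from simple arithmetic, sketched below. The `call_count` helper is hypothetical. It assumes one peer-review call per advisor and one synthesis call per chairman or judge. Reading the "+" in Jury mode's "13+" as the extra reconciliation work is also an assumption, so only the floor of 13 is computed here.

```python
def call_count(advisors: int, peer_review: bool, judges: int = 1) -> int:
    """Advisor calls + one peer review per advisor (if enabled) + synthesis calls."""
    reviews = advisors if peer_review else 0
    return advisors + reviews + judges


full = call_count(advisors=5, peer_review=True)                  # 5 + 5 + 1
quick = call_count(advisors=3, peer_review=False)                # 3 + 0 + 1
jury_floor = call_count(advisors=5, peer_review=True, judges=3)  # 5 + 5 + 3

print(full, quick, jury_floor)  # prints: 11 4 13
```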
Adaptive mode measures response distribution convergence between rounds via Kolmogorov-Smirnov statistic; halts when shift drops below epsilon for two consecutive rounds. Best for open questions where the council may converge quickly.
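A minimal sketch of that stopping rule, under stated assumptions: each round is reduced to a list of numeric scores (one per advisor), and the epsilon of 0.2 is an illustrative value, not the skill's setting. The two-sample KS statistic is computed by hand as the largest gap between the two empirical CDFs.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def rounds_until_converged(rounds, epsilon=0.2, patience=2):
    """Number of rounds actually run: stop once the round-to-round shift
    stays below epsilon for `patience` consecutive comparisons."""
    calm = 0
    for i in range(1, len(rounds)):
        shift = ks_statistic(rounds[i - 1], rounds[i])
        calm = calm + 1 if shift < epsilon else 0
        if calm >= patience:
            return i + 1
    return len(rounds)


# Hypothetical per-advisor scores for up to five rounds of debate.
rounds = [
    [2, 5, 8, 3, 9],
    [4, 5, 7, 4, 8],
    [4, 5, 7, 4, 8],
    [4, 5, 7, 4, 8],
    [1, 1, 1, 1, 1],  # never reached: the council has already converged
]
stopped_after = rounds_until_converged(rounds)
```

With this sample data the first comparison shifts by 0.4 and the next two by 0, so the loop stops after four rounds and the fifth is never run.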
Confidence mode has each advisor end with CONFIDENCE: 1-10 and a one-line rationale; peers also rate confidence; the chairman synthesis is weighted by calibrated certainty. Surfaces low-confidence consensus as a yellow flag.
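One way to picture the weighting, as a sketch only: parse the trailing rating from each response, then tally positions by confidence instead of by head count. The function names and the `(position, confidence)` pairs are assumptions for illustration. The peer-rating calibration step is not modelled, and raw self-ratings stand in for calibrated ones.

```python
import re
from collections import defaultdict

CONFIDENCE_RE = re.compile(r"CONFIDENCE:\s*(\d{1,2})")


def parse_confidence(response: str) -> int:
    """Extract the last 'CONFIDENCE: N' rating, clamped to the 1-10 range."""
    matches = CONFIDENCE_RE.findall(response)
    if not matches:
        raise ValueError("response has no CONFIDENCE line")
    return max(1, min(10, int(matches[-1])))


def weighted_tally(votes):
    """votes: iterable of (position, confidence).
    Returns position -> share of the total confidence weight."""
    totals = defaultdict(float)
    for position, confidence in votes:
        totals[position] += confidence
    weight = sum(totals.values())
    return {position: score / weight for position, score in totals.items()}


# Three hesitant "adopt" votes against two confident "reject" votes.
votes = [("adopt", 4), ("adopt", 3), ("adopt", 3), ("reject", 9), ("reject", 8)]
shares = weighted_tally(votes)
leader = max(shares, key=shares.get)
```

Here "adopt" wins a plain majority vote (3 of 5) but holds only 10 of the 27 confidence points, so the weighted tally favours "reject". This is the kind of low-confidence consensus the mode is meant to surface.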
Measure-diversity mode scores reasoning-footprint overlap across responses. High overlap (>60%) is flagged as theatrical consensus — the chairman treats it as a single advisor's opinion.
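The source does not define how a "reasoning footprint" is computed. The sketch below substitutes a much cruder stand-in, mean pairwise Jaccard overlap of word sets, purely to show how the 60% threshold would be applied. All names here are hypothetical.

```python
from itertools import combinations

THRESHOLD = 0.60  # overlap above this is treated as theatrical consensus


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two responses have in common."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def mean_overlap(responses):
    """Average Jaccard overlap across every pair of responses."""
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


echoing = [
    "rewrite is risky and slow",
    "rewrite is risky and costly",
    "rewrite is risky and slow",
]
varied = [
    "rewrite is risky",
    "hire two rust engineers first",
    "users never see the language",
]

echoing_flagged = mean_overlap(echoing) > THRESHOLD
varied_flagged = mean_overlap(varied) > THRESHOLD
```

The echoing set is flagged and the varied set is not. A real footprint measure would compare reasoning steps rather than surface vocabulary.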
Jury mode replaces the single chairman with three independent judges (ideally across model families); each synthesizes a verdict and a brief reconciliation step merges them. Use it on close calls where one judge's reliability isn't enough.
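The source does not specify how the reconciliation step merges the three verdicts. As one plausible reading, the sketch below uses a strict majority rule and keeps any minority verdict as dissent. The `reconcile` function and its return shape are assumptions.

```python
from collections import Counter


def reconcile(verdicts):
    """Merge judges' verdicts by strict majority; keep the minority as dissent."""
    winner, count = Counter(verdicts).most_common(1)[0]
    if count * 2 <= len(verdicts):
        # No strict majority: leave the call open and surface every view.
        return {"verdict": None, "unanimous": False, "dissent": list(verdicts)}
    dissent = [v for v in verdicts if v != winner]
    return {"verdict": winner, "unanimous": not dissent, "dissent": dissent}


split = reconcile(["adopt", "adopt", "defer"])
deadlock = reconcile(["adopt", "reject", "defer"])
```

A 2-1 split still yields a verdict but records the dissenting judge. A three-way disagreement returns no verdict.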
V2 note: the maintained version adds a mandatory Devil's-Advocate pass against the emerging consensus (the one configuration shown to reliably induce genuine disagreement), a sycophancy guardrail in the advisor and peer prompts, a Mediating-Assessments step in the chairman synthesis (score independent attributes before the holistic call), and an outside-view/base-rate check. Independent blind A/B: V1 3.8 → V2 4.8. See the changelog in SKILL.md.
Flags compose: /council-review --adaptive --confidence "Should we adopt GraphQL?" runs convergence-stopped, confidence-weighted deliberation.
What It Reviews
| Input | What Happens |
|---|---|
| A question or decision | 5 advisors independently analyze tradeoffs, peer-review each other, synthesize a recommendation |
| An implementation plan | Advisors stress-test feasibility, scope, risks, missing pieces, and execution order |
| A PR or code change | Advisors review from security, architecture, performance, usability, and pragmatism angles |
| A file path | Reads the file and runs the council on its contents |
Why This Works
- Method diversity -- Each advisor uses a distinct reasoning method (inversion, decomposition, analogy, naive questioning, dependency graphing). Same-model councils converge without this.
- Parallel independence -- Advisors don't see each other's responses. No groupthink.
- Anonymous peer review -- Responses are shuffled as A-E before review. No deference to roles.
- Forced tension -- The Contrarian must find flaws. The Expansionist must find upside. Coverage is structural.
- Disagreement classification -- Chairman distinguishes "value tensions" (both sides valid) from "error catches" (one advisor found a real flaw).
- Dissent preservation -- "What You Lose" section explicitly names the cost of ignoring the minority view.
- Chairman override -- The synthesizer can side with a minority if their reasoning is strongest.
- "What did ALL five miss?" -- The most valuable question in the peer review.
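The anonymization step in the list above (responses shuffled and relabelled A-E before review) can be sketched as follows. The `anonymize` helper and the `seed` parameter are hypothetical. The label-to-advisor key is kept separately so that reviewers only ever see the blinded mapping.

```python
import random
import string


def anonymize(responses, seed=None):
    """responses: advisor name -> text.
    Returns (label -> text, label -> advisor name) with labels A, B, C, ..."""
    rng = random.Random(seed)
    names = list(responses)
    rng.shuffle(names)  # break any link between label order and role order
    labels = string.ascii_uppercase[: len(names)]
    blinded = {label: responses[name] for label, name in zip(labels, names)}
    key = dict(zip(labels, names))
    return blinded, key


responses = {
    "The Contrarian": "It fails at the migration step.",
    "First Principles": "The real problem is session handling.",
    "The Expansionist": "This could unlock a plugin ecosystem.",
    "The Outsider": "What does 'auth layer' mean here?",
    "The Executor": "Start with the token service.",
}
blinded, key = anonymize(responses, seed=7)
```

Only `blinded` would go to the peer reviewers. `key` lets the chairman map labels back to advisors afterwards.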
Output Format
## Council Verdict: [Topic]
### Where the Council Agrees
[High-confidence signals -- multiple advisors converged independently]
### Where the Council Clashes
[Value Tension] Both sides valid, depends on priorities...
[Error Catch] One advisor found a real flaw others missed...
### Blind Spots Revealed
[Things only the peer review caught]
### Recommendation
[Clear, actionable. Not "it depends." A real answer.]
### What You Lose
[Cost of following this recommendation. The strongest dissent, preserved.]
### Do This First
[One concrete next step.]
How to verify: [2-3 checks to confirm the recommendation was right]
Auto-Context
The skill automatically reads project files before framing the question:
- README.md, CLAUDE.md / AGENTS.md
- Recent git log
- PR diff (if reviewing a PR)
- Referenced files
No manual context-pasting needed. Advisors see your project, not a blank slate.
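A rough sketch of what such context gathering could look like, covering only the project files and the recent git log. The `gather_context` function, the ten-entry log limit, and the 4000-character cap are assumptions for illustration, not the skill's actual behaviour.

```python
import subprocess
from pathlib import Path

CONTEXT_FILES = ("README.md", "CLAUDE.md", "AGENTS.md")


def gather_context(root=".", log_entries=10, max_chars=4000):
    """Collect whichever context files exist, plus a short git log if available."""
    root = Path(root)
    sections = {}
    for name in CONTEXT_FILES:
        path = root / name
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            sections[name] = text[:max_chars]  # cap size to protect the prompt budget
    try:
        log = subprocess.run(
            ["git", "log", "--oneline", "-n", str(log_entries)],
            cwd=root, capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        log = ""  # git missing, or root is not a repository
    if log:
        sections["git log"] = log
    return sections
```

Missing files and non-repository directories are skipped silently, so the same call works on any project layout.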
Credits
- Original concept: Andrej Karpathy (LLM Council)
- Claude Code adaptation: Ole Lehmann (@itsolelehmann)
- Reasoning method diversity: DMAD (ICLR 2025)
- Collaborative > adversarial empirical evidence: M3MADBench (arXiv 2601.02854, 2026)
- Confidence-modulated debate protocol: Demystifying MAD (arXiv 2601.19921, 2026)
- KS-statistic adaptive stopping: rachittshah/llmcouncil (S2 MAD)
- Diversity-footprint verification: Counsel (Same model, same blind spots)
- Skill by: Neal Meyer
License
MIT