council-review

agent
Security Audit
Warning
Health Warning
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 6 GitHub stars
Code Passed
  • Code scan — Scanned 1 file during light audit, no dangerous patterns found
Permissions Passed
  • Permissions — No dangerous permissions requested


SUMMARY

Claude Code skill: Run decisions, code, and plans through a council of 5 AI advisors with anonymous peer review. Based on Karpathy's LLM Council.

README.md

Council Review

[!IMPORTANT]
This skill now lives in ngmeyer/skills — a curated library of eight composable Claude Code skills. This standalone repo is archived and frozen at council-review V2.1; the maintained version ships in the monorepo. Install everything with npx skills@latest add ngmeyer/skills.

A Claude Code skill that runs decisions, code, plans, and PRs through a Diverse Multi-Agent Debate (DMAD) council of 5 advisors with distinct reasoning methods. Advisors collaborate, peer-review each other anonymously, and a chairman synthesizes a verdict.

This is the collaborative-DMAD pattern — empirically validated to outperform adversarial multi-agent debate (M3MADBench 2026; DMAD ICLR 2025). For single-critic adversarial stress-testing of a known artifact, use /adversarial-review instead.

Based on Andrej Karpathy's LLM Council and Ole Lehmann's Claude Code adaptation.

How It Works

You -> Question/Code/Plan
         |
   5 Advisors (parallel, independent, distinct reasoning methods)
         |
   5 Peer Reviews (anonymous, cross-review)
         |
   Chairman Synthesis (verdict + dissent preservation + verification)
         |
   Report + Recommendation + "What You Lose"

The 5 Advisors:

| Advisor | Angle | Reasoning Method |
| --- | --- | --- |
| The Contrarian | What will fail? | Inversion -- assume it failed, trace backward |
| First Principles | What are we actually solving? | Decomposition -- break into atomic claims, challenge each |
| The Expansionist | What upside are we missing? | Analogy -- what adjacent domain solved this differently? |
| The Outsider | Zero context, fresh eyes | Naive questioning -- flag anything requiring insider knowledge |
| The Executor | What do you do Monday morning? | Dependency graphing -- what blocks what? |

Each advisor uses a different reasoning method, not just a different angle. This is the key insight from DMAD research: same-model councils need method diversity to avoid converging on the same reasoning patterns.
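
The three stages in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: `run_council`, the `ask` callable (a stand-in for whatever sends a prompt to a model and returns its reply), the `seed` parameter, and the prompt wording are all hypothetical.

```python
import random
from typing import Callable, Dict, Optional

# Advisor names and methods are taken from the table above.
ADVISORS: Dict[str, str] = {
    "The Contrarian": "Inversion: assume it failed, trace backward",
    "First Principles": "Decomposition: break into atomic claims, challenge each",
    "The Expansionist": "Analogy: what adjacent domain solved this differently?",
    "The Outsider": "Naive questioning: flag anything requiring insider knowledge",
    "The Executor": "Dependency graphing: what blocks what?",
}


def run_council(question: str, ask: Callable[[str], str],
                seed: Optional[int] = None) -> dict:
    # Stage 1: every advisor answers independently, each with its own method.
    answers = {
        name: ask(f"You are {name}. Method: {method}\nQuestion: {question}")
        for name, method in ADVISORS.items()
    }

    # Stage 2: shuffle and relabel as A-E so reviewers cannot defer to roles.
    names = list(answers)
    random.Random(seed).shuffle(names)
    labeled = {chr(ord("A") + i): answers[n] for i, n in enumerate(names)}
    packet = "\n\n".join(f"Response {k}:\n{v}" for k, v in labeled.items())
    reviews = [
        ask("Review these anonymous responses. What did ALL five miss?\n\n" + packet)
        for _ in ADVISORS
    ]

    # Stage 3: a single chairman call sees the responses and the reviews.
    verdict = ask("As chairman, synthesize a verdict and preserve dissent.\n\n"
                  + packet + "\n\nPeer reviews:\n" + "\n".join(reviews))
    return {"answers": answers, "labeled": labeled,
            "reviews": reviews, "verdict": verdict}
```

Passing a stub such as `lambda prompt: "stub reply"` as `ask` exercises the flow without a model. The sketch makes 5 + 5 + 1 = 11 calls, which matches the call count listed for Full mode.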

Install

This skill ships in the ngmeyer/skills library. Install it (and the rest) with the skills.sh installer — it auto-detects your agents and copies the skills into place:

npx skills@latest add ngmeyer/skills

Then use it in Claude Code:

/council-review Should we rewrite our auth layer in Rust?
/council-review docs/plans/v2-migration.md
/council-review https://github.com/org/repo/pull/42
/council-review --quick Is this naming convention worth changing?
/council-review --jury Should we adopt microservices?

Modes

| Mode | Flag | Calls | Best For |
| --- | --- | --- | --- |
| Full (default) | none | 11 | High-stakes decisions |
| Quick | --quick | 4 | Routine decisions, gut-checks |
| Adaptive | --adaptive | 6–26 | Multi-round debate with KS-statistic early stopping (up to 94.5% cost cut on convergent questions) |
| Confidence | --confidence | 11 | Confidence-weighted synthesis. Each advisor self-rates and rates peers; chairman weights by calibrated confidence rather than majority vote |
| Measure diversity | --measure-diversity | 11 | Score reasoning-footprint overlap; flag theatrical consensus when advisors converge despite different methods |
| Jury (V2) | --jury | 13+ | Replace the single chairman with a 3-judge jury, ideally across model families. For close calls and high-stakes verdicts where single-judge reliability isn't enough (jury-of-judges / PoLL) |

Quick mode runs 3 advisors (Contrarian, Executor, Outsider) + chairman. No peer review. Fast.
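
The call counts listed for each mode are consistent with a simple tally, assuming one call per advisor response, one per peer review, and one per synthesizer. The Adaptive breakdown below (five advisor calls per round for one to five rounds, plus one chairman call) is a guess that happens to reproduce the 6–26 range; the README does not state it. `call_count` is a hypothetical helper.

```python
def call_count(advisors: int = 5, peer_reviews: int = 5,
               synthesizers: int = 1, rounds: int = 1) -> int:
    """Tally model calls: advisor calls repeat every round, the rest run once."""
    return advisors * rounds + peer_reviews + synthesizers


full = call_count()                              # 5 + 5 + 1 = 11
quick = call_count(advisors=3, peer_reviews=0)   # 3 + 0 + 1 = 4
jury = call_count(synthesizers=3)                # 5 + 5 + 3 = 13

# Assumed breakdown for Adaptive: no peer review, 1 to 5 rounds.
adaptive_min = call_count(peer_reviews=0, rounds=1)   # 6
adaptive_max = call_count(peer_reviews=0, rounds=5)   # 26
```

Jury mode is listed as 13+, presumably because the reconciliation step can add calls beyond the three judges.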

Adaptive mode measures response distribution convergence between rounds via Kolmogorov-Smirnov statistic; halts when shift drops below epsilon for two consecutive rounds. Best for open questions where the council may converge quickly.
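
A rough sketch of that stopping rule, assuming each round is summarized as a list of numeric scores (for example, each advisor's 1-10 support for the proposal). That representation, the `epsilon` default, and the function names are assumptions for illustration; the README does not say how responses are turned into a distribution.

```python
from bisect import bisect_right
from typing import List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs."""
    a, b = sorted(a), sorted(b)
    # The gap between two step functions can only peak at an observed value.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def halting_round(rounds: List[Sequence[float]],
                  epsilon: float = 0.2, patience: int = 2) -> int:
    """Return the 1-based round after which deliberation stops."""
    calm = 0  # consecutive round-to-round shifts below epsilon
    for i in range(1, len(rounds)):
        shift = ks_statistic(rounds[i - 1], rounds[i])
        calm = calm + 1 if shift < epsilon else 0
        if calm >= patience:
            return i + 1
    return len(rounds)  # never converged: use every round available
```

With `patience=2`, the earliest possible stop is after round 3, since two consecutive small shifts require three rounds of data.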

Confidence mode has each advisor end with CONFIDENCE: 1-10 and a one-line rationale; peers also rate confidence; the chairman synthesis is weighted by calibrated certainty. Surfaces low-confidence consensus as a yellow flag.
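
One way to picture confidence weighting, as a sketch only: parse the trailing `CONFIDENCE:` line, average each advisor's self-rating with the peer ratings, and sum those weights per position. The averaging rule, the `low_threshold` value, and the function names are assumptions; the README does not specify how calibration is computed.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import List, Optional, Tuple

CONFIDENCE_RE = re.compile(r"^CONFIDENCE:\s*(\d+)\b", re.MULTILINE)


def parse_confidence(response: str) -> Optional[int]:
    """Extract the 1-10 rating from a 'CONFIDENCE: n' line, if present and valid."""
    match = CONFIDENCE_RE.search(response)
    if match is None:
        return None
    value = int(match.group(1))
    return value if 1 <= value <= 10 else None


def synthesize(votes: List[Tuple[str, int, List[int]]],
               low_threshold: float = 5.0) -> Tuple[str, bool]:
    """votes: (position, self_confidence, peer_confidences) for each advisor."""
    totals = defaultdict(float)
    weights = []
    for position, self_conf, peer_confs in votes:
        weight = mean([self_conf, *peer_confs])  # assumed blending rule
        totals[position] += weight
        weights.append(weight)
    winner = max(totals, key=totals.get)
    # Yellow flag: everyone agrees, but nobody is very sure.
    yellow_flag = len(totals) == 1 and mean(weights) < low_threshold
    return winner, yellow_flag
```

Under this rule a single high-confidence advisor can outweigh two low-confidence ones, which is the difference from a plain majority vote.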

Measure-diversity mode scores reasoning-footprint overlap across responses. High overlap (>60%) is flagged as theatrical consensus — the chairman treats it as a single advisor's opinion.
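
The README does not define "reasoning footprint", so the sketch below substitutes a simple proxy: the mean pairwise Jaccard overlap of each response's word set, compared against the 60% threshold. The proxy and all function names are assumptions, not the skill's actual scoring method.

```python
import re
from itertools import combinations
from typing import List, Set


def footprint(text: str) -> Set[str]:
    """Crude stand-in for a reasoning footprint: the set of lowercase words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_overlap(responses: List[str]) -> float:
    if len(responses) < 2:
        raise ValueError("need at least two responses to compare")
    prints = [footprint(r) for r in responses]
    pairs = list(combinations(prints, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def is_theatrical(responses: List[str], threshold: float = 0.6) -> bool:
    """Flag consensus whose overlap exceeds the threshold (60% in the README)."""
    return mean_overlap(responses) > threshold
```

Word overlap will miss paraphrased but identical reasoning, so a real implementation would likely compare extracted claims or argument steps instead.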

Jury mode replaces the single chairman with three independent judges (ideally across model families); each synthesizes a verdict and a brief reconciliation step merges them. Use it on close calls where one judge's reliability isn't enough.
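
The reconciliation step is not described in detail. A plausible minimal version, assumed here purely for illustration, is a strict-majority vote over the judges' verdicts that keeps the minority verdicts as recorded dissent. `reconcile` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional, Tuple


def reconcile(judge_verdicts: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (majority verdict or None, dissenting verdicts to preserve)."""
    counts = Counter(judge_verdicts)
    verdict, votes = counts.most_common(1)[0]
    if votes * 2 > len(judge_verdicts):  # strict majority
        return verdict, sorted(v for v in counts if v != verdict)
    # No majority: surface every verdict rather than picking one arbitrarily.
    return None, sorted(counts)
```

With three judges, a 2-1 split yields a verdict plus one dissent, while a three-way split returns `None` and leaves the call to the user.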

V2 note: the maintained version adds a mandatory Devil's-Advocate pass against the emerging consensus (the one configuration shown to reliably induce genuine disagreement), a sycophancy guardrail in the advisor and peer prompts, a Mediating-Assessments step in the chairman synthesis (score independent attributes before the holistic call), and an outside-view/base-rate check. Independent blind A/B: V1 3.8 → V2 4.8. See the changelog in SKILL.md.

Flags compose: /council-review --adaptive --confidence "Should we adopt GraphQL?" runs convergence-stopped, confidence-weighted deliberation.

What It Reviews

| Input | What Happens |
| --- | --- |
| A question or decision | 5 advisors independently analyze tradeoffs, peer-review each other, synthesize a recommendation |
| An implementation plan | Advisors stress-test feasibility, scope, risks, missing pieces, and execution order |
| A PR or code change | Advisors review from security, architecture, performance, usability, and pragmatism angles |
| A file path | Reads the file and runs its contents through the council |

Why This Works

  1. Method diversity -- Each advisor uses a distinct reasoning method (inversion, decomposition, analogy, naive questioning, dependency graphing). Same-model councils converge without this.
  2. Parallel independence -- Advisors don't see each other's responses. No groupthink.
  3. Anonymous peer review -- Responses are shuffled as A-E before review. No deference to roles.
  4. Forced tension -- The Contrarian must find flaws. The Expansionist must find upside. Coverage is structural.
  5. Disagreement classification -- Chairman distinguishes "value tensions" (both sides valid) from "error catches" (one advisor found a real flaw).
  6. Dissent preservation -- "What You Lose" section explicitly names the cost of ignoring the minority view.
  7. Chairman override -- The synthesizer can side with a minority if their reasoning is strongest.
  8. "What did ALL five miss?" -- The most valuable question in the peer review.

Output Format

## Council Verdict: [Topic]

### Where the Council Agrees
[High-confidence signals -- multiple advisors converged independently]

### Where the Council Clashes
[Value Tension] Both sides valid, depends on priorities...
[Error Catch] One advisor found a real flaw others missed...

### Blind Spots Revealed
[Things only the peer review caught]

### Recommendation
[Clear, actionable. Not "it depends." A real answer.]

### What You Lose
[Cost of following this recommendation. The strongest dissent, preserved.]

### Do This First
[One concrete next step.]
How to verify: [2-3 checks to confirm the recommendation was right]

Auto-Context

The skill automatically reads project files before framing the question:

  • README.md, CLAUDE.md / AGENTS.md
  • Recent git log
  • PR diff (if reviewing a PR)
  • Referenced files

No manual context-pasting needed. Advisors see your project, not a blank slate.
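
A sketch of what gathering that context could look like for the first two items in the list (project docs and recent git log). The function name, the `max_chars` and `log_entries` limits, and the returned dictionary shape are hypothetical; PR diffs and referenced files are left out.

```python
import subprocess
from pathlib import Path
from typing import Dict

CONTEXT_FILES = ("README.md", "CLAUDE.md", "AGENTS.md")


def gather_context(repo: Path, max_chars: int = 4000,
                   log_entries: int = 10) -> Dict[str, str]:
    """Collect project docs and recent commits, skipping anything missing."""
    context: Dict[str, str] = {}
    for name in CONTEXT_FILES:
        path = repo / name
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            context[name] = text[:max_chars]  # cap size to protect the prompt budget

    try:
        result = subprocess.run(
            ["git", "-C", str(repo), "log", "--oneline", f"-n{log_entries}"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # git is not installed
        return context
    if result.returncode == 0 and result.stdout.strip():
        context["git log"] = result.stdout.strip()
    return context
```

Calling `gather_context(Path("."))` from a project root returns only the pieces that exist, so a directory without git history or a CLAUDE.md still produces usable context.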

Credits

License

MIT
