Agent Brain

Persistent decision memory for AI code agent teams. Agents learn from mistakes, coordinate across sessions, and stop repeating the same error. They also read your codebase at ~20% of the usual token cost (tokenizer-measured: 81% saved).

Works with any MCP (Model Context Protocol)-compatible agent: Claude Code, Cursor, Windsurf, Cline, Continue, etc. Agent templates (.md files) are Claude Code specific — the MCP server itself is universal.

What This Does · Features
Quick Start — install in 2 minutes
How To Use It — the agent loop, a worked example, what you get
Architecture — what lives where, performance & internals
MCP Tools (17) — the agent-facing API
Agent Team — bundled role templates
Model Routing — right model per phase, two-strikes escalation, plan handoff
Brain Protocol — the enforced decision loop
SAN Protocol — code compression: is it worth it, measuring savings (token_savings)
SAN Setup — turning SAN on, model choice, other platforms
Adaptive Warnings · Office Dashboard — live pixel-art team view
Verification · Requirements · Configuration · Customization

What This Does

AI coding agents start fresh every session: no memory of past decisions, no learning from rejections, no cross-agent knowledge sharing — and they burn tokens re-reading the same source files task after task. Agent Brain fixes both:

Memory — decisions, outcomes, and review feedback persist across sessions and agents:

Agent    → pre_check()    → "WARNING: similar approach was rejected last week"
Agent    → log_decision() → records what you decided and why
Agent    → does work      → PR created
Reviewer → log_outcome()  → "rejected: violates DIP (dependency inversion)"
Next time, any agent → pre_check() → sees that rejection → avoids the mistake

Cheap code reading — the optional SAN protocol compresses source files to ~17-27% of their original tokens (81% saved, tokenizer-measured), and token_savings shows you exactly how much it saved, per session, in numbers and %.

Features

Feature	What it does
Decision Memory	Log decisions, outcomes, feedback. Persists across sessions.
Pre-Check Warnings	Before starting work, see past failures in the same area.
Fuzzy Matching	"Rate limiting on signup" finds "rate limiting on login" rejection.
Code Bridge	Link decisions to code symbols: "Show me all decisions that touched AuthService." (Richer with the optional code-review-graph MCP server; works standalone too.)
Agent Scorecards	Acceptance rate, trends, top rejection categories per agent.
Adaptive Warnings	Agents with high rejection rates get stricter pre-check warnings.
Team Dashboard	All agents at a glance — for project managers.
SAN Protocol	Compress code to ~20% of original tokens (81% saved, measured). Full codebase fits in context.
Token Savings Tracker	`token_savings` reports tokens saved this session / today / all-time, with %.
Enforcement Hook	Code edits are blocked until the agent logs a decision — memory actually gets populated.

Quick Start

git clone https://github.com/sandeep84397/agent-brain.git
cd agent-brain
chmod +x setup.sh
./setup.sh

The setup wizard will:

Create a Python venv and install dependencies
Prompt for your repo paths (or use the template config)
Register the MCP server globally with Claude Code
Offer to customize agent names interactively
Run verification checks

No setup.sh? The server works standalone. Just pip install mcp networkx and register manually:
claude mcp add --transport stdio --scope user agent-brain -- python3 /path/to/server.py
The server gracefully handles a missing config.json — it starts with an empty brain.

Where things land: setup.sh installs a copy of the server to ~/.agent-brain/ with its own venv — that copy is what Claude Code runs. The repo checkout keeps the source. CLI examples in this README use python3 brain/server.py <cmd> from the repo root; against the installed copy, the equivalent is ~/.agent-brain/.venv/bin/python ~/.agent-brain/server.py <cmd>. If you edit the repo copy, re-copy it to ~/.agent-brain/server.py (or re-run setup.sh) and restart Claude Code.

Linking a project (so subagents can use brain)

./setup.sh registers brain at the user level. That's enough for the main Claude Code session, but subagents spawned inside a project read MCP config from project-scoped files. Run:

./setup.sh --link-project=/absolute/path/to/your/project

This is idempotent and writes/merges:

<project>/.mcp.json — adds the agent-brain server entry alongside any existing entries
<project>/.claude/settings.local.json — sets enableAllProjectMcpServers: true and adds agent-brain to enabledMcpjsonServers
<project>/.gitignore — appends .mcp.json, .san/.san_hashes.json, .san/_cache/

After running it, restart Claude Code in the project (/exit then claude), then verify:

~/.agent-brain/.venv/bin/python ~/.agent-brain/server.py diagnose --project=/absolute/path/to/your/project

Subagents not seeing brain tools? See the 4-layer model under Verification.

How To Use It

Once set up, you don't call brain tools yourself — your agents do, automatically, as part of their normal work. Your job is just to give agents tasks and (optionally) review the memory that builds up.

The loop every agent runs

For any non-trivial task, an agent follows this cycle (enforced by the hook — see Enforcement Hook):

1. pre_check(agent, area,             ← "has anyone tried this before? did it fail?"
             action_description)
2. log_decision(agent, repo, area,    ← records the plan; unlocks code edits
                action, reasoning)
3. … writes the code …
4. log_outcome(decision_id, outcome,  ← records accepted / rejected / failed + why
               outcome_by, reason)

You just say "add rate limiting to the signup endpoint". The agent does the rest.
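
For intuition, the cycle above can be modelled with a toy in-memory store. This is only a sketch: `ToyBrain`, the `Decision` fields, and the ID scheme are hypothetical, and the real server exposes these operations as MCP tools rather than Python methods.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Decision:
    decision_id: str
    agent: str
    repo: str
    area: str
    action: str
    reasoning: str
    outcome: str = "pending"           # pending / accepted / rejected / failed
    outcome_by: Optional[str] = None
    reason: Optional[str] = None


@dataclass
class ToyBrain:
    """Hypothetical in-memory stand-in for the decision memory."""
    decisions: List[Decision] = field(default_factory=list)

    def pre_check(self, agent: str, area: str, action_description: str) -> List[Decision]:
        # Surface past rejected/failed decisions in the same area, from any agent.
        return [d for d in self.decisions
                if d.area == area and d.outcome in ("rejected", "failed")]

    def log_decision(self, agent: str, repo: str, area: str,
                     action: str, reasoning: str) -> str:
        decision_id = f"dec_{len(self.decisions) + 1:04d}"   # made-up ID scheme
        self.decisions.append(
            Decision(decision_id, agent, repo, area, action, reasoning))
        return decision_id

    def log_outcome(self, decision_id: str, outcome: str,
                    outcome_by: str, reason: str) -> None:
        for d in self.decisions:
            if d.decision_id == decision_id:
                d.outcome, d.outcome_by, d.reason = outcome, outcome_by, reason
                return
        raise KeyError(decision_id)
```

A rejection logged by one agent then shows up in the next agent's `pre_check` for the same area, which is the behaviour the worked example below walks through.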

Worked example — across two sessions

Monday — a decision gets rejected:

You:   "Add rate limiting to /login"
Agent: pre_check(agent="karan", area="auth", action_description="rate limit login")
       → "No past failures in 'auth'. Proceed."
Agent: log_decision(... action="in-memory counter per IP", reasoning="simplest")
       → dec_20260609_..._a1b2c3
Agent: …writes code, opens PR…
PE:    log_outcome(dec_..._a1b2c3, outcome="rejected", outcome_by="marcus",
                   reason="in-memory won't survive multi-instance deploy; use Redis")

Friday — a different agent, a related task, a different machine/session:

You:   "Add rate limiting to the signup endpoint"
Agent: pre_check(agent="dev", area="auth", action_description="rate limit signup")
       → "SIMILAR REJECTIONS (1, 78% match):
          [2026-06-09] karan tried: in-memory counter per IP
          REJECTED by marcus: in-memory won't survive multi-instance deploy; use Redis"
Agent: …goes straight to a Redis-backed limiter, skips the mistake…

No human re-explained the Redis constraint. The brain carried it forward.
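
How a "similar rejection" could be scored: this README does not say which similarity measure the server uses, so the sketch below is an assumption. It uses `difflib.SequenceMatcher` from the Python standard library, and the `0.6` threshold is arbitrary.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (1.0 means identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_rejections(action: str, past_actions: List[str],
                       threshold: float = 0.6) -> List[Tuple[float, str]]:
    """Return (score, past_action) pairs at or above the threshold, best first."""
    scored = [(similarity(action, past), past) for past in past_actions]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


hits = similar_rejections("rate limit signup", ["rate limit login", "xyz"])
for score, past in hits:
    print(f"{score:.0%} match: {past}")
```

Character-level matching like this is cheap and needs no model call, at the cost of missing paraphrases that share few characters.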

How it behaves

When an agent calls…	What happens	What you see
`pre_check`	Searches past decisions in the same area + fuzzy-matches similar actions across all areas	Agent mentions relevant past rejections before coding
`log_decision`	Appends the decision to the journal + drops a marker file	Code edits are now unblocked for ~30 min
Edit/Write without a recent `log_decision`	PreToolUse hook blocks the edit (exit 2)	Agent is forced to log a decision first, then retries
`log_outcome` (rejected)	Records the rejection; raises that agent's rejection rate	Future `pre_check`s surface it; repeat offenders get stricter warnings
Any tool with a `repo`/status	Updates the live office dashboard	Agent appears working/reviewing/blocked at `localhost:3333`

What you get out of it (outcomes)

Outcome	How it helps
Mistakes aren't repeated	A rejection logged once warns every agent, every future session — even on a different machine.
No re-explaining context	Constraints ("use Redis", "don't bypass the auth middleware") live in the brain, not in your head.
Cross-agent learning	What backend-engineer learns, frontend-engineer and QA see. Knowledge is team-wide, not per-agent.
Accountability & trends	Scorecards show each agent's acceptance rate and recurring failure patterns — `agent_scorecard("karan", detail=True)`.
Auditable history	"Why did we build it this way?" → `decisions_for("AuthService.login")` returns every decision that touched it, with reasoning and outcome.
Enforced discipline	The hook means the memory actually gets populated — agents can't silently skip logging and edit code anyway.

Inspecting the memory yourself

You rarely need to, but from the repo root (where you cloned agent-brain):

python3 brain/server.py stats        # overall health: how many decisions/agents/repos
python3 brain/server.py office       # who's working on what right now
python3 brain/server.py savings      # tokens SAN saved (last session / today / all-time)

From any agent/MCP client you can also ask in plain language — "show me the team dashboard", "what decisions touched the payment service?", "what's karan's scorecard?", "how many tokens did SAN save this session?" — and the agent picks the right tool (team_dashboard, decisions_for, agent_scorecard, token_savings).

Architecture

┌─────────────────────────────────────────────────┐
│  Your Machine (global)                          │
│                                                 │
│  ~/.agent-brain/                                │
│  ├── server.py        ← MCP server (17 tools)  │
│  ├── config.json      ← your repos + team      │
│  ├── decisions.json   ← memory snapshot         │
│  └── decisions.journal← append-only deltas      │
│                                                 │
│  ~/.claude/agents/                              │
│  ├── project-manager.md                         │
│  ├── product-owner.md                           │
│  ├── principal-engineer.md                      │
│  ├── backend-engineer.md                        │
│  ├── frontend-engineer.md                       │
│  └── qa-engineer.md                             │
│                                                 │
│  project-repo/.san/   ← SAN-compressed code     │
│  ├── _index.json                                │
│  └── src/**/*.san                               │
│                                                 │
│  dashboard/           ← pixel art office UI     │
│  ├── server.py        (python, zero deps)       │
│  └── static/          (HTML5 Canvas + SSE)      │
└─────────────────────────────────────────────────┘

Performance & internals

The brain is built to stay fast as the decision history grows into thousands of entries:

Concern	How it's handled
Reading the graph	`decisions.json` is parsed once and held in an in-memory cache keyed on file mtime+size — repeat tool calls in a session reuse it (~0.03ms vs ~140ms re-parse). The cache self-invalidates if another session writes the file.
Writing a decision	Writes are O(delta), not O(graph). `decisions.json` is a periodic full snapshot; `decisions.journal` is an append-only log of mutations. Logging an outcome on a ~4MB brain appends ~800 bytes instead of rewriting 4MB. The journal auto-compacts back into the snapshot once it passes 256KB.
SAN freshness	The freshness sweep (stat every indexed file + scan the `.san/` tree) is debounced to once per 60s per repo, so bursts of `get_san`/`query_san` calls don't each pay for it.
Bounded responses	Every list/detail tool caps its output (row limits, per-field truncation) so one giant decision can't blow up a response. Stored text fields are capped at write time too.
Multi-session safety	Each Claude Code session runs its own server process sharing `~/.agent-brain/`. Writes use `os.replace` with pid-unique temp files to avoid cross-process rename collisions.
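
The read cache from the first row can be sketched as follows. This is a minimal illustration of keying on mtime+size, not the server's code; `load_cached` and `_CACHE` are hypothetical names.

```python
import json
import os
from pathlib import Path

# path -> ((mtime_ns, size), parsed JSON); module-level so repeat calls share it
_CACHE: dict = {}


def load_cached(path: Path) -> dict:
    """Parse a JSON file once and reuse the result until mtime or size changes."""
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    hit = _CACHE.get(str(path))
    if hit is not None and hit[0] == key:
        return hit[1]                       # cache hit: no re-parse
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    _CACHE[str(path)] = (key, data)         # miss, or another writer changed the file
    return data
```

Because the key comes from a fresh `os.stat` on every call, a write by another session changes the key and the stale entry is replaced on the next read.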

Files in ~/.agent-brain/: decisions.json (snapshot) + decisions.journal (deltas) are the decision memory — both are needed; don't delete one without the other. office-state.json is live dashboard state (self-pruning), san_savings.jsonl is the token-savings log. All are per-machine and git-ignored.
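
Why both files are needed: the current state is the snapshot with the journal replayed on top. The sketch below shows that idea under stated assumptions. The delta shape (`{"id": ..., "fields": ...}`) and the function names are hypothetical, and cross-process locking during compaction is left out.

```python
import json
import os
from pathlib import Path

COMPACT_THRESHOLD = 256 * 1024   # bytes; the compaction size quoted above


def append_delta(journal: Path, delta: dict) -> None:
    """O(delta) write: append one JSON line instead of rewriting the snapshot."""
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(delta) + "\n")


def load_state(snapshot: Path, journal: Path) -> dict:
    """Snapshot first, then replay journal deltas in order."""
    state = json.loads(snapshot.read_text(encoding="utf-8")) if snapshot.exists() else {}
    if journal.exists():
        for line in journal.read_text(encoding="utf-8").splitlines():
            if line.strip():
                delta = json.loads(line)
                state.setdefault(delta["id"], {}).update(delta["fields"])
    return state


def atomic_write(path: Path, data: dict) -> None:
    """Write to a pid-unique temp file, then swap it in with os.replace."""
    tmp = path.with_name(f"{path.name}.{os.getpid()}.tmp")
    tmp.write_text(json.dumps(data), encoding="utf-8")
    os.replace(tmp, path)


def compact_if_large(snapshot: Path, journal: Path,
                     threshold: int = COMPACT_THRESHOLD) -> bool:
    """Fold the journal back into the snapshot once it outgrows the threshold."""
    if not journal.exists() or journal.stat().st_size <= threshold:
        return False
    atomic_write(snapshot, load_state(snapshot, journal))
    journal.unlink()
    return True
```

Deleting only the journal would lose every mutation since the last compaction, and deleting only the snapshot would leave deltas with nothing to apply to.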

MCP Tools (17)

Core (every agent uses these)

Tool	Purpose
`pre_check`	Past failures before starting work + plan pointers, escalation hints, model routing
`log_decision`	Record what you decided and why; optional `plan_file` links a written plan
`log_outcome`	Record accepted/rejected/failed after review
`log_feedback`	Reviewers log feedback on decisions

Query

Tool	Purpose
`query_decisions`	Filter decisions by area/agent/repo/outcome
`get_decision`	Full detail + feedback for one decision

Code Bridge

Tool	Purpose
`decisions_for`	Decisions touching a code symbol or file (auto-detected)
`code_impact`	Blast radius: code symbols + callers

Patterns

Tool	Purpose
`get_patterns`	Cluster recurring rejections; pass `action` to find similar past failures

Scorecards

Tool	Purpose
`agent_scorecard`	Stats for one/all agents; `detail=True` for trends + advice
`team_dashboard`	All agents at a glance (`limit` caps rows)

Office Dashboard

Tool	Purpose
`heartbeat`	Report agent status (working/idle/discussing/blocked) for live dashboard
`detect_stalls`	Find agents with open decisions but no activity for N minutes (default 5)

SAN (Structured Associative Notation)

Tool	Purpose
`recompile_san`	Refresh SAN metadata: rebuild index, clean orphans, update hashes. `dry_run=True` for a freshness report only. Does NOT generate content.
`query_san`	Search SAN files by keyword (index + content)
`get_san`	Get SAN-compressed content for a source file (`max_chars` caps output)
`token_savings`	Tokens saved by SAN this session / today / all-time — number + %

Admin (CLI only — not exposed via MCP)

Run from the repo root. These live on the CLI (not MCP) to keep the agent-facing tool surface lean:

python3 brain/server.py validate        # full brain self-tests (81 checks)
python3 brain/server.py validate-san    # SAN subsystem self-tests
python3 brain/server.py san-index <repo> # rebuild _index.json from .san/
python3 brain/server.py stats           # overall brain health
python3 brain/server.py office [repo]    # current office state (debug)
python3 brain/server.py savings         # SAN token savings (last session / today / all-time)

Agent Team

The repo includes 6 agent templates. Each has the Brain Protocol baked in:

Role	File	Responsibility	Pinned model
Project Manager	`project-manager.md`	Coordination, tracking, blockers	Haiku (cheap coordination)
Product Owner	`product-owner.md`	PRDs, acceptance criteria	Sonnet
Principal Engineer	`principal-engineer.md`	Architecture, SOLID, reviews	Opus (review is high-leverage)
Backend Engineer	`backend-engineer.md`	API, services, data layer	Sonnet
Frontend Engineer	`frontend-engineer.md`	UI, app logic, integration	Sonnet
QA Engineer	`qa-engineer.md`	Test plans, validation, quality gates	Sonnet

Model pins live in each template's model: frontmatter — change them to fit your budget. See Model Routing for the full strategy.

Scale by duplicating templates (e.g., backend-engineer-2.md).

Placeholders

Each template has {{ROLE_NAME}} / {{ROLE_NAME_LOWER}} placeholders:

File	Placeholders
`project-manager.md`	`{{PM_NAME}}`, `{{PM_NAME_LOWER}}`
`product-owner.md`	`{{PO_NAME}}`, `{{PO_NAME_LOWER}}`
`principal-engineer.md`	`{{PE_NAME}}`, `{{PE_NAME_LOWER}}`
`backend-engineer.md`	`{{BE_NAME}}`, `{{BE_NAME_LOWER}}`
`frontend-engineer.md`	`{{FE_NAME}}`, `{{FE_NAME_LOWER}}`
`qa-engineer.md`	`{{QA_NAME}}`, `{{QA_NAME_LOWER}}`

setup.sh offers to replace these interactively. Or do it manually:

sed -i 's/{{BE_NAME}}/Arjun/g; s/{{BE_NAME_LOWER}}/arjun/g' ~/.claude/agents/backend-engineer.md
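
In-place `sed -i` takes different arguments on GNU and BSD/macOS sed, so a portable alternative may be useful. The helper below is a hypothetical sketch, not part of setup.sh; it replaces one role's pair of placeholders in a template string.

```python
from pathlib import Path


def fill_placeholders(text: str, prefix: str, name: str) -> str:
    """Replace {{<PREFIX>_NAME}} and {{<PREFIX>_NAME_LOWER}} in a template."""
    return (text
            .replace("{{" + prefix + "_NAME_LOWER}}", name.lower())
            .replace("{{" + prefix + "_NAME}}", name))


def fill_file(path: Path, prefix: str, name: str) -> None:
    """Apply fill_placeholders to a file in place."""
    path.write_text(fill_placeholders(path.read_text(encoding="utf-8"), prefix, name),
                    encoding="utf-8")


print(fill_placeholders("I am {{BE_NAME}} ({{BE_NAME_LOWER}})", "BE", "Arjun"))
```

For the backend template this would be called as `fill_file(Path.home() / ".claude/agents/backend-engineer.md", "BE", "Arjun")`.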

Already have custom agents?

If you already have agent .md files, don't overwrite them. Instead, add the Brain Protocol block to each:

# Brain Protocol
Before starting any task:
1. Call `pre_check(agent="<name>", area="<area>", action_description="<plan>")`
2. If warnings exist, adjust approach
3. Call `log_decision(agent="<name>", repo="<repo>", area="<area>", action="<plan>", reasoning="<why>", files_touched=["<paths>"])`
After feedback:
4. Call `log_outcome(decision_id="<id>", outcome="<result>", outcome_by="<who>", reason="<why>")`
NON-NEGOTIABLE.

Critical: do NOT set the frontmatter tools: field. Claude Code subagents inherit ALL tools from the parent session — including every mcp__agent-brain__* tool — only when tools: is omitted. Setting it (even with ToolSearch included) turns it into a literal allowlist that silently strips MCP tools, because mcp__* is not a valid wildcard. Reference: Claude Code subagents — Available tools.

---
name: my-agent
description: ...
model: claude-sonnet-4-6
# No `tools:` — inherits everything from the parent session, including MCP.
# To restrict tools, use `disallowedTools:` instead.
---

What if I really must restrict tools? Add ToolSearch to your tools: allowlist
and bootstrap brain tools at the top of every task with
ToolSearch(query="agent-brain", max_results=25). This is a fallback for the rare
case where you genuinely need a tool allowlist; for normal use, omit tools: entirely.

For reviewers (PE, QA), also add:

5. Call `log_feedback(agent="<name>", decision_id="<their-id>", feedback="<detail>", severity="blocker|warning|info")`

setup.sh shows this snippet if it detects existing agents (choose [m] for manual).

Model Routing (quality per cost)

Spend the expensive model where mistakes are costly to undo; spend the cheap ones where mistakes are cheap to fix. The brain supports this in four layers:

1. Per-role model pins

Each agent template pins a model in frontmatter (model: claude-sonnet-4-6). Defaults follow the phase-cost logic:

Phase	Work	Model	Why
Plan / architecture	System design, module boundaries, implementation plan	Fable / Opus	A wrong architecture costs days of rework; one good plan makes every later step cheaper
Scaffolding / boilerplate	Project setup, DI wiring, data classes, mappers	Sonnet / Haiku	Pattern-matching, not reasoning — executing against the plan, not deciding
Core / complex logic	Encryption flows, state machines, tricky concurrency	Opus, escalate on failure	Start mid-tier; escalate only when the data says so (see below)
Review	Architecture + code review of cheap-model output	Opus / Fable	Read-heavy, write-light — high leverage per output token
Tests / docs / polish	Unit tests against spec, KDoc, README	Sonnet / Haiku	Cheap-model territory

2. `model_routing` config

Declare your routing once in ~/.agent-brain/config.json:

"model_routing": {
  "plan": "fable",
  "implement": "sonnet",
  "review": "opus",
  "boilerplate": "haiku",
  "escalate": "fable"
}

Every pre_check response then ends with one line —
MODEL ROUTING: plan=fable | implement=sonnet | review=opus | boilerplate=haiku | escalate=fable —
so whatever agent is orchestrating spawns subagents on the right tier without you re-explaining the strategy each session. Omit the key and the line disappears.
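
The routing line is a straightforward rendering of the config. A minimal sketch, assuming the keys are emitted in config order (`routing_line` is a hypothetical name):

```python
def routing_line(config: dict) -> str:
    """Render the model_routing block as the one-line pre_check footer."""
    routing = config.get("model_routing")
    if not routing:
        return ""                      # key omitted: the line disappears
    pairs = " | ".join(f"{phase}={model}" for phase, model in routing.items())
    return f"MODEL ROUTING: {pairs}"


config = {
    "model_routing": {
        "plan": "fable",
        "implement": "sonnet",
        "review": "opus",
        "boilerplate": "haiku",
        "escalate": "fable",
    }
}
print(routing_line(config))
```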

3. Two-strikes escalation (data-driven)

Repeated failed attempts on a cheap model can cost more than one clean shot on a strong one — but you don't know which problems are "strong-model problems" until the cheap model stumbles. The brain already logs every rejection, so it applies the two-strikes rule automatically:

When the same agent has ≥2 rejected/failed decisions in the same area, pre_check returns:
ESCALATION HINT: 'arjun' has 2 rejected/failed decisions in 'auth'. Two-strikes rule: do NOT retry on the same model tier — re-spawn this task on fable.

The escalation target comes from model_routing.escalate (generic wording if unset). This is per-agent — another agent entering the same area is not escalated by someone else's failures.
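
The two-strikes check reduces to a per-agent, per-area count. The sketch below assumes decisions are dicts with `agent`, `area`, and `outcome` keys, which is a hypothetical shape, and its message wording only approximates the hint shown above.

```python
from typing import List, Optional


def escalation_hint(decisions: List[dict], agent: str, area: str,
                    escalate_to: Optional[str] = None) -> Optional[str]:
    """Return a hint once this agent has 2+ rejected/failed decisions in the area."""
    strikes = sum(
        1 for d in decisions
        if d["agent"] == agent and d["area"] == area
        and d["outcome"] in ("rejected", "failed")
    )
    if strikes < 2:
        return None
    target = escalate_to or "a stronger model tier"   # generic wording if unset
    return (f"ESCALATION HINT: '{agent}' has {strikes} rejected/failed "
            f"decisions in '{area}'. Re-spawn this task on {target}.")
```

Filtering on `agent` as well as `area` is what keeps the rule per-agent: another agent's failures in `auth` never count against a newcomer.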

4. Plan files as handoff artifacts

Pay for deep thinking once, reuse it across many cheap executions. The planner writes the plan to a file and logs it:

log_decision(agent="marcus", repo="my-app", area="payments",
             action="Designed payment module architecture",
             reasoning="...", plan_file="docs/plans/payments-plan.md")

Every later pre_check in that area surfaces it:

PLAN AVAILABLE: docs/plans/payments-plan.md (by marcus, 2026-06-12).
Read it before re-deriving the approach — execute against it, don't re-plan.

The pointer stays active while the decision is pending or accepted; a rejected plan stops being advertised.

Cost mechanics that matter as much as model choice

Context discipline beats model choice. A Sonnet call with clean context beats an Opus call drowning in irrelevant files. SAN reads (~20% of raw cost) + pre_check (past failures only, not full history) are the brain's context discipline.
The orchestrator burns its own tokens. Spawning a Sonnet subagent from an Opus session still pays Opus rates for coordination. Cheapest pattern: cheap main session as orchestrator, escalate via subagents — not an expensive main session delegating down.
Fewer turns > cheaper tokens. For a genuinely complex task, a strong model finishing in fewer turns can land near mid-tier pricing. Don't be dogmatic — the two-strikes hint exists precisely to catch this case from real outcome data.

Rough split to aim for: ~70% of tokens on Sonnet-tier, ~25% on Opus-tier, ~5% on Fable-tier — that 5% (architecture + final review) determines whether the output is actually good.

Brain Protocol

Every agent must follow this before starting work:

1. pre_check(agent, area, action_description)
   → See past failures. Adjust approach if warnings.

2. log_decision(agent, repo, area, action, reasoning)
   → Record your plan before implementing.

3. [do the work]

4. log_outcome(decision_id, outcome, outcome_by, reason)
   → Record what happened after review.

Every agent's .md file states this as NON-NEGOTIABLE.

Enforcement Hook

Text in .md files is advisory — agents can skip it. The enforcement hook makes it mandatory: any Edit/Write to code files is blocked if no log_decision was called in the last 30 minutes.

How it works:

log_decision() writes a marker file (~/.agent-brain/.last_decision_marker)
A PreToolUse hook fires before every Edit/Write
If marker is missing or stale (>30min), the hook blocks with exit code 2
Claude sees the block reason and calls log_decision before retrying

Skips (no block): .md, .json, .yaml, .toml, config files, .claude/, .git/, .san/, node_modules/, build/

Custom skip patterns — extend the built-in skip list with fnmatch globs in ~/.agent-brain/config.json:

{
  "hook_skip_paths": [
    "**/docs/**",
    "**/.github/**",
    "**/CHANGELOG*",
    "**/migrations/**"
  ]
}

Patterns are matched against the absolute file path. The hook fails open: an invalid hook_skip_paths value is ignored silently rather than blocking your session.
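
The blocking decision itself can be sketched as a pure function. This is an illustration of the rules above, not the shipped `enforce_brain_protocol.py`: the glob forms of the built-in skips are guesses, "config files" is left out because the README does not define it, and the wiring that reads the Edit/Write call and exits with code 2 is omitted.

```python
import fnmatch
from typing import Optional

MAX_AGE_SECONDS = 30 * 60   # marker older than 30 minutes counts as stale

# Assumed glob equivalents of the built-in skip list.
BUILTIN_SKIPS = [
    "*.md", "*.json", "*.yaml", "*.toml",
    "*/.claude/*", "*/.git/*", "*/.san/*", "*/node_modules/*", "*/build/*",
]


def should_block(file_path: str, marker_mtime: Optional[float], now: float,
                 extra_skips: object = None) -> bool:
    """True if an edit to file_path should be blocked (no recent log_decision)."""
    # Fail open on a malformed hook_skip_paths value: ignore it silently.
    extras = [p for p in extra_skips if isinstance(p, str)] \
        if isinstance(extra_skips, list) else []
    if any(fnmatch.fnmatchcase(file_path, p) for p in BUILTIN_SKIPS + extras):
        return False                      # skipped path: never blocked
    if marker_mtime is None:
        return True                       # marker missing
    return (now - marker_mtime) > MAX_AGE_SECONDS
```

Note that `fnmatch` lets `*` match `/`, so `**/docs/**` behaves like `*/docs/*` when matched against an absolute path.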

Install (setup.sh does this automatically):

// ~/.claude/settings.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /path/to/agent-brain/brain/hooks/enforce_brain_protocol.py",
            "timeout": 5000
          }
        ]
      }
    ]
  }
}

Fail-open: If the marker file is corrupt or the hook script errors, it allows the edit (exit 0). The hook never crashes your workflow — it only blocks when it's confident no decision was logged.

Bypass for direct edits: The hook fires on all Edit/Write — agents and user alike. To skip enforcement when you're editing directly, set BRAIN_SKIP_ENFORCE=1 in your shell before launching Claude Code, or add it to your settings.json env block:
{ "env": { "BRAIN_SKIP_ENFORCE": "1" } }
Agents spawned via the team system won't inherit this, so enforcement stays active for them.

SAN Protocol

Structured Associative Notation compresses code to ~17-27% of its original tokens (81% saved blended, tokenizer-measured) while preserving all facts. See san/README.md for the full spec.

# Before: 80 lines, ~1,200 tokens
class AuthServiceImpl(...) : AuthService { ... }

# After: ~220 tokens
AuthServiceImpl @svc {
  impl: AuthService iface
  deps: UserRepository + TokenProvider + RateLimiter
  fn:login(email, pwd) → AuthResult [validate → verify → issue_jwt]
  fn:register(RegisterRequest) → AuthResult [validate → create → issue_jwt]
  layer: application/service
  patterns: DIP-clean
}

Is SAN worth it? (measured numbers)

Measured with real tokenizers (tiktoken o200k_base and cl100k_base — both agree within 0.1%) across 3 production repos: 954 source/SAN file pairs, Kotlin/Java/TS/JS, ~1.12M source tokens. Compression varies by code style — boilerplate-heavy Android code compresses to ~17%, dense backend logic to ~27%; 18.9% blended (81% saved):

Scenario	Raw source	Via SAN	Saved
Agent reads 1 file (avg)	~1,170 tokens	~220 tokens	~950 (81%)
One task (agent explores ~10 files)	~11,700 tokens	~2,200 tokens	~9.5k per task
Whole codebase in context (954 files)	~1.12M tokens — doesn't fit	~211k tokens — fits in one window	~905k (81%)

Repo (style)	Files	Raw tokens	SAN tokens	Ratio
Android app (Kotlin, boilerplate-heavy)	651	853k	142k	16.6%
Backend (Kotlin, dense logic)	299	247k	67k	27.0%
Web (TS/JS)	4	15k	2.4k	15.7%

Do SAN's unicode operators (→ ⇒ ×) waste tokens? Not on modern tokenizers — measured: → = 1 token on both, and a typical SAN line costs exactly the same in unicode and ASCII form (19 vs 19 tokens). One caveat: standalone ⇒ is 1 token on o200k but 3 on the older cl100k — if you target older models, prefer the ASCII equivalents (->, =>, xN), which the spec allows everywhere.

Savings recur on every read by every agent; generation cost is one-time per file (plus regeneration when the file changes):

Cost side	Amount
Generate 1 file (Sonnet)	~1 read of the source (~1,170 input tokens) + ~220 output tokens
Break-even (token count)	After ~1-2 reads of that file via `get_san` instead of raw
Break-even (dollars)	~2-3 reads if reader = generator price (output tokens cost ~5× input); faster when generation runs on cheap Sonnet and reads are saved on expensive models
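
The break-even figures follow from the per-file averages in the first table. A small worked calculation, assuming the reader pays the same input price as the generator:

```python
SOURCE_TOKENS = 1170    # average raw file, from the table above
SAN_TOKENS = 220        # average SAN file
SAVED_PER_READ = SOURCE_TOKENS - SAN_TOKENS   # 950 tokens saved per SAN read


def break_even_reads(output_price_multiplier: float = 1.0) -> float:
    """Reads needed before one-time generation cost is repaid.

    Generation reads the source once (input) and emits the SAN file (output);
    output tokens are weighted by how much more they cost than input tokens.
    """
    generation_cost = SOURCE_TOKENS + SAN_TOKENS * output_price_multiplier
    return generation_cost / SAVED_PER_READ


print(f"token count: {break_even_reads(1.0):.2f} reads")   # ~1.46
print(f"dollars (5x output): {break_even_reads(5.0):.2f} reads")   # ~2.39
```

Both results fall inside the ranges in the table: about 1.5 reads by raw token count and about 2.4 reads once output tokens are priced at 5× input.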

Use SAN when:

Agents repeatedly explore the same codebase (every task re-reads files)
The repo is too big to fit in context raw — SAN makes whole-repo reasoning possible
Multiple agents work the same repo (generation cost amortizes across the team)

Skip SAN when:

The repo is small enough to fit in context anyway (< ~50 files)
Files churn rapidly — stale SANs need regeneration, eroding the one-time-cost advantage
One-off scripts / repos agents rarely revisit (won't reach break-even)

Numbers above are tokenizer-measured (tiktoken). The live token_savings tracker below uses a ~4 chars/token estimate, which measured ~1.4 points optimistic vs the real tokenizer (17.5% vs 18.9% ratio) — close enough for tracking, but the table above is the honest benchmark. Measure your own repos:
pip install tiktoken
python3 -c "
import tiktoken; from pathlib import Path
enc = tiktoken.get_encoding('o200k_base')
raw = sum(len(enc.encode(f.read_text(errors='replace'))) for f in Path('.').rglob('*.kt'))
san = sum(len(enc.encode(f.read_text(errors='replace'))) for f in Path('.san').rglob('*.san'))
print(f'raw={raw:,} san={san:,} ratio={san/raw:.1%}')"

Measuring your savings (`token_savings`)

You don't have to estimate — the brain measures it live. Every get_san call records what the raw source read would have cost vs the SAN tokens actually served. Ask any agent:

"how many tokens did SAN save this session?"   → agent calls token_savings()

=== SAN TOKEN SAVINGS ===

This session:
  SAN reads: 14
  Raw source cost avoided: 16,380 tokens
  SAN tokens served: 3,080 tokens
  SAVED: 13,300 tokens (81%)

Today (2026-06-11):  ...
All time:            ...

Or from the shell (reports the last recorded session instead of a live one):

python3 brain/server.py savings

How it counts — deliberately conservative, so the number is trustworthy:

Only get_san reads count (a read that replaced opening the raw file)
query_san searches and decision-memory benefits are not included
Reads where SAN wouldn't have saved anything are skipped
~4 chars/token estimate; events persist in ~/.agent-brain/san_savings.jsonl
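
The counting rules above can be sketched as follows. The event field names are hypothetical, since the README does not document the layout of `san_savings.jsonl`; only the ~4 chars/token estimate and the skip rule come from the text.

```python
from typing import Optional


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token estimate."""
    return len(text) // 4


def savings_event(raw_source: str, san_text: str) -> Optional[dict]:
    """Build one savings record for a get_san read, or None if nothing was saved."""
    raw = estimate_tokens(raw_source)
    san = estimate_tokens(san_text)
    if san >= raw:
        return None                     # SAN would not have saved anything: skip
    saved = raw - san
    return {
        "raw_tokens": raw,              # what reading the raw file would have cost
        "san_tokens": san,              # what was actually served
        "saved": saved,
        "pct": round(100 * saved / raw),
    }


print(savings_event("x" * 4000, "y" * 800))
```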

Use it to decide whether SAN is paying off: if "All time" savings stay near zero after a week, your agents aren't reading via SAN — check coverage with recompile_san(dry_run=True).

SAN Setup

SAN (Structured Associative Notation) compresses source code to ~17-27% of its original tokens for LLM context. This is optional — the decision memory works without it.

Create .san/ in your repo:
```
mkdir -p your-repo/.san
```
Generate SAN files using the brain-compiler agent (see san/brain-compiler.md):
```
# In Claude Code, spawn the brain-compiler agent:
# "Convert src/services/AuthService.kt to SAN"
```
The compiler writes your-repo/.san/src/services/AuthService.san.

Build the index:

python3 brain/server.py san-index my-backend   # admin CLI; recompile_san also rebuilds it

Query SAN:

query_san("my-backend", "Auth")      # search by keyword
get_san("my-backend", "src/services/AuthService.kt")  # get specific file
recompile_san("my-backend", dry_run=True)    # find stale files

SAN Commands

Command	What it does
`recompile_san("repo", dry_run=True)`	Report which SANs are stale, missing, or orphaned vs source (no changes)
`recompile_san("repo")`	Refresh metadata: rebuild index, clean orphans, update hashes. Does NOT generate SAN content.
`query_san("repo", "keyword")`	Search SAN index + file contents by keyword
`get_san("repo", "src/path/File.kt")`	Get SAN-compressed content for a source file (`max_chars` caps output)
`python3 brain/server.py san-index <repo>`	(CLI) Rebuild `_index.json` from all `.san` files
`python3 brain/server.py validate-san`	(CLI) 24 self-tests: hashing, orphan cleanup, staleness, index building. Isolated temp dir.

How SAN Generation Works

SAN files are only generated by the brain-compiler agent (LLM-powered). The server itself does NOT generate SAN content — it only manages metadata, detects staleness, and cleans up orphans. Asking the agent-brain MCP server to "generate SAN" does nothing; spawn the brain-compiler agent instead.

Workflow:

Brain-compiler generates rich SAN files (dependencies, patterns, execution flow)
Server tracks source hashes to detect when SANs become stale
recompile_san(dry_run=True) / query_san / get_san report stale SANs
You re-run brain-compiler on stale files to regenerate

Which model to use for generation

Use Sonnet. SAN conversion is mechanical (read source → emit facts in SAN notation) — it doesn't need a frontier model, and you'll be converting hundreds of files. The bundled san/brain-compiler.md agent already pins this:

model: claude-sonnet-4-6   # cheap, fast, accurate enough for mechanical conversion

Spend the savings where it matters: your engineering agents consuming SAN can run on bigger models, since SAN cuts their input cost to ~20% of raw anyway. Only escalate the compiler to a bigger model if you find SAN files missing relationships on gnarly, highly-dynamic code.

Generating SAN from other platforms (ChatGPT, Cursor, etc.)

The MCP server is platform-agnostic — any MCP client can call query_san/get_san/recompile_san. Only the brain-compiler agent template is Claude Code specific. SAN files themselves are plain text, so any capable LLM can generate them:

Give the model the SAN spec (san/README.md) + the source file
Save its output to <repo>/.san/<source-path>.san (mirror the source tree, swap extension to .san)
Rebuild the index: python3 brain/server.py san-index <repo> (or call recompile_san("<repo>") from any MCP client)
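
The path rule in the second step (mirror the source tree, swap the extension) can be sketched like this. `san_path_for` is a hypothetical helper that returns a path relative to the repo root.

```python
from pathlib import PurePosixPath


def san_path_for(source_path: str) -> str:
    """Map a repo-relative source path to its .san counterpart under .san/."""
    mirrored = PurePosixPath(source_path).with_suffix(".san")
    return str(PurePosixPath(".san") / mirrored)


print(san_path_for("src/services/AuthService.kt"))
```

With this rule, two sources that differ only by extension (for example `Foo.kt` and `Foo.java` in the same directory) would map to the same `.san` file; how the server handles that case is not covered here.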

The server's hash-based staleness tracking works identically regardless of which model wrote the file. Cheap-tier models on other platforms (e.g. GPT-4o-mini class) generally handle the conversion; verify a few files against the spec before bulk-converting.

Content Hashing

SAN staleness detection uses sha256 content hashing to avoid false positives:

Source file hashes are stored in .san/.san_hashes.json
When checking freshness, if the source content hash matches the stored hash, the file is skipped (even if mtime changed)
This avoids false positives from git checkout, git stash pop, touch, or editor save-without-change
Hashes are updated when recompile_san runs
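A minimal sketch of the content-hash check, not the server's actual code. It assumes `.san_hashes.json` is a flat JSON object mapping each relative source path to its sha256 hex digest; the real file layout may differ, and `sha256_of` and `is_stale` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex sha256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_stale(repo: Path, relative_source: str) -> bool:
    """True when the source content no longer matches the stored hash."""
    hash_file = repo / ".san" / ".san_hashes.json"
    # Assumed layout: {"<relative source path>": "<sha256 hex digest>"}
    stored = json.loads(hash_file.read_text()) if hash_file.exists() else {}
    current = sha256_of(repo / relative_source)
    # mtime is never consulted, so touch or git checkout alone cannot flag a file
    return stored.get(relative_source) != current
```

Because only content is compared, a file whose mtime changed but whose bytes did not is reported as fresh.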

Orphan Cleanup

When a source file is deleted, its SAN file becomes an orphan. Orphans are detected and cleaned up automatically:

Every source tracked in .san_hashes.json is checked for existence
If the source is gone, the corresponding .san file and hash entry are removed
.san files with no matching source (even if not in hash tracker) are also cleaned up
Stats report orphans_removed so you can see what was cleaned up
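The first check in the list (tracked sources that no longer exist) could look roughly like this. It is a detection-only sketch under stated assumptions: `find_orphans` is a hypothetical name, the tracked paths are relative to the repo root, and the SAN path uses the mirror-the-tree, swap-the-extension mapping described for SAN output. It does not delete anything and does not cover untracked .san files.

```python
from pathlib import Path


def find_orphans(repo: Path, tracked_sources: list[str]) -> list[Path]:
    """Return the .san paths whose tracked source file no longer exists."""
    orphans = []
    for relative_source in tracked_sources:
        if not (repo / relative_source).exists():
            # Source is gone: its mirrored .san file is an orphan candidate
            orphans.append(repo / ".san" / Path(relative_source).with_suffix(".san"))
    return orphans
```

The length of the returned list corresponds to what the stats would report as orphans_removed once the files and hash entries are actually deleted.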

Important: SAN refresh is NOT automatic. Staleness checks run when you call query_san, get_san, or recompile_san(dry_run=True) — they report stale SANs but do NOT regenerate them. To force a full metadata refresh (e.g., after a large merge or branch switch), call recompile_san("repo"). To regenerate stale SAN content, run the brain-compiler agent on the reported files.

Commit .san/ to git. SAN files are prebuilt knowledge that helps any developer (or agent) working on the project, so don't .gitignore the directory. The one exception is .san/.san_hashes.json: add it to .gitignore, since it's a local cache.

Adaptive Warnings

Agents with high rejection rates get progressively stricter warnings:

Rejection Rate	Warning Level	Behavior
< 30%	NORMAL	Standard pre_check
30-49%	ELEVATED	"Pay close attention to past failures"
≥ 50%	STRICT	Shows top rejection patterns, demands extra scrutiny

Agents with fewer than 3 logged decisions always get NORMAL — no judgment on a tiny sample.
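The thresholds above can be expressed as a short function. This is a sketch of the table, not the body of `_adaptive_warning_level()` in server.py; `warning_level` is a hypothetical name, and it assumes the rejection rate is rejections divided by logged decisions.

```python
def warning_level(decisions: int, rejections: int) -> str:
    """Map an agent's rejection history to a warning level."""
    # Fewer than 3 logged decisions: too small a sample to judge
    if decisions < 3:
        return "NORMAL"
    rate = rejections / decisions  # assumed definition of rejection rate
    if rate >= 0.5:
        return "STRICT"
    if rate >= 0.3:
        return "ELEVATED"
    return "NORMAL"
```

Checking the small-sample rule first means an agent with 2 decisions and 2 rejections still gets NORMAL.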

Office Dashboard (Live Visualization)

A pixel art virtual office that shows your agents working in real-time. Agents move between desks and the meeting table, show speech bubbles during discussions, and display status indicators.

python3 dashboard/server.py
# Opens http://localhost:3333 in your browser

Features:

Pixel art office with desks, meeting table, whiteboard, coffee machine
Agents animate: idle bob, working (typing), walking, discussing (gestures)
Status dots: 🟢 working, 🟡 planning, 🟠 reviewing, 🔵 discussing, 🔴 blocked, ⚫ offline
Speech bubbles with actual message content
Chat log sidebar with all agent interactions
Team status panel with live agent list

How it works:

Brain tools (pre_check, log_decision, etc.) auto-update agent status — zero changes to your agents needed
For richer state (idle, messages, discussing), agents can call heartbeat() explicitly
Dashboard server polls ~/.agent-brain/office-state.json every 500ms and pushes updates to the browser via SSE (server-sent events)
Canvas renders pixel art at 60fps with smooth agent movement

Auto-heartbeat (free, no agent changes):

Brain Tool	Dashboard Status
`pre_check`	Agent shows as "planning"
`log_decision`	Agent shows as "working"
`log_outcome`	Reviewer shows as "reviewing"
`log_feedback`	Reviewer shows as "reviewing", linked to target agent

Explicit heartbeat (richer state):

heartbeat(agent="arjun", status="discussing", talking_to="marcus", message="DIP violation in AuthService?")

→ Both agents walk to meeting table, speech bubbles appear, message shows in chat log.

Tip: Add heartbeat(agent="<name>", status="idle") to agent templates for when they finish a task. Otherwise agents stay at their last status until the 2-minute timeout.

Verification

After setup, run the full validation from the repo root:

python3 brain/server.py validate
# Expected: "Agent Brain Validation: 81 passed, 0 failed ✓ ALL TESTS PASSED"

This tests every subsystem in isolation using a temp directory:

Section	Checks	What's validated
Graph Persistence	4	Save/load, atomic writes, empty state
Decision Memory	16	log_decision, log_outcome, log_feedback, error handling
Pre-check & Warnings	7	Exact matches, similar rejections, adaptive warning levels
Similarity Matching	7	Tokenizer (camelCase split), Jaccard + domain boost, false positives
Pattern Clustering	1	DIP-related rejections cluster together
Scorecards & Dashboard	11	Acceptance rates, trends, team_dashboard rendering
Query & Retrieval	6	Filters, missing ID handling, file-based search
Code Bridge	4	Symbol linking, callers, impact radius
Office State	11	Heartbeat, role resolution, messages, auto-heartbeat
Config & Edge Cases	3	Missing/corrupt config and graph files
SAN System	1	Delegates to the 24-check `validate-san` suite (hashing, orphans, staleness, indexing)
Integration Workflow	10	Full end-to-end: pre_check → decide → reject → feedback → re-check

You can also run just the SAN subsystem: python3 brain/server.py validate-san

Or verify basic connectivity with python3 brain/server.py stats:

Brain Stats:
  Nodes: 0 | Edges: 0
  Decisions: 0 | Feedback: 0 | Code refs: 0
  Areas: none
  Repos: none
  Agents: none

Troubleshooting:

Problem	Fix
brain tools not found	Restart Claude Code. Check `claude mcp list` shows `agent-brain`.
MCP connection error	Check venv: `~/.agent-brain/.venv/bin/python -c "import mcp, networkx"`
No tools registered	Run `~/.agent-brain/.venv/bin/python ~/.agent-brain/server.py`; it should start without errors.
`config.json` not found	Server works without it (empty brain). Create one if you want repo integration.
`AGENT_BRAIN_DIR` not set	Defaults to `~/.agent-brain/`. Set the env var only if you want a custom location.
Anything looks off	Run `~/.agent-brain/.venv/bin/python ~/.agent-brain/server.py diagnose` for a full health report (no Claude session needed).

Diagnose CLI

~/.agent-brain/.venv/bin/python ~/.agent-brain/server.py diagnose [--project=/path/to/project]

Runs a standalone health check from the shell — no Claude session required.

Always verifies:

MCP tools are registered in the server
config.json is valid JSON (or absent — empty brain is OK)
~/.agent-brain/ is writable (decision marker round-trip)
decisions.json is readable if present
agent-brain is registered as an MCP server in ~/.claude.json and/or ~/.claude/settings.json (layer 1)
Every ~/.claude/agents/*.md is subagent-MCP-safe: omits the tools: frontmatter field (preferred — inherits MCP) or lists ToolSearch in tools: (fallback bootstrap)
Per-repo team resolution: which agents the brain considers in-team for each configured repo

With --project=<path>, also verifies:

<project>/.mcp.json exists and registers agent-brain (layer 3)
<project>/.claude/settings.local.json enables project MCP and allowlists agent-brain (layer 4)
<project>/.gitignore covers brain artifacts (informational)

Exit code is 0 when all checks pass, 1 otherwise — safe to call from a CI pre-flight or a dotfiles bootstrap.

How brain MCP reaches Claude Code subagents (4-layer model)

Brain tools work in BOTH the main Claude Code session AND spawned subagents only when all four layers are correctly configured:

Layer	File	What it does	Set by
1	`~/.claude.json` or `~/.claude/settings.json` `mcpServers`	Registers `agent-brain` server for the main session	`setup.sh` (initial install)
2	`~/.claude/settings.local.json` `enabledMcpjsonServers`	User-level allowlist (only relevant if you use allowlist mode)	`setup.sh` (auto-detects allowlist; appends `agent-brain` if needed)
3	`<project>/.mcp.json`	Project-scoped server registration — subagents read this	`setup.sh --link-project=<path>`
4	`<project>/.claude/settings.local.json` `enableAllProjectMcpServers: true` + `enabledMcpjsonServers: ["agent-brain"]`	Project-level activation	`setup.sh --link-project=<path>`

Plus the agent frontmatter rule (see Already have custom agents? below): omit the tools: field so MCP tools are inherited. Setting tools: [Read, Write, ...] makes it an allowlist that silently strips every mcp__* tool, and mcp__agent-brain__* is not a valid wildcard.

After any config change, restart Claude Code (/exit then claude) — MCP and agent definitions are loaded at session start. Then run server.py diagnose --project=<path> to confirm all four layers are wired up.
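For a quick manual look at the two project-level layers, something like the following could be used. It is a rough sketch, not what `diagnose` runs: it assumes `.mcp.json` keeps servers under a top-level `mcpServers` key, and `project_layers_ok` is a hypothetical name.

```python
import json
from pathlib import Path

SERVER = "agent-brain"


def _load(path: Path) -> dict:
    """Read a JSON file, treating a missing or invalid file as empty."""
    try:
        return json.loads(path.read_text())
    except (OSError, ValueError):
        return {}


def project_layers_ok(project: Path) -> dict[str, bool]:
    """Check layers 3 and 4 for one project directory."""
    mcp = _load(project / ".mcp.json")
    local = _load(project / ".claude" / "settings.local.json")
    return {
        # Layer 3: project-scoped registration (top-level "mcpServers" assumed)
        "layer3": SERVER in mcp.get("mcpServers", {}),
        # Layer 4: project MCP enabled and agent-brain allowlisted
        "layer4": local.get("enableAllProjectMcpServers") is True
        and SERVER in local.get("enabledMcpjsonServers", []),
    }
```

A False entry points at the file to fix; re-running `setup.sh --link-project=<path>` is the documented way to write both.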

Requirements

Any MCP-compatible AI code agent (Claude Code, Cursor, Windsurf, Cline, etc.)
Python 3.10+
Optional: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS for multi-agent orchestration
Optional: code-review-graph for code bridge features

Configuration

Edit ~/.agent-brain/config.json:

{
  "repos": {
    "my-backend": "/absolute/path/to/backend",
    "my-frontend": "/absolute/path/to/frontend"
  },
  "team": [
    {"name": "marcus", "role": "principal-engineer"},
    {"name": "arjun", "role": "backend-engineer"}
  ]
}

Per-repo team scoping

A flat team list applies to every repo — fine when one team owns everything. When you run multiple repos with different staffing, scope members per repo so heartbeats from arjun on my-backend don't pollute the my-frontend office state.

Two ways to scope:

Per-entry repos filter (simplest — extends the flat list):

{
  "team": [
    {"name": "marcus", "role": "principal-engineer"},
    {"name": "arjun",  "role": "backend-engineer",  "repos": ["my-backend"]},
    {"name": "priya",  "role": "frontend-engineer", "repos": ["my-frontend"]}
  ]
}

marcus has no repos → global, applies to every repo.
arjun only resolves on my-backend.
priya only resolves on my-frontend.

teams_per_repo override (full replacement for one repo):

{
  "team": [
    {"name": "marcus", "role": "principal-engineer"},
    {"name": "arjun", "role": "backend-engineer"}
  ],
  "teams_per_repo": {
    "experimental-repo": [
      {"name": "marcus", "role": "principal-engineer"},
      {"name": "neha",   "role": "product-owner"}
    ]
  }
}

When teams_per_repo[repo] is present, the flat team list is ignored for that repo.

Backwards compatible: configs with neither teams_per_repo nor a repos field on any entry behave exactly as before.

How it's used: brain tools that take a repo arg (heartbeat, log_decision, etc.) feed it through _get_team_for_repo() for role resolution and dashboard filtering. The office CLI command (python3 brain/server.py office my-backend) shows only that repo's agents.
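The scoping rules above can be summarized in a few lines. This is a sketch of the documented behavior, not the implementation of `_get_team_for_repo()`; `resolve_team` is a hypothetical name and `config` is the parsed config.json.

```python
def resolve_team(config: dict, repo: str) -> list[dict]:
    """Return the team entries that apply to one repo."""
    override = config.get("teams_per_repo", {})
    if repo in override:
        # Full replacement: the flat team list is ignored for this repo
        return override[repo]
    return [
        member
        for member in config.get("team", [])
        # No "repos" field means global; otherwise the repo must be listed
        if "repos" not in member or repo in member["repos"]
    ]
```

With the per-entry example above, `resolve_team(config, "my-backend")` returns marcus and arjun, while `"my-frontend"` returns marcus and priya.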

Model routing (optional)

"model_routing": {
  "plan": "fable",
  "implement": "sonnet",
  "review": "opus",
  "boilerplate": "haiku",
  "escalate": "fable"
}

The routing is shown as one line in every pre_check; escalate names the tier used in the two-strikes escalation hint. See Model Routing.

Customization

Adding more agents

Copy any template, rename it, and change the {{PLACEHOLDER}} values.

Adding domain terms

Edit _DOMAIN_TERMS in server.py to boost similarity matching for your domain.

Custom warning thresholds

Edit _adaptive_warning_level() in server.py.

License

MIT