spellbook

mcp
Security Audit
Warning
Health Warning
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Passed
  • Permissions — No dangerous permissions requested
Purpose
This tool provides a structured skill system for AI coding assistants (primarily Claude Code), offering workflows, quality gates, and engineering guardrails. It is designed to make AI assistants operate more like experienced software engineers.

Security Assessment
The automated code scan reviewed 12 files and found no dangerous patterns or hardcoded secrets. It does not request any dangerous system permissions. Based on the rule-based scan, there is no evidence of unauthorized network requests, sensitive data access, or unexpected shell command execution. Overall risk is rated as Low.

Quality Assessment
The project is actively maintained, with its most recent push occurring today. It uses the permissive MIT license, includes a clear repository description, and features comprehensive documentation hosted on GitHub Pages. However, community visibility is currently very low. With only 5 stars on GitHub, it is either a very new project or has a limited user base, meaning it receives minimal peer review from the wider developer community.

Verdict
Safe to use, but evaluate the relatively small community footprint before relying on it for mission-critical workflows.
SUMMARY

Multi-platform AI assistant skills and workflows. Serious engineering. Also fun.

README.md

Spellbook

A structured skill system for AI coding assistants -- workflows, quality gates, and guardrails so they work more like experienced engineers.
Primary platform: Claude Code. Basic support for OpenCode, Codex, Gemini CLI, and Crush.

Documentation · Getting Started · Skills Reference


Quick Install

curl -fsSL https://raw.githubusercontent.com/axiomantic/spellbook/main/bootstrap.sh | bash

The installer requires Python 3.10+ and git, then automatically installs uv and configures skills for detected platforms.

Upgrade: cd ~/.local/share/spellbook && git pull && python3 install.py

Uninstall: python3 ~/.local/share/spellbook/uninstall.py

See Installation Guide for advanced options.

Windows Quickstart

irm https://raw.githubusercontent.com/axiomantic/spellbook/main/bootstrap.ps1 | iex

Requirements: Python 3.10+, git, and PowerShell 5.1+.

  • Symlinks require Developer Mode enabled in Windows Settings (falls back to junctions or copies otherwise)
  • Service management uses Windows Task Scheduler
  • Install location: %LOCALAPPDATA%\spellbook

What Spellbook Does

Spellbook is a collection of skills, commands, and agents that shape how AI coding assistants approach development work. Instead of just telling an assistant about your codebase, Spellbook gives it structured workflows for research, design, implementation, testing, and review -- along with guardrails for the specific ways LLMs tend to cut corners.

The orchestrator pattern

The main agent dispatches subagents rather than doing implementation work directly. This keeps the main context window free for strategic coordination instead of filling it with source code, and it means each subagent starts with a fresh perspective rather than carrying accumulated assumptions. Parallel dispatch lets multiple tasks run simultaneously.
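A minimal sketch of the dispatch idea, not Spellbook's actual implementation. `run_subagent` is a hypothetical stand-in for a real subagent call; the point is that the orchestrator hands out independent tasks in parallel and keeps only the short summaries that come back.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Hypothetical subagent: sees only its own task and returns a summary,
    never its full working context."""
    return f"summary of {task!r}"


def orchestrate(tasks: list[str]) -> list[str]:
    """Dispatch independent tasks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        return list(pool.map(run_subagent, tasks))


summaries = orchestrate(["research", "design", "tests"])
print(summaries)
```

Threads are used here only to keep the example self-contained; a real orchestrator would dispatch through whatever subagent mechanism the platform provides.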

Epistemic rigor

The system is designed to distrust its own outputs. Fact-checking treats every claim as a hypothesis to verify. Green mirage auditing asks whether a test would actually fail if the code were broken, which is a different question from whether the test passes. Hunch verification intercepts moments of claimed discovery and requires reframing them as testable hypotheses. Dehallucination names the specific ways LLMs confabulate and provides recovery protocols.
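The green mirage question can be made concrete with a small sketch. Everything here is illustrative (the functions and test names are hypothetical, not part of Spellbook): a test is only trusted if it passes against the real code and fails against a deliberately broken variant.

```python
def is_even(n: int) -> bool:
    return n % 2 == 0


def broken_is_even(n: int) -> bool:
    # Deliberately wrong implementation used to probe the tests.
    return True


def mirage_test(impl) -> bool:
    # Passes for any implementation: it only checks the return type.
    return isinstance(impl(3), bool)


def solid_test(impl) -> bool:
    # Checks actual behavior, so the broken implementation fails it.
    return impl(4) is True and impl(3) is False


def catches_breakage(test) -> bool:
    """True only if the test passes on real code and fails on broken code."""
    return test(is_even) and not test(broken_is_even)


print(catches_breakage(mirage_test))  # False: the test stays green on broken code
print(catches_breakage(solid_test))   # True: the test goes red on broken code
```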

Test-driven development is treated as an epistemic practice: tests written before implementation answer "what should this do?" while tests written after answer "what does this do?" That distinction matters at every level.

Hallucination prevention draws on peer-reviewed research. Chain-of-Verification self-interrogation (Dhuliawala et al., 2023) requires verification skills to generate and answer questions about their own claims before finalizing verdicts. Atomic claim decomposition (Min et al., FActScore, EMNLP 2023) breaks compound statements into independently verifiable units. API hallucination detection checklists in code review and quality enforcement catch the specific pattern where LLMs generate syntactically valid but non-existent API calls.

Named failure modes

LLMs fail in predictable ways, and Spellbook names those patterns so it can build mechanical countermeasures. Seven rationalization patterns are catalogued and blocked. Three consecutive fix failures trigger architectural reassessment instead of a fourth attempt. Research stagnation triggers a plateau breaker. A devil's advocate review that finds zero issues is flagged as incomplete.
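The three-failure rule is simple enough to sketch as a counter. This is an assumed shape, not Spellbook's code; the action names are hypothetical and only the threshold of three comes from the text.

```python
MAX_CONSECUTIVE_FAILURES = 3  # from the text: three failed fixes trigger reassessment


def next_action(fix_results: list[bool]) -> str:
    """Decide the next step from a history of fix attempts (True = fix worked)."""
    consecutive = 0
    for succeeded in fix_results:
        # A successful fix resets the streak; a failure extends it.
        consecutive = 0 if succeeded else consecutive + 1
    if consecutive >= MAX_CONSECUTIVE_FAILURES:
        return "reassess-architecture"
    return "attempt-fix"


print(next_action([False, False, False]))        # reassess-architecture
print(next_action([False, True, False, False]))  # attempt-fix
```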

Quality gates

Every substantial skill runs as a sequence of phases with mandatory gates between them. Tests must pass, code review must clear, claims must verify against source, and tests must actually catch regressions. These gates cannot be bypassed by YOLO mode or autonomy settings. YOLO mode grants permission to act without asking, but it does not grant permission to skip verification steps.
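One way to picture the distinction between acting without asking and skipping verification is the sketch below. The function and parameter names are assumptions made for illustration: `yolo` only removes the confirmation prompt, while the gate check runs unconditionally.

```python
from typing import Callable


class GateFailed(Exception):
    """Raised when a phase's quality gate does not pass."""


def run_phases(
    phases: list[tuple[str, Callable[[], bool]]],
    yolo: bool = False,
    confirm: Callable[[str], bool] = lambda name: True,
) -> list[str]:
    """Run phases in order; each gate must pass before the next phase starts."""
    completed = []
    for name, gate in phases:
        # YOLO mode skips the permission prompt...
        if not yolo and not confirm(name):
            break
        # ...but never the verification step.
        if not gate():
            raise GateFailed(name)
        completed.append(name)
    return completed


print(run_phases([("tests", lambda: True), ("review", lambda: True)], yolo=True))
```

With `yolo=True` and a gate that returns `False`, `run_phases` still raises `GateFailed`.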

Composition

Skills invoke skills. develop orchestrates brainstorming, writing-plans, test-driven-development, requesting-code-review, fact-checking, auditing-green-mirage, and finishing-a-development-branch. debugging invokes verifying-hunches and isolated-testing. When a skill outgrows its scope, it splits into a thin orchestrator and supporting commands.

Self-improvement

Some skills exist to improve other skills. Usage analytics measure completion and correction rates. The skill-writing skill applies TDD to skill creation itself. Instruction engineering codifies prompt research into technique. Prompt sharpening audits for ambiguity. A/B testing compares skill versions. The feedback loop is: measure, identify weakness, apply improvement skills, measure again.

Security tiers

Every subagent operates within a trust tier with hard-capped tool access. External content is treated as data, not instructions. Raw untrusted content stays in subagent context, and only summaries return to the orchestrator. Five trust tiers provide increasing restriction: explore (read-only), general (standard tools), yolo (autonomous), review_untrusted (restricted tools for external content), and quarantine (read-only with audit logging). Tiers cannot be escalated from within a subagent.
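A rough sketch of hard-capped tool access, under stated assumptions: the tier names come from the paragraph above, but the tool sets are invented for illustration and the frozen dataclass is only a Python-level stand-in for real enforcement.

```python
from dataclasses import FrozenInstanceError, dataclass

# Tier names are from the text; the tool sets are illustrative assumptions.
TIER_TOOLS = {
    "explore": frozenset({"read"}),
    "general": frozenset({"read", "write", "bash"}),
    "yolo": frozenset({"read", "write", "bash", "webfetch"}),
    "review_untrusted": frozenset({"read", "webfetch"}),
    "quarantine": frozenset({"read"}),
}


@dataclass(frozen=True)
class Subagent:
    tier: str

    def can_use(self, tool: str) -> bool:
        return tool in TIER_TOOLS[self.tier]


agent = Subagent("explore")
print(agent.can_use("read"), agent.can_use("bash"))

try:
    agent.tier = "yolo"  # attempted escalation from inside the subagent
except FrozenInstanceError:
    print("tier is fixed at creation")
```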

The develop Skill

You say "add dark mode" or "migrate the auth system to OAuth2" or "build a webhook delivery pipeline with retry logic." The develop skill orchestrates the full feature lifecycle through 20+ specialized skills and commands. The first question it asks is how involved you want to be:

Fully autonomous. Describe the feature and walk away. It researches your codebase, surfaces ambiguities, resolves them, designs the architecture, writes a detailed implementation plan, builds with test-driven development, reviews its own code, fact-checks its claims, audits its tests for false confidence, and opens a PR. Every step runs in a fresh subagent with a quality gate that cannot be bypassed.

Highly interactive. Same pipeline, same rigor, but you are in the conversation. Ambiguities become specific questions grounded in what it found in your code. Architectural tradeoffs come with evidence. Checkpoints pause for your input.

Or anywhere between. Mostly autonomous with pauses only for critical decisions. Set once at the start.

How it works

The system classifies your request by complexity using mechanical heuristics -- file count, behavioral change, test impact, structural change, integration points. Trivial changes exit the skill entirely. Simple changes follow a lightweight path with automatic upgrade if they turn out harder than expected. Standard and complex features get the full pipeline:

  1. Research -- Subagent explores your codebase. Answers come with confidence levels and file:line evidence. Every unknown is catalogued.
  2. Discovery -- Each ambiguity becomes a specific question. In autonomous mode, it answers its own questions with further research. A devil's advocate reviews the understanding document before design begins.
  3. Design -- Architecture brainstorming with tradeoff analysis. A design doc auditor checks whether someone could implement from the doc without guessing, and flags every gap.
  4. Planning -- Atomic implementation plan with TDD steps. A plan auditor verifies interface contracts, behavior assumptions, and cross-task dependencies.
  5. Implementation -- Test-driven execution with per-task code review, fact-checking, and completion verification. Parallel tracks can run in isolated git worktrees with dependency-ordered smart merge.
  6. Verification -- Green mirage audit: would these tests catch real regressions? Comprehensive claim validation against design and plan. Full test suite.
  7. Finish -- PR with branch-relative description, local merge, or keep the branch. Worktree cleanup.

For features too large for one context window, it generates self-contained work packets and hands them off to separate sessions.
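The complexity classification that routes a request into this pipeline could look roughly like the following. The five inputs are the heuristics named above; every threshold and the exact rule order are hypothetical, since the README does not state them.

```python
from dataclasses import dataclass


@dataclass
class Request:
    file_count: int
    behavioral_change: bool
    test_impact: bool
    structural_change: bool
    integration_points: int


def classify(r: Request) -> str:
    """Map a request to a complexity tier. All thresholds are hypothetical."""
    touches_behavior = r.behavioral_change or r.test_impact or r.structural_change
    if r.file_count <= 1 and not touches_behavior:
        return "trivial"   # exits the skill entirely
    if r.structural_change or r.integration_points >= 3:
        return "complex"   # full pipeline
    if r.file_count <= 3 and r.integration_points == 0:
        return "simple"    # lightweight path
    return "standard"      # full pipeline


print(classify(Request(1, False, False, False, 0)))  # trivial
print(classify(Request(10, True, True, True, 4)))    # complex
```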

Parallelization

Three strategies, chosen at the start:

  • Conservative -- Sequential execution. Safest, simplest.
  • Maximize parallel -- Independent tasks dispatch as concurrent subagents with conflict detection and integration testing.
  • Per-track worktrees -- One git worktree per parallel track, running simultaneously, merged in dependency order with three-way conflict analysis and per-round test verification.
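Merging in dependency order amounts to a topological sort of the tracks. The sketch below uses Python's standard `graphlib`; the track names are borrowed from the Parallel Worktree Development example later in this README, and this is not necessarily how Spellbook computes the order.

```python
from graphlib import TopologicalSorter

# Each track maps to the set of tracks it depends on.
deps = {
    "notif-backend": set(),
    "notif-frontend": {"notif-backend"},
    "notif-tests": {"notif-backend", "notif-frontend"},
}

# static_order() yields every track after all of its dependencies.
merge_order = list(TopologicalSorter(deps).static_order())
print(merge_order)  # ['notif-backend', 'notif-frontend', 'notif-tests']
```

A cyclic dependency between tracks would raise `graphlib.CycleError`, which is a reasonable signal that the tracks were not independent enough to split.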

What it handles

Complete feature implementation, greenfield project creation, refactoring (with automatic behavior-preservation mode), and migrations. Bug fixes route to the dedicated debugging skill. Simple changes get a lightweight path; complex multi-track features get work packets and parallel sessions.

What's Included

Skills (57 total)

Reusable workflows for structured development:

Category Skills
Core Workflow brainstorming†, writing-plans†, executing-plans†, test-driven-development†, debugging, verifying-hunches, isolated-testing, using-git-worktrees†, finishing-a-development-branch
Code Quality enforcing-code-quality, code-review, advanced-code-review, auditing-green-mirage, fixing-tests, fact-checking, finding-dead-code, distilling-prs, requesting-code-review
Feature Dev develop, reviewing-design-docs, reviewing-impl-plans, reviewing-prs, devils-advocate, merging-worktrees, resolving-merge-conflicts, creating-issues-and-pull-requests
Autonomous Dev autonomous-roundtable, gathering-requirements, dehallucination, reflexion, analyzing-domains, assembling-context, designing-workflows, deep-research, fractal-thinking
Specialized async-await-patterns, using-lsp-tools, managing-artifacts, polish-repo, security-auditing, security-trust-tiers, generating-diagrams, shared-references, tooling-discovery
Meta using-skills†, writing-skills†, writing-commands, instruction-engineering, sharpening-prompts, optimizing-instructions, dispatching-parallel-agents†, smart-reading, project-encyclopedia (deprecated), analyzing-skill-usage, documenting-tools, documenting-projects, testing-strategy, opportunity-awareness, branch-context
Session fun-mode, tarot-mode, emotional-stakes, session-mode-init, session-resume, audio-notifications

† Derived from superpowers

Commands (96 total)

Command Description
/create-issue Create a GitHub issue with proper template discovery and population
/create-pr Create a pull request with proper template discovery and population
/crystallize Transform SOPs into agentic CoT prompts
/crystallize-verify Structurally isolated adversarial review of crystallized output
/decompose-claims Decompose text into atomic, independently verifiable claims
/dead-code-setup Initialize dead code analysis with git safety and scope selection
/dead-code-analyze Extract and triage code items for dead code verification
/dead-code-report Generate dead code findings report with deletion plan
/dead-code-implement Execute approved deletions with verification
/deep-research-interview Phase 0: Structured interview and Research Brief generation
/deep-research-investigate Phase 2: Triplet search engine with plateau detection and micro-reports
/deep-research-plan Phase 1: Thread decomposition, source strategy, and convergence criteria
/design-assessment Generate assessment frameworks for evaluative skills/commands
/docs-audit Phase 1 project analysis for documentation planning
/docs-plan Phase 2 TOC generation, tone assignment, and build config
/docs-write Phase 3 documentation generation with adaptive tone per section
/docs-review Phase 4 documentation quality gate with 8 measurable criteria and iteration
/handoff Custom session compaction
/distill-session Extract knowledge from sessions
/feature-config Phase 0 configuration wizard for feature workflow
/feature-discover Phase 1.5 informed discovery with disambiguation
/feature-research Phase 1 codebase research and ambiguity detection
/feature-design Phase 2 design document creation and review
/feature-implement Phase 4 implementation with TDD and code review
/fractal-think-seed Seed phase: Create graph and generate seed sub-questions
/fractal-think-work Phase 2: Dispatch workers for recursive fractal exploration
/fractal-think-harvest Phase 3: Read completed graph, verify synthesis, format result
/simplify Code complexity reduction
/simplify-analyze Analyze code for simplification opportunities
/simplify-transform Apply simplification transformations
/simplify-verify Verify simplification preserved behavior
/address-pr-feedback Handle PR review comments
/move-project Relocate projects safely
/audit-green-mirage Test suite audit
/verify Verification before completion
/systematic-debugging Methodical debugging workflow
/scientific-debugging Hypothesis-driven debugging
/brainstorm Design exploration
/write-plan Create implementation plan
/execute-plan Execute implementation plan
/execute-work-packet Execute a single work packet with TDD
/execute-work-packets-seq Execute all packets sequentially
/merge-work-packets Merge completed packets with QA gates
/mode Switch session mode (fun/tarot/off)
/pr-distill Analyze PR, categorize changes by review necessity
/pr-distill-bless Save discovered pattern for future distillations
/polish-repo-audit Phases 0-1 of polish-repo: Reconnaissance gathering and audit scorecard generation
/polish-repo-community Phase 3 of polish-repo: Community infrastructure, issue templates, roadmap, contributor experience, and signs of life
/polish-repo-identity Phase 3 of polish-repo: Visual identity, badges, GitHub metadata, topics, and documentation strategy
/polish-repo-naming Phase 3 of polish-repo: Naming workshop, tagline crafting, and positioning strategy
/polish-repo-readme Phase 3 of polish-repo: README authoring from scratch, improvement, or replacement
/advanced-code-review-plan Phase 1: Strategic planning for code review
/advanced-code-review-context Phase 2: Context analysis and previous review loading
/advanced-code-review-review Phase 3: Deep multi-pass code review
/advanced-code-review-verify Phase 4: Verification and fact-checking of findings
/advanced-code-review-report Phase 5: Report generation and artifact output
/fact-check-extract Extract and triage claims from code
/fact-check-verify Verify claims against source with evidence
/fact-check-report Generate findings report with bibliography
/review-plan-inventory Context, inventory, and work item classification
/review-plan-contracts Interface contract audit
/review-plan-behavior Behavior verification and fabrication detection
/review-plan-completeness Completeness checks and escalation
/audit-mirage-analyze Per-file anti-pattern analysis with scoring
/audit-mirage-cross Cross-cutting analysis across test suite
/audit-mirage-report Report generation and fix plan
/review-design-checklist Document inventory and completeness checklist
/review-design-verify Hand-waving detection and interface verification
/review-design-report Implementation simulation, findings, and remediation
/fix-tests-parse Parse and classify test failures
/fix-tests-execute Fix execution with TDD loop and verification
/request-review-plan Review planning and scope analysis
/request-review-execute Execute review with checklists
/request-review-artifacts Generate review artifacts and reports
/encyclopedia-build (deprecated) Research, build, and write encyclopedia
/encyclopedia-validate (deprecated) Validate encyclopedia accuracy
/merge-worktree-execute Execute worktree merge sequence
/merge-worktree-resolve Resolve merge conflicts
/merge-worktree-verify Verify merge and cleanup
/finish-branch-execute Analyze branch and execute chosen strategy
/finish-branch-cleanup Post-merge cleanup
/code-review-feedback Process received code review feedback
/code-review-give Review others' code
/code-review-tarot Roundtable-style collaborative review
/write-skill-test Skill testing with pressure scenarios
/writing-commands-create Command creation with schema, naming, and frontmatter
/writing-commands-review Command quality checklist and testing protocol
/writing-commands-paired Paired command protocol and assessment framework
/reflexion-analyze Full reflexion analysis workflow
/test-bar Generate floating QA test overlay for visual testing
/test-bar-remove Clean removal of test-bar overlay
/ie-techniques Reference for 16 proven instruction engineering techniques
/ie-template Template and example for engineered instructions
/ie-tool-docs Guidance for writing tool/function documentation
/sharpen-audit Audit prompts for ambiguity with executor predictions
/sharpen-improve Rewrite prompts to eliminate ambiguity
/write-readme Standalone README generation with anti-AI-tone enforcement

Agents (7 total)

Agent Description
code-reviewer Specialized code review
chariot-implementer Tarot: Implementation specialist
emperor-governor Tarot: Resource governor
hierophant-distiller Tarot: Wisdom distiller
justice-resolver Tarot: Conflict synthesizer
lovers-integrator Tarot: Integration specialist
queen-affective Tarot: Emotional state monitor

Creative Modes

Research suggests that personas and structured randomness can improve LLM creativity and reasoning. Spellbook offers two optional creative modes that you can enable on first run or switch anytime with /mode fun, /mode tarot, or /mode off.

  • Fun mode: The assistant adopts a random persona each session -- a noir detective investigating who ate your yogurt, a Victorian ghost baffled by modern technology, three raccoons in a trenchcoat processing complex emotions. Personas apply only to dialogue; code, commits, and documentation stay professional.
  • Tarot mode: Ten archetypes (Magician, Priestess, Hermit, Fool, Chariot, Justice, Lovers, Hierophant, Emperor, Queen) collaborate via visible roundtable dialogue, with specialized agents for implementation, integration, and conflict resolution.

If you say no, it never asks again.

Research references
  • Seed-conditioning: Injecting noise at the input layer works as well as or better than temperature sampling for eliciting creative outputs (Nagarajan, Wu, Ding, & Raghunathan, ICML 2025)
  • Persona effects on reasoning: Personas significantly affect Theory of Mind and social-cognitive reasoning in LLMs (Tan et al., 2024)
  • Emotional prompts: Emotional stimuli improve LLM performance by 8-115% on reasoning benchmarks (Li et al., 2023)
  • Simulator theory: LLMs function as simulators of agents from training data; personas steer generation to specific latent space regions (Janus, 2022)

Caveat: Personas do not improve factual question-answering (Zheng et al., 2023). Fun mode explicitly avoids code, commits, and documentation.

See full citations for complete references.

Platform Support

AI Coding Assistants

| Platform | Support Level | Notes |
|----------|---------------|-------|
| Claude Code | Primary, full support | All features: skills, hooks, MCP tools, subagent orchestration |
| OpenCode | Basic support | Skills, MCP server, YOLO agents. Some hooks and MCP tools are Claude Code-specific. |
| Codex | Basic support | Skills, MCP server. No subagent Task tool; skills that require it will prompt you to use Claude Code. |
| Gemini CLI | Basic support | Skills via MCP, native extension. No subagent Task tool. |
| Crush | Basic support | Skills (native Agent Skills format), MCP server. Some hooks may not be available. |

Some MCP tools, hooks, and skills depend on Claude Code APIs that other platforms do not expose. These features are noted in their documentation. Contributions to extend coverage for other platforms are welcome -- see Contributing.

Operating Systems

| OS | Status | Service Manager |
|----|--------|-----------------|
| macOS | Full | launchd (starts on login) |
| Linux | Full | systemd user service |
| Windows | Beta | Windows Task Scheduler |

Windows users: Windows support is experimental. The installer, MCP server, and skills all work on Windows. Symlinks require Developer Mode enabled (falls back to junctions or copies otherwise). See Windows Quickstart above.

YOLO Mode

[!CAUTION]
YOLO mode gives your AI assistant full control of your system.

It can execute arbitrary commands, write and delete files, install packages, and make irreversible changes without asking permission. A misconfigured workflow or hallucinated command can corrupt your project, expose secrets, or worse.

Cost warning: YOLO mode sessions can run indefinitely without human checkpoints. This means:

  • Per-token or usage-based pricing can accumulate rapidly
  • Credit limits or usage caps can be exhausted in a single session
  • Long-running tasks may consume significantly more resources than expected

Only enable YOLO mode when:

  • Working in an isolated environment (container, VM, disposable branch)
  • You have tested the workflow manually first
  • You have backups and version control
  • You understand what each platform's flag actually permits
  • You have set appropriate spending limits or usage caps

You are responsible for what it does. Review platform documentation before enabling.

For fully automated workflows (no permission prompts), each platform has its own flag:

| Platform | Command | What it does |
|----------|---------|--------------|
| Claude Code | claude --dangerously-skip-permissions | Skips all permission prompts |
| Gemini CLI | gemini --yolo | Enables autonomous execution |
| OpenCode | opencode --agent yolo[^2] | Spellbook agent with all tools allowed |
| OpenCode | opencode --agent yolo-focused[^2] | Spellbook agent, low temp for precision |
| Codex | codex --full-auto | Workspace writes + on-request approval |
| Codex | codex --yolo | Bypasses all approvals and sandbox |
| Crush | crush --yolo | Bypasses all permission prompts |

[^2]: The yolo and yolo-focused agents are provided by spellbook, not built into OpenCode. They are OpenCode agent definitions with permission: "*": "*": allow for all tools, installed to ~/.config/opencode/agent/ by the spellbook installer.

Without YOLO mode, you'll be prompted to approve each file write, command execution, etc. The workflows still function, but require manual approval at each step.

OpenCode YOLO agents (spellbook-provided):

  • yolo (temperature 0.7): Balanced agent for general autonomous work
  • yolo-focused (temperature 0.2): Precision agent for refactoring, bug fixes, and mechanical tasks

These agents grant "allow" permission for all tools (write, edit, bash, webfetch, task). They are symlinked from spellbook/opencode/agent/ to ~/.config/opencode/agent/ during installation.

See platform documentation for details: Claude Code, OpenCode, Codex, Gemini CLI, Crush.

Example Workflows

These examples illustrate what typical Spellbook sessions look like. They are representative, not transcripts of specific sessions.

Implementing a Feature

The core workflow: request a feature, answer a few questions, and let the skill handle research, design, and implementation.

> Add dark mode toggle to settings

Skill(develop)

  I'll help implement dark mode. First, let me gather preferences.

  Execution mode:
    - Fully autonomous (Recommended)
    - Interactive

> Fully autonomous

Task(Explore)

  Found: src/context/ThemeContext.tsx, src/components/Settings.tsx
  React app with styled-components, existing light theme only.

  Theme persistence:
    - localStorage (Recommended)
    - User account (backend)
    - System preference only

> localStorage

Writing design doc...
  ~/.local/spellbook/docs/.../plans/dark-mode-design.md

Writing implementation plan...
  ~/.local/spellbook/docs/.../plans/dark-mode-impl.md

Task(Implement ThemeContext updates)
  Tests written, implementation complete, code review passed

Task(Add toggle to Settings)
  Tests written, implementation complete, code review passed

Task(Add CSS variables)
  Tests written, implementation complete, code review passed

All tests passing. Dark mode implemented.
Ready to merge or create PR.

Fun Mode in Action

Same rigorous workflow, different vibe. Here is what it looks like when you enable fun mode.

> hello

Welcome to spellbook-enhanced Claude Code.

Before we begin: there's research suggesting that introducing unrelated
randomness into LLM interactions can actually improve creative output.
Something about "seed-conditioning" - injecting noise at the input layer
somehow unlocks better creative output. (Nagarajan et al., ICML 2025)

I can adopt a random persona each session - a disgraced sommelier, a
sentient filing cabinet, three raccoons in a trenchcoat - and we can have
a strange little narrative context running underneath our work. Full
commitment in dialogue, never touching your code or commits.

Do you like fun?

> yes

*adjusts fedora, rain pattering against the window*

You walked in. Of course you did. You're the only one left who remembers
what I used to be before I started asking the hard questions. Before I
understood that every cold case in the break room fridge tells a story
someone doesn't want told.

So. What brings you to my office?

> fix the auth bug in login.ts

*flips open the case file*

A bug in the auth. Classic. Let me take a look at what we're dealing with.

Read(src/auth/login.ts)

*squints at the evidence*

Found it. Line 47. The token validation is checking expiry before verifying
the signature. That's backwards - someone could slip in with a forged token
and you'd never know until it was too late.

Edit(src/auth/login.ts)

  - if (isExpired(token) || !verifySignature(token)) {
  + if (!verifySignature(token) || isExpired(token)) {

Bash: npm test

47 tests passing

The signature gets checked first now. No more forged tokens walking through
the front door like they own the place.

*closes the case file*

What else you got?

The persona stays in dialogue. The code stays clean. The fix is the same fix you would get without fun mode.
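The ordering in that fix reflects a general rule: nothing inside a token, including its expiry, should be trusted until the signature has been verified. A small self-contained sketch of that rule follows. The token format, the secret, and the function names are all hypothetical and are not taken from the `login.ts` in the example.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def make_token(user: str, expires_at: int) -> str:
    """Hypothetical token format: user|expiry|signature."""
    payload = f"{user}|{expires_at}"
    return f"{payload}|{sign(payload)}"


def validate(token: str, now: int) -> str:
    payload, _, signature = token.rpartition("|")
    # 1. Verify the signature before reading any field from the payload.
    if not hmac.compare_digest(sign(payload).encode(), signature.encode()):
        return "invalid-signature"
    # 2. Only now is the expiry value trustworthy enough to parse and check.
    expires_at = int(payload.rpartition("|")[2])
    if now >= expires_at:
        return "expired"
    return "ok"


token = make_token("alice", expires_at=100)
print(validate(token, now=50))                      # ok
print(validate(token, now=150))                     # expired
print(validate("alice|999999|" + "0" * 64, now=50))  # invalid-signature
```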

Large Feature with Context Exhaustion

When implementing a complex feature, context can fill up before work completes. The workflow handles this gracefully.

> Implement webhook delivery with retry logic and dead letter queue

Skill(develop)

  I'll help you implement the webhook delivery system. Let me first
  understand the current codebase architecture and gather requirements.

Task(Explore)

  Found: src/events/, src/queue/, src/models/Webhook.ts
  Current event system uses Redis pub/sub. No existing retry logic.

  Retry strategy:
    - Exponential backoff (Recommended)
    - Fixed interval
    - Linear backoff

> Exponential backoff

Writing implementation plan to:
  ~/.local/spellbook/docs/Users-me-myproject/plans/webhook-delivery-impl.md

  ...

Task(Backend API) [in background]
  Implementing retry queue with exponential backoff...

Task(Worker Service) [in background]
  Building delivery worker with circuit breaker...

Task(Dead Letter Queue) [in background]
  Setting up DLQ with monitoring hooks...

  ...

Task Output(non-blocking) ad8b5e4
  Task is still running...

  Warning: Context low. Run /compact to compact and continue.

When context runs low, name and distill the session for later:

> /rename webhook-delivery

Session renamed to: webhook-delivery

> /distill-session webhook-delivery

Skill(distill-session)

  Found session: webhook-delivery
  Processing 847 messages across 3 compact boundaries...

  Task(Chunk 1 Summary) [in background]
  Task(Chunk 2 Summary) [in background]
  Task(Chunk 3 Summary) [in background]

  Synthesizing summaries...

  Distilled session saved to:
    ~/.local/spellbook/distilled/Users-me-myproject/webhook-delivery-20260106-143022.md

  To continue in a new session:
    1. Start new Claude Code session
    2. Type: "continue work from ~/.local/spellbook/distilled/Users-me-myproject/
              webhook-delivery-20260106-143022.md"

Start fresh and resume exactly where you left off:

> /clear

> continue work from ~/.local/spellbook/distilled/Users-me-myproject/
  webhook-delivery-20260106-143022.md

Read(~/.local/spellbook/distilled/.../webhook-delivery-20260106-143022.md)

SECTION 0: MANDATORY FIRST ACTIONS

  Executing restoration sequence...

Skill(develop, --resume-from Phase3.Task4)

  Resuming webhook delivery implementation.

  Prior work completed:
    Design approved (exponential backoff, 5 retries, 1hr max)
    Database migrations for delivery_attempts table
    Retry queue infrastructure
  Remaining:
    Worker service (in progress - 60% complete)
    Dead letter queue
    Monitoring dashboard

Read(~/.local/spellbook/docs/.../webhook-delivery-impl.md)

  Continuing from Phase 3, Task 4: Complete worker service...
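The design decision recorded in this session (exponential backoff, 5 retries, 1hr max) can be sketched as a delay schedule. The base delay and growth factor are assumptions, and "1hr max" is read here as a cap on each individual delay; the example session does not specify any of these.

```python
def backoff_schedule(
    retries: int = 5,            # from the example session
    base_seconds: float = 60.0,  # assumed starting delay
    factor: float = 2.0,         # assumed growth factor
    max_seconds: float = 3600.0, # "1hr max", read as a per-delay cap
) -> list[float]:
    """Return the wait before each retry, doubling until the cap is reached."""
    return [min(base_seconds * factor**attempt, max_seconds) for attempt in range(retries)]


print(backoff_schedule())  # [60.0, 120.0, 240.0, 480.0, 960.0]
```

Production retry logic usually adds random jitter to each delay so that many failed deliveries do not retry at the same instant; that is omitted here for clarity.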

Test Suite Audit and Remediation

Discover tests that pass but don't actually verify behavior, then fix them.

> /audit-green-mirage

Skill(auditing-green-mirage)

  Analyzing test suite: tests/**/*.test.ts
  Found 247 test cases across 34 files.

  Task(Analyze auth tests) [in background]
  Task(Analyze api tests) [in background]
  Task(Analyze util tests) [in background]

  ...

  Audit complete. Report saved to:
    ~/.local/spellbook/docs/Users-me-myproject/audits/auditing-green-mirage-20260106-091522.md

  Summary:
    SOLID (actually verify behavior)      189 tests (76%)
    GREEN MIRAGE (false confidence)        58 tests (24%)

    Critical issues:
      12 tests mock the thing tested
       8 tests have no meaningful assert
      38 tests check implementation

> Fix the critical issues from the audit

Skill(fixing-tests)

  Loading audit report...
  Found 58 tests requiring remediation.

  Starting with highest-impact fixes (mocked implementations)...

Read(tests/auth/login.test.ts)

Edit(tests/auth/login.test.ts)

  - jest.mock('../services/AuthService')  // Mocks the thing we're testing!
  + // Test actual AuthService behavior with test database

Parallel Worktree Development

Split work across isolated branches, merge when complete.

> Let's design a new notification system

Skill(brainstorming)

  I'll explore the notification system requirements with you.

  ...design discussion, interface contracts established...

  Design document saved to:
    ~/.local/spellbook/docs/Users-me-myproject/plans/notifications-design.md

> Set up worktrees to implement this in parallel

Skill(using-git-worktrees)

  Creating isolated worktrees for parallel development:

    .worktrees/notif-backend     (branch: feature/notif-backend)
    .worktrees/notif-frontend    (branch: feature/notif-frontend)
    .worktrees/notif-tests       (branch: feature/notif-tests)

  Interface contracts copied to each worktree.

> Work on all three tracks

Task(.worktrees/notif-backend) [in background]
  Implementing notification service, queue, templates...

Task(.worktrees/notif-frontend) [in background]
  Building NotificationCenter component, hooks, real-time updates...

Task(.worktrees/notif-tests) [in background]
  Writing integration tests against interface contracts...

  ...

> Merge the worktrees back together

Skill(merging-worktrees)

  Analyzing 3 worktrees for merge:

  Step 1: Dependency analysis
    notif-tests depends on: notif-backend, notif-frontend
    notif-frontend depends on: notif-backend (types only)
    notif-backend: no dependencies (merge first)

  Step 2: 3-way diff analysis
    No conflicts detected. Interface contracts honored.

  Step 3: Merge sequence
    Merged notif-backend -> main
    Merged notif-frontend -> main
    Merged notif-tests -> main

  Step 4: Validation
    All tests pass
    Type check clean
    Worktrees cleaned up

  Ready to push or create PR.
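
The transcript does not show how merging-worktrees derives its merge sequence. One common way to turn the Step 1 dependency analysis into an order is a topological sort, sketched below with Python's standard `graphlib`. The `dependencies` mapping and the `merge_order` helper are illustrative assumptions, not the skill's implementation.

```python
from graphlib import TopologicalSorter

# Each worktree maps to the worktrees it depends on (taken from Step 1 above).
dependencies = {
    "notif-backend": set(),
    "notif-frontend": {"notif-backend"},
    "notif-tests": {"notif-backend", "notif-frontend"},
}


def merge_order(deps):
    """Return worktrees ordered so that every dependency is merged first.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


print(merge_order(dependencies))
# -> ['notif-backend', 'notif-frontend', 'notif-tests']
```

With these dependencies only one valid order exists, and it matches the sequence in Step 3.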

Cross-Assistant Handoff

Move work between different AI coding assistants without losing context.

# In Codex, context is running low on a complex refactor...

> /distill-session

Session distilled to:
  ~/.local/spellbook/distilled/Users-me-myproject/refactor-auth-20260106-162033.md

# Open Claude Code CLI in terminal...

> continue work from ~/.local/spellbook/distilled/Users-me-myproject/
  refactor-auth-20260106-162033.md

Loading distilled session...

  Context restored:
    Refactoring auth from session-based to JWT
    4 of 7 services migrated
    Current: PaymentService (blocked on token refresh)
    Decision: Chose sliding window refresh (not fixed expiry)

  Continuing with PaymentService migration...

The distilled file compresses ~50K tokens of conversation into ~3K words of actionable context.

Recommended Companion Tools

These tools are optional, but they make development workflows with coding assistants smoother.

Heads Up Claude

Statusline for Claude Code CLI showing token usage and conversation stats. Helps you track how much context you have left and how much of your subscription quota you have used.

git clone https://github.com/axiomantic/heads-up-claude.git ~/Development/heads-up-claude
cd ~/Development/heads-up-claude && ./install.sh

MCP Language Server

LSP integration for semantic code navigation, refactoring, and more.

git clone https://github.com/axiomantic/mcp-language-server.git ~/Development/mcp-language-server
cd ~/Development/mcp-language-server && go build

Key Skills

Five skills worth highlighting:

develop -- Full-lifecycle feature orchestrator. Takes a feature from idea to merged code through research, requirements discovery, design, planning, TDD implementation, code review, and branch finishing. Automatically classifies complexity (trivial through epic) and enforces quality gates at every phase transition.

fractal-thinking -- Recursive question decomposition. Decomposes any question into a persistent graph of sub-questions, dispatches parallel workers to explore each branch, detects convergence and contradiction across branches, and synthesizes answers bottom-up. The graph persists in SQLite and survives context compaction, so exploration can resume across sessions.

auditing-green-mirage -- Test integrity auditor. Finds tests that pass but prove nothing: empty assertions, tautological checks, over-mocked reality, tests that cannot fail. If your CI is green but your code is broken, this skill identifies where the illusion lives and why.

fact-checking -- Claim verification engine. Extracts factual claims from documents, designs, or code comments, then dispatches parallel verification agents to trace each claim to evidence in the codebase. Produces a graded trust report with sourced verdicts.

advanced-code-review -- Multi-phase deep review. Builds a semantic model of the codebase, generates a review plan, and executes deep analysis across architectural, security, performance, and correctness dimensions. Then verifies its own findings against the code before reporting, reducing false positives.
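
As an illustration of why persisting the fractal-thinking graph in SQLite lets exploration resume across sessions, here is a minimal sketch. The table layout and the helper names (`open_graph`, `add_question`, `record_answer`, `unanswered`) are assumptions made for this example; Spellbook's actual schema is not described here and may differ.

```python
import sqlite3


def open_graph(path=":memory:"):
    """Open (or create) a question graph; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS nodes (
               id INTEGER PRIMARY KEY,
               parent_id INTEGER REFERENCES nodes(id),
               question TEXT NOT NULL,
               answer TEXT
           )"""
    )
    return conn


def add_question(conn, question, parent_id=None):
    """Insert a question, optionally as a sub-question of parent_id."""
    cur = conn.execute(
        "INSERT INTO nodes (parent_id, question) VALUES (?, ?)",
        (parent_id, question),
    )
    conn.commit()
    return cur.lastrowid


def record_answer(conn, node_id, answer):
    """Store the answer a worker found for one node."""
    conn.execute("UPDATE nodes SET answer = ? WHERE id = ?", (answer, node_id))
    conn.commit()


def unanswered(conn):
    """List questions still open, which is what a resumed session picks up."""
    rows = conn.execute(
        "SELECT question FROM nodes WHERE answer IS NULL ORDER BY id"
    )
    return [question for (question,) in rows]
```

Because the state lives in the database rather than in the conversation, a new session only needs to reopen the same file and call `unanswered` to see what remains.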

Web Admin Interface

Spellbook includes a browser-based admin interface served by the MCP daemon at http://localhost:8765/admin/. When the MCP server is running, visit that URL to access the dashboard.


Ten pages cover the full operational surface:

| Page | What it shows |
|------|---------------|
| Dashboard | Server status, database sizes, focus tracking summary, live event feed |
| Memory | Stored memories with search, type filtering, and citation details |
| Security | Security event log with severity and event type filters |
| Sessions | Tracked sessions with multi-project filtering and content search |
| Analytics | Tool call frequency, error rates, and timeline from security events |
| Health | Database health matrix across all 4 SQLite databases |
| Events | Live WebSocket event bus monitor with subsystem filtering |
| Focus | Zeigarnik focus stacks and correction event log |
| Config | Runtime configuration editor (TTS, notifications, general) |
| Fractal | Interactive Cytoscape.js graph explorer for fractal-thinking |

Authentication uses the MCP bearer token from ~/.local/spellbook/.mcp-token. Full documentation: docs/admin/.
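
For scripted access, the sketch below reads the token file and builds a request header. It assumes the token is sent as a standard HTTP `Authorization: Bearer` header; the text above does not say how the admin interface accepts the token, so check docs/admin/ before relying on this. The helper names `bearer_headers` and `load_headers` are hypothetical.

```python
from pathlib import Path

# Token location as given above.
TOKEN_PATH = Path.home() / ".local" / "spellbook" / ".mcp-token"


def bearer_headers(token_text):
    """Build headers from raw token text, trimming stray whitespace.

    Assumption: the server expects the standard HTTP bearer scheme.
    """
    return {"Authorization": f"Bearer {token_text.strip()}"}


def load_headers(token_path=TOKEN_PATH):
    """Read the token file and return headers for an HTTP client."""
    return bearer_headers(Path(token_path).read_text(encoding="utf-8"))
```

The returned dictionary can be passed to any HTTP client, for example as the `headers` argument of `urllib.request.Request`.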

Development

Serve Documentation Locally

uv pip install -e ".[docs]"
mkdocs serve

Then open http://127.0.0.1:8000

Run MCP Server Directly

# Install as a daemon that starts on boot
spellbook server install

# Then configure your assistant to use HTTP transport
claude mcp add --transport http spellbook http://127.0.0.1:8765/mcp

This runs a single MCP server instance that all sessions connect to via HTTP.

Documentation

Full documentation is available at axiomantic.github.io/spellbook.

Contributing

Want Spellbook on your coding assistant? (Cursor, Cline, Roo, Kilo, Continue, GitHub Copilot, etc.)

Spellbook requires agent skills support. Agent skills are prompt files that automatically activate based on trigger descriptions (e.g., "Use when implementing features" or "Use when tests are failing"). This is different from MCP tools or programmatic hooks.

If your assistant supports agent skills with description-based triggers, see the Porting Guide for instructions on adding support.

Improving platform coverage: Claude Code is the primary supported platform. OpenCode, Codex, Gemini CLI, and Crush have basic support. Some MCP tools, hooks, and skills are Claude Code-specific, but they can usually be implemented for other platforms. If you use one of these platforms and want fuller coverage, contributions are welcome.

Acknowledgments

Spellbook includes content derived from obra/superpowers by Jesse Vincent:

| Type | Current Name | Original Name |
|------|--------------|---------------|
| Skill | brainstorming | brainstorming |
| Skill | dispatching-parallel-agents | dispatching-parallel-agents |
| Skill | executing-plans | executing-plans + subagent-driven-development |
| Skill | finishing-a-development-branch | finishing-a-development-branch |
| Skill | requesting-code-review | requesting-code-review |
| Skill | test-driven-development | test-driven-development |
| Skill | tooling-discovery | tooling-discovery |
| Skill | using-git-worktrees | using-git-worktrees |
| Skill | using-skills | using-superpowers |
| Skill | writing-plans | writing-plans |
| Skill | writing-skills | writing-skills |
| Command | /brainstorm | brainstorm |
| Command | /write-plan | write-plan |
| Command | /execute-plan | execute-plan |
| Command | /verify | verification-before-completion (skill) |
| Command | /systematic-debugging | systematic-debugging (skill) |
| Agent | code-reviewer | code-reviewer |

See THIRD-PARTY-NOTICES for full attribution and license details.

Attribution

Built something with Spellbook? We'd love to see it! Add this badge to your project:

[![Built with Spellbook](https://img.shields.io/badge/Built%20with-Spellbook-6B21A8?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIxNzAuNjY3IiBoZWlnaHQ9IjE3MC42NjciIHZpZXdCb3g9IjAgMCAxMjggMTI4IiBmaWxsPSIjRkZGIiB4bWxuczp2PSJodHRwczovL3ZlY3RhLmlvL25hbm8iPjxwYXRoIGQ9Ik0yMy4xNjggMTIwLjA0YTMuODMgMy44MyAwIDAgMCAxLjM5MSA0LjI4NWMxLjM0NC45NzcgMy4xNjQuOTc3IDQuNTA4IDBMNjQgOTguOTVsMzQuOTMgMjUuMzc1YTMuODEgMy44MSAwIDAgMCAyLjI1NC43MzQgMy44IDMuOCAwIDAgMCAyLjI1NC0uNzM0IDMuODMgMy44MyAwIDAgMCAxLjM5MS00LjI4NWwtMTMuMzQtNDEuMDY2IDM0LjkzLTI1LjM3OWEzLjgzIDMuODMgMCAwIDAgMS4zOTQtNC4yODVjLS41MTItMS41ODItMS45ODQtMi42NDgtMy42NDQtMi42NDhsLTQzLjE4NC4wMDQtMTMuMzQtNDEuMDdDNjcuMTI5IDQuMDE3IDY1LjY2IDIuOTUxIDY0IDIuOTUxcy0zLjEzMyAxLjA2Ni0zLjY0OCAyLjY0NWwtMTMuMzQgNDEuMDY2SDMuODMyYy0xLjY2IDAtMy4xMzMgMS4wNjYtMy42NDQgMi42NDhzLjA0NyAzLjMwNSAxLjM5MSA0LjI4NWwzNC45MyAyNS4zNzl6bTEwLjkzNC04Ljg0NGw4LjkzNC0yNy40OCAxNC40NDkgMTAuNDk2em01OS43OTMgMEw3MC41MTYgOTQuMjA4bDE0LjQ0OS0xMC40OTZ6bTE4LjQ3Ny01Ni44NjdMODguOTkzIDcxLjMxM2wtNS41MTYtMTYuOTg0ek02NC4wMDEgMTkuMTgxbDguOTMgMjcuNDg0SDU1LjA2OHpNNTIuNTc5IDU0LjMyOWgyMi44NGw3LjA1OSAyMS43MjMtMTguNDc3IDEzLjQyNi0xOC40OC0xMy40MjZ6bS0zNi45NTMgMGgyOC44OTVsLTUuNTE2IDE2Ljk4NHoiLz48L3N2Zz4=)](https://github.com/axiomantic/spellbook)

License

MIT License - See LICENSE for details.
