flywheel-memory
Health Warn
- License — License: Apache-2.0
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Fail
- execSync — Synchronous shell command execution in demos/bootstrap-template/scripts/proof-of-work.test.ts
Permissions Pass
- Permissions — No dangerous permissions requested
This server acts as a knowledge graph bridge for Obsidian vaults, allowing AI agents to search, read, and write local markdown notes using a sophisticated 13-layer scoring system. It processes and structures personal or organizational data entirely on your local machine.
Security Assessment
Overall Risk: Medium. The core design prioritizes privacy, stating it operates "local-first" with zero cloud integration. All writes are git-committed and reversible, and the optional semantic search relies on a small, locally downloaded AI model rather than sending data to external APIs. However, the automated scan flagged a FAIL for using synchronous shell command execution (`execSync`) within a test file (`proof-of-work.test.ts`). While this appears confined to a testing script rather than the core application, executing shell commands is always a potential vector for injection. No dangerous permissions or hardcoded secrets were detected.
Quality Assessment
The project is very new and currently has low community visibility with only 5 GitHub stars, meaning it has not been broadly battle-tested. Despite this, the code quality appears strong. The repository is highly active (last pushed today), uses continuous integration, and is covered by the permissive Apache-2.0 license. The documentation is exceptionally thorough and transparent regarding its internal architecture and benchmarking.
Verdict
Use with caution — the local-first privacy model and excellent documentation are promising, but the low community adoption and the presence of shell execution in the codebase warrant a manual code review before deploying in sensitive environments.
MCP server giving AI a knowledge graph over Obsidian vaults. 13-layer scoring that learns. Local-first, zero cloud.
Flywheel
Flywheel turns an Obsidian vault into a local MCP workspace for AI agents: fast to query and safe to write.
Part of the Flywheel suite — Flywheel Memory is the MCP server. Flywheel Crank is the Obsidian plugin that visualizes it.
What It Does · See It Work · Skills + Flywheel · Get Started · Benchmarks · Testing · Documentation · License
Flywheel is a local MCP server that gives AI agents structured access to an Obsidian vault. Search returns a decision surface with frontmatter, scored backlinks and outlinks, snippets with section context, dates, entity bridges, and confidence. In many cases that is enough context to answer without opening a chain of files. Writes are git-committed, conflict-detected, and reversible. Auto-wikilinks use a deterministic scoring algorithm, and every suggestion has a traceable receipt.
Everything runs on your machine. Nothing leaves your disk. Every action is bounded, inspectable, and reversible. Semantic search is optional — when enabled via init_semantic, a 23 MB embedding model (all-MiniLM-L6-v2) is downloaded once to ~/.cache/huggingface/ and runs locally. No content is sent to any external service.
What It Does
Search your vault
One call returns everything the model needs to answer: frontmatter, scored backlinks and outlinks, snippets with section context, dates, entity bridges, and confidence. Under the hood, entities give the system stable identity, the graph gives it load-bearing structure, and semantic search surfaces related notes when no explicit link exists yet. Keyword search (BM25) finds what you said. Semantic search finds what you meant. Both are fused via Reciprocal Rank Fusion, running locally. How search works ->
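To make the fusion step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion. It is not Flywheel's implementation: the function name, the note paths, and the constant k=60 (a commonly used default for RRF) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of note paths into one ranking.

    Each note scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a Flywheel setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by path so output is deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical results from a keyword (BM25) pass and a semantic pass.
bm25_hits = ["clients/acme.md", "invoices/inv-001.md", "projects/migration.md"]
semantic_hits = ["projects/migration.md", "clients/acme.md", "people/stacy.md"]

fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

Notes that rank well in both lists rise to the top, and no score normalization between the two retrievers is needed because only ranks are used.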
Write safely
Every mutation is git-committed, conflict-detected with a SHA-256 content hash, and reversible with one undo. Writes preserve markdown structure, so edits do not corrupt tables, callouts, code blocks, frontmatter, links, comments, or math. Auto-wikilinks use a deterministic 13-layer scoring algorithm where every suggestion has a traceable receipt. For one-off edits, use the direct write tools. For repeatable workflows that search the vault and act on the results, use policies, which are saved YAML workflows that branch on vault state and run multiple write steps as a single atomic operation. How scoring works -> | Policies guide ->
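The conflict check can be pictured as optimistic concurrency: hash the note when it is read, and refuse the write if the hash no longer matches. The sketch below assumes that reading of the paragraph above. `guarded_write`, `ConflictError`, and the in-memory dict that stands in for the vault are hypothetical names, not Flywheel's API.

```python
import hashlib


class ConflictError(Exception):
    """Raised when a note changed between read and write."""


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a note's UTF-8 content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guarded_write(vault: dict, path: str, expected_hash: str, new_text: str) -> str:
    """Replace a note only if its content still matches expected_hash."""
    if content_hash(vault[path]) != expected_hash:
        raise ConflictError(f"{path} changed since it was read")
    vault[path] = new_text
    return content_hash(new_text)


# An in-memory dict stands in for the vault on disk (assumption for the sketch).
path = "daily-notes/2026-01-04.md"
vault = {path: "## Log\n"}
seen = content_hash(vault[path])          # hash captured at read time

vault[path] += "- edited in Obsidian\n"   # the note changes underneath the agent

try:
    guarded_write(vault, path, seen, "## Log\n- agent entry\n")
    conflict_detected = False
except ConflictError:
    conflict_detected = True              # the stale write is rejected
```

The rejected write leaves the concurrent edit intact. The caller would re-read the note, take a fresh hash, and retry.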
Build context over time
Every accepted link strengthens the graph. Every rejected link updates the scorer. Every write adds more context for the next read. brief assembles a token-budgeted summary of recent activity, and memory persists observations with confidence decay. The graph can be exported as GraphML for visualization in tools like Gephi or NetworkX — see the carter-strategy demo for an example. Configuration ->
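This README does not specify how confidence decay is computed. As one possible illustration only, the sketch below uses exponential decay with a half-life. The function name, the 30-day half-life, and the formula itself are assumptions, not Flywheel's documented behaviour.

```python
def decayed_confidence(initial: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Halve an observation's confidence every half_life_days (assumed model)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return initial * 0.5 ** (age_days / half_life_days)


# An observation stored with confidence 0.8, checked at different ages.
fresh = decayed_confidence(0.8, age_days=0)
one_half_life = decayed_confidence(0.8, age_days=30)
two_half_lives = decayed_confidence(0.8, age_days=60)
```

Under this model an old observation never disappears outright, but it gradually loses weight against newer ones.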
See It Work
Voice: The learning loop
From the carter-strategy demo: log a call by voice, watch wikilinks and suggestions appear, accept and reject a few, then log again — the suggestions improve immediately.
https://github.com/user-attachments/assets/cb9e4945-7f0b-410d-85ef-0c42ffc18c6e
https://github.com/user-attachments/assets/bfdae034-6217-426e-bb1d-ff8e2f0d4bc3
https://github.com/user-attachments/assets/4a0635ff-dd73-4fb1-933d-bf384822e2ce
Write: Auto-wikilinks on mutation
> Log that Stacy reviewed the security checklist before the Beta Corp kickoff
flywheel -> edit_section action=add
path: "daily-notes/2026-01-04.md"
section: "Log"
suggestOutgoingLinks: true
content: "[[Stacy Thompson|Stacy]] reviewed the [[API Security Checklist|security checklist]]
before the [[Beta Corp Dashboard|Beta Corp]] kickoff
-> [[GlobalBank API Audit]], [[Acme Data Migration]]"
You type a normal sentence. Flywheel resolves known entities, detects prospective entities (proper nouns, acronyms, CamelCase terms), adds wikilinks, and suggests related links based on aliases, co-occurrence, graph structure, and semantic context. Suggested outgoing links are optional and off by default. Enable them where you want the graph to grow naturally, such as daily notes, meeting logs, or voice capture. Configuration guide ->
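As a rough picture of what detecting prospective entities can look like, the sketch below uses three regular expressions for acronyms, CamelCase terms, and multi-word capitalized runs. The patterns and the function name are assumptions for illustration. Flywheel's actual detection is described in its algorithm docs and is likely more involved.

```python
import re

# Two or more consecutive capitals, e.g. "API".
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")
# Two or more capitalized chunks with no space, e.g. "GlobalBank".
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")
# Runs of two or more capitalized words, e.g. "Beta Corp".
# Single capitalized words are skipped here because sentence-initial
# capitalization makes them ambiguous.
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b")


def prospective_entities(text: str) -> dict:
    """Return candidate entity strings grouped by the pattern that found them."""
    return {
        "acronyms": ACRONYM.findall(text),
        "camel_case": CAMEL_CASE.findall(text),
        "proper_nouns": PROPER_NOUN.findall(text),
    }


sentence = "Stacy reviewed the API checklist before the Beta Corp kickoff for GlobalBank"
candidates = prospective_entities(sentence)
```

In this sketch "Stacy" is not flagged by pattern matching alone. A name like that would be resolved through a known entity or alias instead.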
Boundaries
- Writes happen through visible tool calls.
- Changes stay within the vault unless you explicitly point a tool somewhere else.
- Git commits are opt-in.
- Proactive linking can be disabled.
Reproduce it yourself: The carter-strategy demo includes a run-demo-test.sh script that runs the full sequence end to end with claude -p, checking tool usage and vault state between steps.
Policy example: Search the vault, then act on it
> Create a policy that finds overdue invoices and logs follow-up tasks in today's daily note
flywheel -> policy action=author
description: "Find invoices with status:sent, create follow-up task list in daily note"
✓ Saved to .flywheel/policies/overdue-invoice-chaser.yaml
> Preview the overdue-invoice-chaser policy
flywheel -> policy action=preview name=overdue-invoice-chaser
Step 1: vault_search: query "type:invoice status:sent" in invoices/ -> 3 results
Step 2: edit_section: would append to daily-notes/2026-03-31.md#Tasks
(no changes made; preview only)
> Execute it
flywheel -> policy action=execute name=overdue-invoice-chaser
✓ 2 steps executed, 1 note modified, committed as single git commit
Policies search the vault, then write back. Author them in plain language, preview before running, and undo with one call if needed. Policies guide -> | Examples ->
Skills + Flywheel
Skills encode methodology: how to do something. Flywheel encodes knowledge: what you know. They are complementary layers:
| Layer | What it provides | Example |
|---|---|---|
| Skills | Procedures, templates, reasoning frameworks | "How to write a client proposal" |
| Flywheel | Entities, relationships, history, context | "Everything you know about this client" |
An agent calling a proposal-writing skill works better when it can also search your vault for the client's history, past invoices, project notes, and team relationships. Skills tell agents how to work. Flywheel tells them what you know.
OpenClaw skills and Flywheel connect through MCP. OpenClaw routes intent and manages session flow; Flywheel provides the structured context and safe writes that make responses accurate. Integration guide ->
Get Started
Quick start
git clone https://github.com/velvetmonkey/flywheel-memory.git
cd flywheel-memory/demos/carter-strategy && claude
Then ask: "How much have I billed Acme Corp?"
| Demo | You are | Ask this |
|---|---|---|
| carter-strategy | Solo consultant | "How much have I billed Acme Corp?" |
| artemis-rocket | Rocket engineer | "What's blocking propulsion?" |
| nexus-lab | PhD researcher | "How does AlphaFold connect to my experiment?" |
| zettelkasten | Zettelkasten student | "How does spaced repetition connect to active recall?" |
Your Vault in 2 Minutes
Add .mcp.json to your vault root:
{
  "mcpServers": {
    "flywheel": {
      "command": "npx",
      "args": ["-y", "@velvetmonkey/flywheel-memory"]
    }
  }
}
cd /path/to/your/vault && claude
Flywheel watches the vault, maintains local indexes, and serves the graph to MCP clients. Your source of truth stays in markdown. If you delete .flywheel/state.db, Flywheel rebuilds from the vault.
Optional: Tool presets
The default agent preset is the smallest useful surface: search, read, write, tasks, memory, and diagnostics. Use power when you want everyday maintenance tools like wikilinks, corrections, schema work, and note operations. Use full when you want the entire surface visible from the start. auto is kept for backward compatibility and behaves like full plus an informational discover_tools helper.
| Preset | Tools | Categories | Behaviour |
|---|---|---|---|
| agent (default) | 14 | search, read, write, tasks, memory, diagnostics | Focused tier-1 surface: search, read, write, tasks, memory |
| power | 18 | search, read, write, tasks, memory, diagnostics, wikilinks, corrections, note-ops, schema | Tier 1+2: agent plus wikilinks, corrections, note-ops, schema |
| full | 20 | search, read, write, tasks, memory, diagnostics, wikilinks, corrections, note-ops, schema, graph, temporal | All categories visible at startup |
| auto | 21 | search, read, write, graph, schema, wikilinks, corrections, tasks, memory, note-ops, temporal, diagnostics | Full surface + informational discover_tools helper |
Claude Code note: the memory merged tool is suppressed under Claude Code (CLAUDECODE=1) because Claude Code ships its own memory plane. The agent preset exposes 13 tools under Claude Code instead of 14; brief stays available.
Compose bundles for custom configurations:
{
  "mcpServers": {
    "flywheel": {
      "command": "npx",
      "args": ["-y", "@velvetmonkey/flywheel-memory"],
      "env": {
        "FLYWHEEL_TOOLS": "agent,graph"
      }
    }
  }
}
Browse all tools -> | Preset chooser + config ->
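One way to read a value like "agent,graph" is as a comma-separated list in which each token is either a preset name or a single category. The sketch below expands such a value using the category lists from the preset table. That interpretation and the function name are assumptions, so check the configuration docs for the actual rules.

```python
# Category lists copied from the preset table above.
AGENT = ["search", "read", "write", "tasks", "memory", "diagnostics"]
POWER = AGENT + ["wikilinks", "corrections", "note-ops", "schema"]
FULL = POWER + ["graph", "temporal"]
PRESETS = {"agent": AGENT, "power": POWER, "full": FULL}


def resolve_categories(spec: str) -> list:
    """Expand a FLYWHEEL_TOOLS-style string into an ordered, de-duplicated category list.

    Assumption: a token that is not a preset name is treated as a category name.
    """
    resolved = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        for category in PRESETS.get(token, [token]):
            if category not in resolved:
                resolved.append(category)
    return resolved


categories = resolve_categories("agent,graph")
```

Under this reading, "agent,graph" keeps the small default surface and adds only the graph tools on top.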
Multiple vaults
Serve more than one vault from a single Flywheel instance with FLYWHEEL_VAULTS:
{
  "mcpServers": {
    "flywheel": {
      "command": "npx",
      "args": ["-y", "@velvetmonkey/flywheel-memory"],
      "env": {
        "FLYWHEEL_VAULTS": "personal:/home/you/obsidian/Personal,work:/home/you/obsidian/Work"
      }
    }
  }
}
Search automatically spans all vaults and tags each result with its source vault. Each vault keeps separate indexes, graph state, file watchers, and config.
Full multi-vault configuration -> | Client setup examples ->
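The FLYWHEEL_VAULTS value in the example above appears to be a comma-separated list of name:path pairs. The sketch below parses that shape. The format is inferred from the example only, and `parse_vaults` is a hypothetical helper, not part of Flywheel.

```python
def parse_vaults(spec: str) -> dict:
    """Parse 'name:path,name:path' into a {name: path} mapping.

    Splits each entry on the first colon only, so a path that itself
    contains a colon (for example a Windows drive letter) stays intact.
    """
    vaults = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, path = entry.partition(":")
        if not sep or not name or not path:
            raise ValueError(f"expected name:path, got {entry!r}")
        vaults[name] = path
    return vaults


vaults = parse_vaults(
    "personal:/home/you/obsidian/Personal,work:/home/you/obsidian/Work"
)
```

Each name in the mapping would then correspond to the source-vault tag attached to search results.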
Windows users
Three things differ from macOS and Linux:
- Use cmd /c npx instead of npx. On Windows, npx is installed as a .cmd script and cannot be spawned directly.
- Set VAULT_PATH to your vault's Windows path.
- Set FLYWHEEL_WATCH_POLL: "true". Without polling, Flywheel will not reliably pick up changes made from Obsidian on Windows.
See docs/CONFIGURATION.md#windows for the full example.
If you use Cursor, Windsurf, VS Code, OpenClaw, or another client, see docs/SETUP.md for client-specific configuration. For OpenClaw, use the dedicated OpenClaw integration guide.
Benchmarks
Agent-first tools should prove their claims. Flywheel ships with reproducible benchmarks against academic retrieval standards:
- HotpotQA full end to end: 90.0% document recall on 500 questions / 4,960 docs. Latest artifact: April 10, 2026. Cost in that run: $0.083/question.
- LoCoMo full end to end: 81.9% evidence recall and 54.0% answer accuracy on 695 scored questions / 272 sessions. Latest artifact: April 10, 2026. Final token F1: 0.431.
- LoCoMo unit retrieval: 84.8% Recall@5 and 90.4% Recall@10 on the full non-adversarial retrieval set.
Every number below ties back to a checked-in report or reproducible harness in the repo.
Multi-hop retrieval vs. academic baselines (HotpotQA, 500 questions, 4,960 documents):
| System | Recall | Training data |
|---|---|---|
| BM25 baseline | ~75% | None |
| TF-IDF + Entity | ~80% | None |
| Baleen (Stanford) | ~85% | HotpotQA |
| MDR (Facebook) | ~88% | HotpotQA |
| Flywheel | 90.0% | None |
| Beam Retrieval | ~93% | End-to-end |
Conversational memory retrieval (LoCoMo, 1,531 scored retrieval queries, 272 session notes):
| Category | Recall@5 | Recall@10 |
|---|---|---|
| Overall | 84.8% | 90.4% |
| Single-hop | 88.1% | 91.7% |
| Commonsense | 95.4% | 98.3% |
| Multi-hop | 58.1% | 72.7% |
| Temporal | 56.9% | 67.4% |
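For readers unfamiliar with the metric, the sketch below shows one common definition of Recall@k: the fraction of a query's gold evidence documents that appear in the top k results, averaged over queries. The benchmark harness may define it differently (see the methodology doc). The function names and toy data are hypothetical.

```python
def recall_at_k(retrieved: list, relevant: list, k: int) -> float:
    """Fraction of relevant documents found in the first k retrieved results."""
    gold = set(relevant)
    if not gold:
        raise ValueError("each query needs at least one relevant document")
    return len(set(retrieved[:k]) & gold) / len(gold)


def mean_recall_at_k(queries: list, k: int) -> float:
    """Average recall_at_k over (retrieved, relevant) pairs."""
    return sum(recall_at_k(r, g, k) for r, g in queries) / len(queries)


# Toy data: two queries with their ranked results and gold evidence.
queries = [
    (["a", "b", "c", "d"], ["a", "d"]),
    (["x", "y", "z"], ["z"]),
]

recall_2 = mean_recall_at_k(queries, k=2)
recall_3 = mean_recall_at_k(queries, k=3)
```

Raising k can only keep recall the same or increase it, which is why the Recall@10 column is never below Recall@5.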
E2E with Claude Sonnet (latest checked-in 695-question run): 97.4% single-hop evidence recall, 73.7% multi-hop evidence recall, 81.9% overall evidence recall, and 54.0% answer accuracy (Claude Haiku judge). Full methodology and caveats ->
Directional, not apples-to-apples. Test settings, sample sizes, retrieval pools, and metrics differ. Flywheel searches 4,960 pooled docs, which is harder than the standard HotpotQA distractor setting of 10 docs and much smaller than fullwiki. Academic retrievers are trained on the benchmark; Flywheel uses no benchmark training data. Expect about 1 percentage point of run-to-run variance from LLM non-determinism. Full caveats ->
demos/hotpotqa/ · demos/locomo/ · Full methodology ->
Testing
3,292 defined tests across 185 test files and about 64.4k lines of test code. CI runs focused jobs on Ubuntu, plus a full matrix on Ubuntu and Windows across Node 22 and 24.
- Graph quality: Latest generated report shows balanced-mode 50.6% precision / 66.7% recall / 57.6% F1 on the primary synthetic vault, along with multi-generation, archetype, chaos, and regression coverage. Report ->
- Live AI testing: Real claude -p sessions verify tool adoption end to end, not just handler logic.
- Write safety: Git-backed conflict detection, atomic rollback, and 100 parallel writes with zero corruption in the checked-in test suite.
- Security: Coverage includes SQL injection, path traversal, Unicode normalization, and permission bypass cases.
Full methodology and results ->
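As a quick sanity check on the graph-quality figures, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded precision and recall quoted above. The function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Balanced-mode figures quoted in the Graph quality bullet.
f1 = f1_score(0.506, 0.667)
```

The result is about 0.575, which agrees with the reported 57.6% to within rounding of the inputs.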
Documentation
| Doc | Why read it |
|---|---|
| PROVE-IT.md | Start here to see the project working quickly |
| TOOLS.md | Full tool reference |
| COOKBOOK.md | Example prompts by use case |
| SETUP.md | Full setup guide for your vault |
| CONFIGURATION.md | Environment variables, presets, and custom tool sets |
| ALGORITHM.md | Link scoring and search ranking details |
| ARCHITECTURE.md | Indexing, graph, and auto-wikilink design |
| TESTING.md | Benchmarks, methodology, and test coverage |
| TROUBLESHOOTING.md | Diagnostics and recovery |
| SHARING.md | Privacy notes, tracked data, and shareable stats |
| VISION.md | Project direction and longer-term goals |
License
Apache-2.0. See LICENSE for details.