MemoryPilot

Security Audit: Fail
Health: Pass
  • License: NOASSERTION
  • Description: Repository has a description
  • Active repo: Last push 0 days ago
  • Community trust: 12 GitHub stars
Code: Fail
  • child_process: Shell command execution capability in push_transcript.js
  • fs module: File system access in push_transcript.js
Permissions: Pass
  • Permissions: No dangerous permissions requested
Purpose
This MCP server provides AI coding assistants with persistent, searchable memory across sessions. It uses a Temporal Knowledge Graph and hybrid search to organize project knowledge while compressing context to save on API token costs.

Security Assessment
Overall Risk: Medium. The tool claims to be a pure Rust application with zero dependencies and a single binary, but rule-based scans flagged a JavaScript file (`push_transcript.js`) containing file system access and shell command execution capabilities. This discrepancy warrants caution. No hardcoded secrets were detected, and the project does not request inherently dangerous permissions. You should manually verify how this script interacts with the rest of the application and whether it introduces unintended vulnerabilities.

Quality Assessment
The project is very new and actively maintained, with repository activity as recent as today. However, community trust is currently minimal, reflected by only 12 GitHub stars. The licensing is explicitly labeled as "Source Available" in the documentation but registered as NOASSERTION by automated scans, meaning it may not be truly open-source and could contain usage restrictions.

Verdict
Use with caution — verify the purpose and safety of the included JavaScript files before integrating this into your workflow.
SUMMARY

The most advanced AI memory server in the world. Hybrid search, Temporal Knowledge Graph, transformer embeddings, AAAK compression (3x token savings) — pure Rust, single binary, zero dependencies.

README.md


MemoryPilot

The most advanced MCP memory server. Period.

Hybrid search (BM25 + multilingual-e5-small RRF) · 100+ languages · Temporal Knowledge Graph · AAAK compression (3x token savings) · GraphRAG · Chunked RAG · Auto-Linting · Project brain · HTTP API · Single binary · Zero API calls

v4.0 · Rust · Hybrid RRF · multilingual-e5-small · 3x token savings · Source Available


Why

AI coding assistants forget everything between sessions. MemoryPilot gives them persistent, searchable memory with project awareness, semantic understanding, and automatic knowledge organization. Built-in AAAK compression reduces token consumption by 3x when loading context, saving you money on every API call.

Benchmarks

Search Quality — Real-World (500 memories, 30 scenarios)

| Metric | MemoryPilot v4.0 | MemPalace v3.1 (raw) | Quantum Memory Graph |
|---|---|---|---|
| R@5 | 100% | 96.6%¹ | 93.4% |
| R@10 | 100% | N/A | 93.4% |
| NDCG@10 | 95.6% | 88.9%¹ | 90.8% |
| Cluster Coherence | 96.7% | N/A | N/A |
| Multilingual | 100+ languages | English only | English only |
| AAAK Compression | 3x (no recall loss) | 30x (recall drops to 84.2%) | N/A |
| Avg Search Latency | ~69 ms | N/A | ~80 ms |
| Binary Size | 22 MB | ~500 MB (Python+ChromaDB) | 1.5 GB |
| Dependencies | 0 (single binary) | Python + ChromaDB + SQLite | Python + ONNX |

¹ MemPalace's 96.6% R@5 is measured on LongMemEval-s (~50 sessions per haystack, session-level retrieval). Their AAAK compression mode drops recall to 84.2%. Their benchmark tests raw ChromaDB retrieval — none of the Palace architecture (wings, rooms, closets) is exercised in the benchmark (source). MemoryPilot's scores are measured on a real multi-project memory base (500 memories across 6 projects) with all features active (GraphRAG, KG expansion, combinatorial reranker, importance scoring).


vs the best MCP memory servers:

| Feature | MemoryPilot v4.0 | MemPalace v3.1 | Mem0 |
|---|---|---|---|
| Search | Hybrid BM25 + multilingual-e5-small RRF (384-dim) | ChromaDB cosine (all-MiniLM-L6-v2) | Vector search (cloud API) |
| Embeddings | multilingual-e5-small (100+ languages, local ONNX) | all-MiniLM-L6-v2 (English only) | OpenAI API calls (external) |
| Multilingual | 100+ languages native (FR, EN, ES, DE, JA, ZH...) | English only | Depends on API |
| Knowledge Graph | Temporal triples with validity + confidence | Temporal triples (SQLite) | Basic graph (no temporal) |
| GraphRAG | Auto entity extraction + graph traversal + combinatorial reranker | No | No |
| Chunked RAG | Transcript auto-chunking + auto-distillation (8 types) | Conversation chunking by exchange | No |
| Compression | AAAK compact dialect (~3x token savings) | AAAK dialect (experimental, regresses recall to 84.2%) | No |
| Person detection | Auto-detects team members from text | No | No |
| Self-Healing | Background auto-linting loop | No | No |
| Garbage collection | Heuristic merge + scoring + orphan cleanup | No | Basic TTL |
| Project brain | Yes, with team members (<1500 tokens) | No | No |
| File watcher | Context boost from recent edits | No | No |
| Deduplication | Content hash (exact) + Jaccard 85% (fuzzy) | Basic hash | Embedding similarity |
| HTTP API | Multi-threaded REST server (optional) | No | Cloud hosted |
| Memory types | 13 types, importance 1-5 | Wings/Rooms hierarchy | 1 type |
| MCP tools | 29 tools | 19 tools | N/A |
| Privacy | 100% local, zero API calls | 100% local | Cloud dependent |
| Language | Rust (single binary, zero deps) | Python (pip install) | SaaS |
| Startup | 1-2 ms | ~5 ms | N/A (cloud) |
| Binary | 22 MB single binary | Python + ChromaDB (~500 MB installed) | SaaS |
| Storage | SQLite WAL + FTS5 + connection pool | ChromaDB | Cloud DB |
| Concurrency | Lazy embedding thread + read pool + debounced cleanup | Single-threaded | Single-threaded |

The 8 Pillars

1. Hybrid Search (BM25 + fastembed RRF)

Every memory gets a 384-dimension transformer embedding on insert via fastembed, using multilingual-e5-small with local ONNX inference. The model supports 100+ languages, including French, English, Spanish, German, Japanese, and Chinese, and needs no API calls or external services. Search runs both BM25 full-text and cosine similarity in parallel, then merges the results with Reciprocal Rank Fusion.

Results are boosted by importance weighting, knowledge graph link density, and file watcher context, and are penalized for expired knowledge triples.

Performance optimizations:

  • Lazy embedding: add_memory returns instantly, embeddings computed in background thread
  • LRU cache (64 entries): repeated search queries skip embedding computation
  • Read connection pool (4 connections): concurrent vector searches don't block writes
  • Content hashing (FNV-1a): backfill skips unchanged memories
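The fusion step described above can be sketched as follows. This is a minimal illustration of standard Reciprocal Rank Fusion, not MemoryPilot's actual code: the function name `rrf_fuse`, the use of `u64` memory IDs, and the tie-breaking rule are assumptions made for the example. The `k` parameter corresponds to the RRF constant (the README mentions k=40).

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of memory IDs.
/// Each list is ordered best-first. `rrf_fuse` is a hypothetical name.
fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            // Ranks are 1-based in the usual RRF formulation.
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID (an assumption) so the
    // output order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Calling `rrf_fuse(&[bm25_ids, vector_ids], 40.0)` with one BM25 ranking and one cosine-similarity ranking returns the merged list; an ID that appears near the top of both lists outranks one that tops only a single list.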

2. Temporal Knowledge Graph

A full knowledge graph with temporal validity. Facts have valid_from / valid_to dates and confidence scores. When facts become outdated, they are invalidated rather than deleted — giving the AI a timeline of how knowledge evolved.

Entities (technologies, files, components, people) are automatically extracted from memory content and linked bidirectionally. Search results from memories with all-expired triples are penalized.

5 dedicated KG tools: kg_add, kg_invalidate, kg_query, kg_timeline, kg_stats
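To make the validity model concrete, here is a small sketch of a temporal triple. The field names follow the `knowledge_triples` columns listed under Database Schema; the Rust types, the Unix-seconds timestamps, and the helper names (`is_active_at`, `invalidate`, `timeline`) are assumptions for illustration and may not match the real implementation.

```rust
/// A fact with a validity window. Types are assumed, not taken from the source.
#[derive(Debug, Clone, PartialEq)]
struct Triple {
    subject: String,
    predicate: String,
    object: String,
    valid_from: i64,       // Unix seconds (assumed representation)
    valid_to: Option<i64>, // None means the fact is still valid
    confidence: f64,
}

impl Triple {
    /// True if the fact holds at timestamp `ts` (end of window is exclusive).
    fn is_active_at(&self, ts: i64) -> bool {
        self.valid_from <= ts && self.valid_to.map_or(true, |end| ts < end)
    }

    /// Expire the fact instead of deleting it, so history is preserved.
    fn invalidate(&mut self, ts: i64) {
        if self.valid_to.is_none() {
            self.valid_to = Some(ts);
        }
    }
}

/// All triples mentioning `entity`, oldest first.
fn timeline<'a>(triples: &'a [Triple], entity: &str) -> Vec<&'a Triple> {
    let mut hits: Vec<&Triple> = triples
        .iter()
        .filter(|t| t.subject == entity || t.object == entity)
        .collect();
    hits.sort_by_key(|t| t.valid_from);
    hits
}
```

Because invalidation only sets `valid_to`, a query for a past timestamp still sees the old fact, which is what lets a timeline show how knowledge changed.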

3. GraphRAG

Every memory is automatically analyzed for entities: technologies, file paths, components, projects, and people. Entities are stored in a dedicated table. Memories sharing entities are auto-linked with inferred relationship types (resolves, implements, depends_on, deprecates...).

When searching, MemoryPilot traverses the knowledge graph from the top matches to pull in related context, e.g., finding the architecture decision that led to a specific bug fix. A combinatorial reranker then selects the best cluster of connected memories rather than independent top-K results, producing cohesive context (96.7% cluster coherence in the benchmark above). Tuned RRF fusion (k=40), an exact term coverage boost, smart FTS tokenization, query-time KG expansion, temporal recency, and importance tiebreakers bring NDCG@10 to 95.6% with 100% R@5 and R@10.
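One way to picture the cluster-oriented reranking is a greedy selection that favors candidates linked to what has already been picked. The sketch below is an assumption-laden illustration, not the project's reranker: `Candidate`, `greedy_cluster`, and the undirected link set are hypothetical, and the +5% per connection with a 15% cap is taken from the Optimizations list.

```rust
use std::collections::HashSet;

/// A search hit with its relevance score (hypothetical type).
struct Candidate {
    id: u64,
    score: f64,
}

/// Greedily pick up to `top_k` candidates. Each pick maximizes the base score
/// boosted by +5% per link to an already selected memory, capped at +15%.
fn greedy_cluster(cands: &[Candidate], links: &HashSet<(u64, u64)>, top_k: usize) -> Vec<u64> {
    let mut selected: Vec<u64> = Vec::new();
    let mut remaining: Vec<&Candidate> = cands.iter().collect();
    while selected.len() < top_k && !remaining.is_empty() {
        let mut best_idx = 0;
        let mut best_score = f64::NEG_INFINITY;
        for (i, c) in remaining.iter().enumerate() {
            // Links are treated as undirected (an assumption).
            let connections = selected
                .iter()
                .filter(|s| links.contains(&(c.id, **s)) || links.contains(&(**s, c.id)))
                .count();
            let boosted = c.score * (1.0 + (connections as f64 * 0.05).min(0.15));
            if boosted > best_score {
                best_score = boosted;
                best_idx = i;
            }
        }
        selected.push(remaining.remove(best_idx).id);
    }
    selected
}
```

With scores 1.00, 0.90, and 0.88 and a link between the first and third memories, the third overtakes the second for the second slot, since 0.88 × 1.05 exceeds 0.90. That is the sense in which a connected cluster can beat independent top-K results.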

4. Chunked RAG (Transcripts)

Save full conversation transcripts without polluting the LLM context window. The add_transcript tool automatically chunks large texts into ~2000 character blocks and links them together. Chunks are excluded from recall but fully searchable.
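A rough sketch of the chunk-and-link step is shown below. It splits on a fixed character count and links neighbors by index. The README does not say where chunk boundaries fall or how links are stored, so the `Chunk` struct, `chunk_transcript`, and the index-based links are assumptions.

```rust
/// One block of a transcript with links to its neighbors (hypothetical type).
#[derive(Debug)]
struct Chunk {
    index: usize,
    prev: Option<usize>,
    next: Option<usize>,
    text: String,
}

/// Split `text` into blocks of at most `max_chars` characters.
/// Works on `char`s, so multi-byte UTF-8 sequences are never cut in half.
fn chunk_transcript(text: &str, max_chars: usize) -> Vec<Chunk> {
    assert!(max_chars > 0, "max_chars must be positive");
    let chars: Vec<char> = text.chars().collect();
    let total = (chars.len() + max_chars - 1) / max_chars;
    chars
        .chunks(max_chars)
        .enumerate()
        .map(|(i, piece)| Chunk {
            index: i,
            prev: i.checked_sub(1),
            next: if i + 1 < total { Some(i + 1) } else { None },
            text: piece.iter().collect(),
        })
        .collect()
}
```

A 4,500-character transcript with `max_chars = 2000` yields three chunks of 2,000, 2,000, and 500 characters, each pointing at its predecessor and successor.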

Auto-distillation extracts structured memories from transcripts: decision, preference, todo, bug, milestone, problem, and note. Smart disambiguation: a segment mentioning both a bug and its resolution is classified as milestone, not bug.

Supports session_id, thread_id, window_id for multi-window memory scoping.

5. AAAK Compression

Inspired by MemPalace's symbolic memory language. When compact: true is passed to recall or get_project_brain, output is compressed ~3x using a terse, pipe-separated format:

[DEC:5] Use Clerk over Auth0 | tags:auth,stack | proj:MyApp
[PREF:4] Always use TypeScript strict mode | tags:typescript
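The two sample lines above suggest a simple serializer. The following sketch reproduces exactly those lines. Only the DEC and PREF codes appear in the README, so the fallback rule for other kinds, the `Memory` struct, and the function names are hypothetical.

```rust
/// Minimal view of a memory for formatting purposes (hypothetical type).
struct Memory<'a> {
    kind: &'a str,
    importance: u8,
    content: &'a str,
    tags: &'a [&'a str],
    project: Option<&'a str>,
}

/// Short code for a memory kind. Only DEC and PREF are attested in the README.
fn kind_code(kind: &str) -> String {
    match kind {
        "decision" => "DEC".to_string(),
        "preference" => "PREF".to_string(),
        // Assumption: other kinds use an upper-cased prefix of up to 4 letters.
        other => other.chars().take(4).collect::<String>().to_uppercase(),
    }
}

/// Render one memory as a terse, pipe-separated line.
fn to_compact(m: &Memory) -> String {
    let mut line = format!("[{}:{}] {}", kind_code(m.kind), m.importance, m.content);
    if !m.tags.is_empty() {
        line.push_str(&format!(" | tags:{}", m.tags.join(",")));
    }
    if let Some(project) = m.project {
        line.push_str(&format!(" | proj:{}", project));
    }
    line
}
```

Empty fields are omitted rather than printed as blanks, which is where much of the saving over a JSON object with repeated keys would come from.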

6. Self-Healing (Auto-Linter)

MemoryPilot watches your files. When you save a Rust, Svelte, or TypeScript file, it lints in the background. Compilation errors are automatically stored as bug memories with the exact compiler output. When the error is fixed, the memory is auto-deleted.

The linter thread reuses a single DB connection for its entire lifetime.

7. Garbage Collection

Old, low-importance memories are scored for cleanup candidacy. Groups of related stale memories are merged into condensed summaries using heuristic keyword extraction. Orphaned links and entities are cleaned. DB is vacuumed after significant deletions.

8. Project Brain

One tool call returns a dense JSON snapshot of a project under 1500 tokens: tech stack, architecture decisions, active bugs, recent changes, key components, and team members (auto-detected person entities). Supports compact: true for AAAK compression.

Install

One-liner (recommended)

git clone https://github.com/Soflution1/MemoryPilot.git && cd MemoryPilot && ./install.sh

The installer builds MemoryPilot, installs the binary to ~/.local/bin/, detects your IDEs, and configures each one automatically.

Supported IDEs:

| IDE | Config file | Auto-configured |
|---|---|---|
| Cursor | ~/.cursor/mcp.json | ✓ (stdio) |
| VS Code | ~/.vscode/mcp.json | ✓ (stdio) |
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json | ✓ (stdio) |
| Windsurf | ~/.codeium/windsurf/mcp_config.json | ✓ (stdio) |
| Claude Code | claude mcp add | ✓ (CLI) |
| Codex | codex mcp add | ✓ (CLI) |
| ChatGPT Desktop | Settings → Apps → Create | via HTTP (see below) |

The script is idempotent — run it again to update without breaking existing MCP configs.

ChatGPT Desktop

ChatGPT requires a remote MCP endpoint. Start the HTTP server, then add it as a custom connector:

MemoryPilot --http 7437

In ChatGPT: Settings → Apps → Create → URL: http://localhost:7437/mcp

Manual install

git clone https://github.com/Soflution1/MemoryPilot.git
cd MemoryPilot
cargo build --release --features http
cp target/release/MemoryPilot ~/.local/bin/
chmod +x ~/.local/bin/MemoryPilot
xattr -cr ~/.local/bin/MemoryPilot  # macOS only

Then add MemoryPilot to your IDE's MCP config manually (see table above for file paths).

How it works

No further setup is needed after installation. On startup, MemoryPilot automatically injects a dynamic system prompt into your IDE. The AI then proactively calls add_memory in the background to store your architecture decisions, API keys, and bug fixes without manual intervention. All configured IDEs share the same memory database.

For ChatGPT or any MCP client that needs HTTP: run MemoryPilot --http 7437 to expose the Streamable HTTP endpoint at /mcp.

Or use via McpHub for SSE transport with all your other MCP servers.

First run

# If upgrading from v1 (JSON files):
MemoryPilot --migrate

# Compute embeddings for existing memories:
MemoryPilot --backfill

# Force re-embed all (skips unchanged via content hash):
MemoryPilot --backfill-force

MCP Tools (29)

Core

| Tool | Description |
|---|---|
| recall | Start here. Loads all context in one shot: project memories, scoped thread/window memories, preferences, critical facts, patterns, decisions, global prompt. Supports mode = safe/default/full, compact = true for AAAK compression. |
| get_project_brain | Instant project summary (<1500 tokens): tech stack, architecture, bugs, recent changes, components, team members. Supports compact = true. |
| search_memory | Hybrid BM25 + fastembed RRF search, boosted by importance, graph links, and file watcher context. Batched triple scoring. |
| get_file_context | Memories related to recently modified files in working directory. |

Memory CRUD

| Tool | Description |
|---|---|
| add_memory | Store with lazy embedding, auto-dedup (hash exact + Jaccard 85%), auto entity extraction, auto graph linking. Importance 1-5, TTL. |
| add_memories | Bulk add multiple memories in one call with per-item dedup. |
| add_transcript | Store a long transcript as chunked archive, auto-distill structured memories (decision, preference, todo, bug, milestone, problem, note). |
| get_memory | Retrieve by ID. |
| update_memory | Update content, kind, tags, importance, TTL. Skips re-embedding if content unchanged (hash check). |
| delete_memory | Delete by ID (cascades to entities and links). |
| list_memories | List with project/kind filters and pagination. |
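The two-stage dedup check mentioned for add_memory (exact content hash, then a fuzzy Jaccard threshold of 85%) might look roughly like this. The 64-bit FNV-1a variant, whitespace tokenization, and lower-casing are assumptions, since the README names only "FNV-1a" and "Jaccard 85%"; the function names are hypothetical.

```rust
use std::collections::HashSet;

/// 64-bit FNV-1a hash (standard offset basis and prime).
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Jaccard similarity over lower-cased, whitespace-separated word sets.
fn jaccard(a: &str, b: &str) -> f64 {
    let sa: HashSet<String> = a.split_whitespace().map(|w| w.to_lowercase()).collect();
    let sb: HashSet<String> = b.split_whitespace().map(|w| w.to_lowercase()).collect();
    if sa.is_empty() && sb.is_empty() {
        return 1.0; // two empty texts are treated as identical
    }
    let inter = sa.intersection(&sb).count() as f64;
    let union = sa.union(&sb).count() as f64;
    inter / union
}

/// Exact match by hash first (cheap), then fuzzy match by word overlap.
fn is_duplicate(new: &str, existing: &str) -> bool {
    fnv1a_64(new.as_bytes()) == fnv1a_64(existing.as_bytes())
        || jaccard(new, existing) >= 0.85
}
```

The hash comparison catches verbatim repeats without tokenizing anything, while the Jaccard pass catches near-repeats such as the same sentence with one extra word.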

Knowledge Graph

| Tool | Description |
|---|---|
| kg_add | Add a fact triple (subject → predicate → object) with optional validity period and confidence score. |
| kg_invalidate | Mark a triple as expired (sets valid_to), preserving history. |
| kg_query | Query all triples related to an entity, with temporal filtering and direction control. |
| kg_timeline | Chronological history of all triples involving an entity. |
| kg_stats | Summary statistics: total triples, active, expired, unique subjects/objects. |

Project & Config

| Tool | Description |
|---|---|
| get_project_context | Full project context with preferences and patterns. |
| register_project | Register project with filesystem path for auto-detection. |
| list_projects | List projects with memory counts. |
| get_stats | DB statistics: totals, by kind, by project, DB size, hygiene signals. |
| get_global_prompt | Auto-discover GLOBAL_PROMPT.md from ~/.MemoryPilot/ or project root. |
| export_memories | Export as JSON or Markdown with importance stars. |
| set_config | Set config values (e.g. global_prompt_path). |

Maintenance

| Tool | Description |
|---|---|
| run_gc | Garbage collection: merge old memories, clean orphans, vacuum. Supports dry_run. |
| cleanup_expired | Remove expired TTL memories (debounced, runs max once per 60s). |
| benchmark_recall | Recall quality benchmark with golden scenarios. |
| benchmark_search | Search quality benchmark: R@5, R@10, NDCG@10, cluster coherence, latency. |
| migrate_v1 | Import from v1 JSON files. |
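The "max once per 60s" behavior of cleanup_expired is a debounce. A minimal sketch of that pattern follows, assuming a hypothetical `Debouncer` type that takes the current time as a parameter so it can be exercised without sleeping. How MemoryPilot actually tracks the last run is not documented.

```rust
use std::time::{Duration, Instant};

/// Allows an action at most once per `min_interval` (hypothetical helper).
struct Debouncer {
    min_interval: Duration,
    last_run: Option<Instant>,
}

impl Debouncer {
    fn new(min_interval: Duration) -> Self {
        Debouncer { min_interval, last_run: None }
    }

    /// Returns true if the caller should run the action now.
    /// When it returns true, `now` is recorded as the last run time.
    fn should_run(&mut self, now: Instant) -> bool {
        match self.last_run {
            Some(prev) if now.saturating_duration_since(prev) < self.min_interval => false,
            _ => {
                self.last_run = Some(now);
                true
            }
        }
    }
}
```

A caller would wrap the cleanup in `if debouncer.should_run(Instant::now()) { ... }`, so frequent tool calls trigger at most one cleanup pass per interval.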

Memory Types

fact · preference · decision · pattern · snippet · bug · credential · todo · note · milestone · architecture · problem · transcript_chunk

Each memory has importance (1-5), optional TTL, tags, project scope, content hash, and auto-generated embedding + entity links.

CLI

MemoryPilot                          # Start MCP stdio server
MemoryPilot --backfill               # Compute missing embeddings
MemoryPilot --backfill-force         # Re-embed all (skips unchanged via hash)
MemoryPilot --benchmark-recall       # Run recall quality benchmark
MemoryPilot --benchmark-search       # Search quality: R@5, R@10, NDCG@10, cluster coherence
MemoryPilot --http 7437              # Start HTTP REST server (requires --features http)
MemoryPilot --migrate                # Import v1 JSON data
MemoryPilot --version                # Show version
MemoryPilot --help                   # Show help

HTTP API

When built with --features http, MemoryPilot exposes a multi-threaded REST API (4 worker threads, each with its own DB connection):

# Health check
curl http://localhost:7437/health

# Call any MCP tool
curl -X POST http://localhost:7437/tools/call \
  -H 'Content-Type: application/json' \
  -d '{"name": "search_memory", "arguments": {"query": "auth setup", "limit": 5}}'

Architecture

src/main.rs        — CLI + MCP stdio server + file watcher init + HTTP server init
src/db.rs          — SQLite engine: hybrid search, CRUD, KG, GC, brain, recall, lazy embed, connection pool
src/tools.rs       — 29 MCP tool definitions + handlers
src/protocol.rs    — JSON-RPC types
src/embedding.rs   — fastembed (multilingual-e5-small) transformer embeddings, LRU cache
src/graph.rs       — Entity extraction (tech, files, components, people) + relation inference + graph traversal
src/gc.rs          — GC scoring, heuristic memory merging, stopwords
src/watcher.rs     — File system watcher + auto-linter with persistent DB connection
src/http.rs        — Optional multi-threaded HTTP REST server (feature-gated)

Database Schema

memories           — id, content, kind, project, tags, importance, embedding (BLOB),
                     content_hash, expires_at, last_accessed_at, access_count, metadata
memories_fts       — FTS5 virtual table (content, tags, kind, project)
memory_entities    — memory_id, entity_kind, entity_value, valid_from, valid_to
memory_links       — source_id, target_id, relation_type, valid_from, valid_to, confidence
knowledge_triples  — id, subject, predicate, object, valid_from, valid_to, confidence, source_memory_id
projects           — name, path, description
config             — key/value store

Performance

| Metric | Value |
|---|---|
| Binary size | 22 MB |
| Startup | 1-2 ms |
| Search (hybrid RRF + reranker) | ~10 ms (500 memories) |
| add_memory latency | <1 ms (lazy embed) |
| Embedding quality | Transformer 384-dim (multilingual-e5-small, 100+ languages) |
| Backfill (1000 memories) | ~30s (skips unchanged via hash) |
| RAM | ~15 MB |
| Read concurrency | 4 pooled connections |
| Runtime dependencies | None (ONNX bundled) |

Optimizations

  • Lazy embedding: add_memory inserts with NULL embedding, background thread computes and updates asynchronously
  • Content hashing (FNV-1a): --backfill-force skips memories whose content hasn't changed
  • LRU embedding cache (64 entries): repeated search queries reuse cached embeddings
  • Read connection pool (4 connections): concurrent vector searches don't block writes
  • WAL mode: SQLite Write-Ahead Logging for concurrent read/write
  • Batched scoring: knowledge triple counts and link boosts fetched in single queries, not N+1
  • Debounced cleanup: expired memory cleanup runs max once per 60 seconds
  • Prepared statements: graph traversal prepares SQL once, not per node
  • Tuned RRF fusion: k=40 for sharper top-K discrimination vs standard k=60
  • Exact term coverage boost: +10% when 80%+ of query terms appear in memory content
  • Combinatorial reranker: greedy subgraph selection, conservative +5% per connection (cap 15%)
  • KG query expansion: post-retrieval scoring boost from knowledge graph related terms (+4% per entity, cap 15%)
  • Temporal recency: gentle +5% for memories from last 3 days, decaying over 30 days
  • Importance tiebreaker: ±3% per level — never overrides relevance signal
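The percentage boosts in the list above can be sketched as small scoring helpers. The README gives the magnitudes but not how they are combined or shaped, so several things here are assumptions: boosts are summed and applied multiplicatively to the base score, recency decays linearly from day 3 to day 30, importance is centered on level 3, and term matching is a case-insensitive substring check. All function names are hypothetical.

```rust
/// +10% when at least 80% of the query terms occur in the content.
fn coverage_boost(query: &str, content: &str) -> f64 {
    let content_lc = content.to_lowercase();
    let terms: Vec<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();
    if terms.is_empty() {
        return 0.0;
    }
    let hits = terms.iter().filter(|t| content_lc.contains(t.as_str())).count();
    if hits as f64 / terms.len() as f64 >= 0.8 { 0.10 } else { 0.0 }
}

/// Per-item boost with a cap, e.g. +4% per KG entity capped at 15%.
fn capped_boost(count: usize, per_item: f64, cap: f64) -> f64 {
    (count as f64 * per_item).min(cap)
}

/// +5% for the first 3 days, then linear decay to zero at 30 days (assumed shape).
fn recency_boost(age_days: f64) -> f64 {
    if age_days <= 3.0 {
        0.05
    } else if age_days >= 30.0 {
        0.0
    } else {
        0.05 * (30.0 - age_days) / 27.0
    }
}

/// 3% per importance level, relative to level 3 (assumed midpoint of 1-5).
fn importance_boost(importance: u8) -> f64 {
    0.03 * (f64::from(importance) - 3.0)
}

/// Apply the summed boosts to a base relevance score (assumed combination rule).
fn adjusted_score(base: f64, boosts: &[f64]) -> f64 {
    base * (1.0 + boosts.iter().sum::<f64>())
}
```

Under these assumptions every signal stays small relative to the base score, so relevance from the fused ranking still dominates, which matches the stated intent that importance "never overrides relevance signal".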

Run Benchmarks Yourself

MemoryPilot --benchmark-search --scenario-limit 30    # R@5, R@10, NDCG@10, cluster coherence, latency
MemoryPilot --benchmark-recall --scenario-limit 12    # top1/top5 hit rate, cross-project leak, credential safety

Storage

  • Database: ~/.MemoryPilot/memory.db
  • Global prompt: ~/.MemoryPilot/GLOBAL_PROMPT.md
  • Fastembed model cache: ~/.fastembed_cache/ (downloaded on first run)

License

Soflution Source Available License — free to use, not to fork or modify. See LICENSE for details.

Built by SOFLUTION LTD
