fast-mempalace
Health: Warning
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 6 GitHub stars
Code: Failed
- rm -rf — Recursive force deletion command in install.sh
Permissions: Passed
- Permissions — No dangerous permissions requested
The fastest benchmarked open-source AI memory system. 100% local, for agents.
Local-first long-term memory for AI coding agents.
Your agent remembers your codebase and decisions across sessions — in a single static binary. No cloud, no Python, nothing leaves your machine.
Most AI coding sessions start amnesiac: you re-explain the architecture, re-state why
you chose SQLite over Postgres, and the agent still contradicts last week's decision.
fast-mempalace gives the agent a persistent, on-device memory it can search and
write to — wired into Claude Code via MCP tools and session hooks.
- 🧠 Remembers across sessions: semantic recall over everything you've mined or saved.
- 🔒 Fully local & private: embeddings, storage, and search run on-device (llama.cpp + sqlite-vec). No API keys, no network at query time.
- 📦 One static binary: no Python, no Docker, no vector database to run. ~6 MB plus a 45 MB embedding model.
- ⚡ Invisible in the loop: session wake-up in ~10 ms; vector search is sub-millisecond once the model is resident.
⚡ Install
curl -fsSL https://raw.githubusercontent.com/debpalash/fast-mempalace/main/install.sh | bash
The installer detects your OS (darwin or linux) and CPU architecture (x86_64 or aarch64), then installs the binary and embedding model into ~/.fast-mempalace/.
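install.sh itself is not reproduced here. As a rough illustration of what the installer's platform detection involves, the sketch below maps uname-style values onto the four supported targets. It is written in Python purely for readability (fast-mempalace itself needs no Python); the `detect_target` function and its alias tables are hypothetical, not the installer's actual logic.

```python
import platform
from typing import Optional, Tuple

# Hypothetical alias tables: uname output differs across systems, e.g. macOS on
# Apple Silicon reports "arm64" where Linux reports "aarch64".
OS_ALIASES = {"Darwin": "darwin", "Linux": "linux"}
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "arm64": "aarch64",
    "aarch64": "aarch64",
}


def detect_target(system: Optional[str] = None,
                  machine: Optional[str] = None) -> Tuple[str, str]:
    """Return (os, arch) for one of the four supported targets, or raise."""
    system = system or platform.system()     # e.g. "Darwin", "Linux"
    machine = machine or platform.machine()  # e.g. "arm64", "x86_64"
    os_name = OS_ALIASES.get(system)
    arch = ARCH_ALIASES.get(machine.lower())
    if os_name is None or arch is None:
        raise RuntimeError(f"unsupported platform: {system}/{machine}")
    return os_name, arch
```

Calling `detect_target()` with no arguments inspects the current machine; passing values explicitly, as in `detect_target("Darwin", "arm64")`, is handy for checking the mapping.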
🤖 Use with Claude Code (the main event)
Install the plugin — it wires up the MCP memory tools, the session hooks, a skill, and
slash commands:
/plugin marketplace add debpalash/fast-mempalace
/plugin install fast-mempalace
You now have:
| Surface | What it does |
|---|---|
| memory_search (MCP) | Semantic recall before answering about past work |
| memory_store (MCP) | Persist a decision, constraint, or snippet, verbatim |
| memory_wake_up (MCP) | Load the compact continuity brief |
| SessionStart hook | Auto-injects recent memory at the start of every session |
| PreCompact hook | Saves the conversation tail before it's compacted away |
| /remember, /recall | Slash commands for explicit save/recall |
Optionally seed memory from a codebase:
~/.fast-mempalace/bin/fast-mempalace mine . my-project
It also works with any other MCP client (Cursor, Zed, Windsurf, …): point the client at
fast-mempalace mcp, which serves MCP over stdio.
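To show what an MCP client does with that command, here is a minimal Python sketch of the stdio transport: each JSON-RPC 2.0 message is one newline-terminated line of JSON, and a session opens with `initialize` and `notifications/initialized` before any `tools/call`. The tool name memory_search comes from the table above; the `"query"` argument name and the client name are assumptions, so check the server's `tools/list` response for the real input schema.

```python
import json
import subprocess


def frame(msg: dict) -> bytes:
    """Encode one JSON-RPC message for MCP's stdio transport: a single
    newline-terminated line (json.dumps escapes newlines inside strings)."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


def initialize(req_id: int) -> dict:
    return {
        "jsonrpc": "2.0", "id": req_id, "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "sketch-client", "version": "0.1"},  # hypothetical
        },
    }


INITIALIZED = {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tool_call(req_id: int, name: str, arguments: dict) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}


def read_response(lines, req_id: int) -> dict:
    """Skip notifications and server requests until our response arrives."""
    for line in lines:
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("id") == req_id and "method" not in msg:
            return msg
    raise EOFError("server closed stdout before responding")


def memory_search(query: str, binary: str = "fast-mempalace") -> dict:
    proc = subprocess.Popen([binary, "mcp"],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        proc.stdin.write(frame(initialize(1)))
        proc.stdin.flush()
        read_response(proc.stdout, 1)  # capabilities handshake
        proc.stdin.write(frame(INITIALIZED))
        # "query" is an assumed argument name; see the tool's inputSchema.
        proc.stdin.write(frame(tool_call(2, "memory_search", {"query": query})))
        proc.stdin.flush()
        return read_response(proc.stdout, 2)
    finally:
        proc.stdin.close()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
```

Real MCP clients do exactly this framing for you; the sketch is mainly useful for poking at the server from a terminal.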
🧩 How it works
Content is organized as Wings (a project/domain) → Rooms (a topic) →
Drawers (a verbatim chunk + its embedding). Retrieval is vector similarity
(sqlite-vec, L2 over L2-normalized MiniLM-L6-v2 embeddings) with light recency and
keyword re-ranking. Everything is one SQLite file.
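Because every embedding is L2-normalized, L2 distance and cosine similarity produce the same ordering: for unit vectors, ||a - b||^2 = 2 - 2 cos(a, b). The Python sketch below (illustration only) shows that ordering plus a light recency and keyword bonus. In the real tool sqlite-vec computes the distance inside SQLite; the weights, the 30-day half-life, the substring keyword match, and the drawer dict layout here are all assumptions, not fast-mempalace's actual scoring.

```python
import math
import time

DIM = 384  # the vector table is declared float[384]

# Hypothetical re-ranking knobs; the README only says the re-ranking is "light".
RECENCY_WEIGHT = 0.05
KEYWORD_WEIGHT = 0.05
HALF_LIFE_DAYS = 30.0


def l2_normalize(v):
    if len(v) != DIM:
        raise ValueError(f"expected a {DIM}-dim embedding, got {len(v)}")
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def score(query_vec, query_text, drawer, now):
    # For unit vectors d^2 = 2 - 2cos, so cos = 1 - d^2 / 2.
    cos = 1.0 - l2_distance(query_vec, drawer["vec"]) ** 2 / 2.0
    # Recency bonus halves every HALF_LIFE_DAYS.
    age_days = max(0.0, (now - drawer["ts"]) / 86400.0)
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)
    # Keyword bonus: fraction of query terms found (crude substring match).
    terms = set(query_text.lower().split())
    text = drawer["text"].lower()
    keyword = sum(t in text for t in terms) / len(terms) if terms else 0.0
    return cos + RECENCY_WEIGHT * recency + KEYWORD_WEIGHT * keyword


def rank(query_vec, query_text, drawers, now=None):
    """Drawers are dicts with an already-normalized "vec", a Unix "ts", and "text"."""
    now = time.time() if now is None else now
    q = l2_normalize(query_vec)
    return sorted(drawers, key=lambda d: score(q, query_text, d, now), reverse=True)
```

Keeping the bonus weights small means semantic similarity still dominates; recency and keywords mostly break near-ties.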
- Bare-metal embeddings: llama.cpp statically linked, Metal/CUDA accelerated.
- Verbatim storage: your memories are never silently rewritten or summarized by an LLM (and there's no graph-query injection surface to poison).
- Concurrent mining: files embed in parallel via std.Io.Group.
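std.Io.Group is Zig's mechanism for spawning a group of tasks and waiting for all of them. To illustrate the same fan-out/fan-in shape in a familiar language, this Python sketch swaps in `concurrent.futures.ThreadPoolExecutor`. `fake_embed` is a hypothetical stand-in for chunking and embedding one file, and Python threads only overlap work that releases the GIL (I/O, sleeps, native calls).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for 'chunk and embed one file'; the sleep mimics
    model latency and releases the GIL, so threads can overlap."""
    time.sleep(0.01)
    return [float(len(text))]


def mine_concurrently(files: Dict[str, str],
                      embed: Callable[[str], List[float]] = fake_embed,
                      workers: int = 4) -> Dict[str, List[float]]:
    """Fan out one embedding task per file, then wait for all of them.
    pool.map yields results in input order, so zip pairs each with its file."""
    names = list(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vectors = list(pool.map(embed, (files[n] for n in names)))
    return dict(zip(names, vectors))
```

Collecting results in input order keeps the write side simple: drawers can be inserted sequentially after the parallel embedding step.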
📊 Benchmarks
Apple Silicon · Metal · cold process unless noted. Methodology → BENCHMARK.md.
| Operation | Time | Notes |
|---|---|---|
| Cold start (stats) | 0.01 s | no model load |
| Session wake-up | 0.01 s | recency SQL, no model load — runs every session |
| Mine (15 files → 31 drawers) | ~1.0 s | real on-device MiniLM embeddings |
| Vector search | sub-ms | once the model is resident (e.g. in the long-running MCP server); a one-shot CLI call also pays ~0.5 s to load the model |
| Peak RAM (search, model loaded) | ~100 MB | mostly the embedding model |
Honest note: the earlier 0.59 s / 1171-drawer numbers were measured with placeholder vectors.
The figures above come from the real semantic engine. Mining cost is dominated by
embedding throughput, not I/O.
📦 CLI
fast-mempalace init Initialize the palace database
fast-mempalace mine <path> [wing] Mine a file or directory into the palace
fast-mempalace search <query> Semantic search
fast-mempalace wake-up [--wing X] Print the wake-up context
fast-mempalace stats Palace statistics
fast-mempalace mcp Start the MCP server (stdio JSON-RPC)
fast-mempalace hook Run a Claude Code hook (JSON stdin/stdout)
fast-mempalace kg [subject] Query the knowledge graph
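Claude Code hooks receive a JSON object on stdin, so the hook subcommand can also be exercised by hand, for example to see what the SessionStart hook would inject. The Python sketch below builds a SessionStart-style payload and pipes it through. The field names follow Claude Code's hook input format, but which fields fast-mempalace actually reads, and what it prints back, are assumptions; the helper names are hypothetical.

```python
import json
import subprocess


def session_start_payload(session_id: str, transcript_path: str, cwd: str) -> str:
    """Build a SessionStart hook input shaped like what Claude Code sends on stdin."""
    return json.dumps({
        "session_id": session_id,
        "transcript_path": transcript_path,
        "cwd": cwd,
        "hook_event_name": "SessionStart",
        "source": "startup",  # a fresh session
    })


def run_hook(payload: str, binary: str = "fast-mempalace") -> str:
    """Pipe one hook event into `fast-mempalace hook` and return its stdout."""
    result = subprocess.run([binary, "hook"], input=payload, capture_output=True,
                            text=True, check=True, timeout=30)
    return result.stdout
```

For example, `print(run_hook(session_start_payload("demo", "/tmp/demo.jsonl", ".")))` shows the hook's output without starting a Claude Code session.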
⚙️ Configuration
Reads fast-mempalace.yaml (falls back to mempalace.yaml). Environment variables
override everything — this is how the plugin pins one global palace regardless of the
project directory:
FAST_MEMPALACE_DB=~/.fast-mempalace/palace.db # database path
FAST_MEMPALACE_MODEL=~/.fast-mempalace/lib/minilm.gguf # 384-dim GGUF embedder
FAST_MEMPALACE_WING=my-project # default wing
Example fast-mempalace.yaml:
database_path: "fast-mempalace.db"
model_path: "lib/minilm.gguf"
default_wing: "production"
The embedding model must produce 384-dim vectors (MiniLM-L6-v2): the vector table is declared float[384], and the binary validates the model on load.
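The lookup order described in this section (environment variable over fast-mempalace.yaml, which in turn takes priority over mempalace.yaml) can be sketched as follows. This is Python for illustration only: `parse_flat_yaml` handles just flat `key: "value"` lines like the example above and stands in for a real YAML parser, and all function names are hypothetical, not the binary's internals.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Environment variable -> config key, as listed above.
ENV_KEYS = {
    "FAST_MEMPALACE_DB": "database_path",
    "FAST_MEMPALACE_MODEL": "model_path",
    "FAST_MEMPALACE_WING": "default_wing",
}
CONFIG_FILES = ("fast-mempalace.yaml", "mempalace.yaml")  # first match wins


def find_config(directory: Path) -> Optional[Path]:
    for name in CONFIG_FILES:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def parse_flat_yaml(text: str) -> Dict[str, str]:
    """Parse flat `key: "value"` lines only; comments and blank lines are skipped."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        cfg[key.strip()] = value.strip().strip('"')
    return cfg


def resolve(directory: Path, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Load file values first, then let environment variables override them."""
    path = find_config(directory)
    cfg = parse_flat_yaml(path.read_text()) if path else {}
    for var, key in ENV_KEYS.items():
        if var in env:
            cfg[key] = env[var]
    return cfg
```

Letting the environment win is what allows the plugin to pin a single global palace no matter which project directory a session starts in.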
🔧 Build from source
Needs zig 0.16.0 + cmake.
git clone --recursive https://github.com/debpalash/fast-mempalace
cd fast-mempalace
# 1) Build the statically-linked llama.cpp backend (once)
cmake -S lib/llama.cpp -B lib/llama.cpp/build \
-DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF \
-DGGML_METAL=ON -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_SERVER=OFF -DLLAMA_BUILD_TOOLS=OFF
cmake --build lib/llama.cpp/build -j
# 2) Fetch the 384-dim embedding model
mkdir -p lib && curl -fL -o lib/minilm.gguf \
"https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/resolve/main/all-MiniLM-L6-v2.F16.gguf"
# 3) Build
zig build --release=fast
./zig-out/bin/fast-mempalace stats
(On Linux, pass -DGGML_METAL=OFF -DGGML_BLAS=OFF in step 1 instead of -DGGML_METAL=ON.)
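If the model download is truncated or saves an error page, the build still succeeds and the problem only shows up when the binary tries to load the model. A quick pre-flight check is to read the fixed GGUF header: the 4-byte magic `GGUF`, then a uint32 format version and two uint64 counts (tensors and metadata key/value pairs). The Python sketch below does that; `gguf_header` is a hypothetical helper, not part of fast-mempalace, and it assumes a little-endian file (the common case).

```python
import struct
from typing import Dict

GGUF_MAGIC = b"GGUF"
HEADER = struct.Struct("<IQQ")  # version (uint32), tensor count, metadata KV count (uint64)


def parse_gguf_header(raw: bytes) -> Dict[str, int]:
    """Validate the magic and unpack the fixed-size fields that follow it."""
    if len(raw) < 4 + HEADER.size or raw[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    version, n_tensors, n_kv = HEADER.unpack_from(raw, 4)
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}


def gguf_header(path: str) -> Dict[str, int]:
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(4 + HEADER.size))
```

Running `print(gguf_header("lib/minilm.gguf"))` before `zig build` catches a bad download early.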
🗺️ Roadmap
→ ROADMAP.md.
📄 License
MIT.