Vault-for-LLM
Health Pass
- License – License: MIT
- Description – Repository has a description
- Active repo – Last push 0 days ago
- Community trust – 33 GitHub stars
Code Pass
- Code scan – Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions – No dangerous permissions requested
This tool is a local-first knowledge management system and MCP server that gives LLM agents persistent, searchable memory using a four-layer architecture with SQLite and ONNX embeddings.
Security Assessment
The tool is designed to run entirely locally with zero cloud dependency. A scan of 12 files found no dangerous patterns, hardcoded secrets, or requests for dangerous permissions. It does not automatically execute unsafe shell commands or make unauthorized network requests. Overall risk is rated as Low.
Quality Assessment
The project is under active development, with its last push occurring today. It uses the permissive MIT license and includes clear, multi-language documentation. Although the codebase is relatively new, it has already garnered 33 GitHub stars, indicating a positive early response and moderate community trust.
Verdict
Safe to use.
Local-first knowledge system for LLM agents – sqlite-vec + ONNX embeddings, no cloud/Docker/PyTorch dependency
Vault-for-LLM
繁體中文 | 简体中文 | English
A local-first, open-source knowledge management system for LLM agents.
Zero cloud dependency. Zero Docker. Zero PyTorch. Just `pip install` and go.
What is Vault-for-LLM?
Vault-for-LLM is a four-layer hierarchical knowledge base designed to give any LLM agent persistent, searchable memory. It runs entirely locally using SQLite + sqlite-vec + ONNX embeddings.
Key Features
- Four-layer architecture (L0–L3) for structured knowledge injection
- Hybrid search: keyword + semantic vector search (ONNX, no GPU needed)
- Knowledge graph: auto-inferred entities and edges with 2-hop BFS expansion
- Atomic claims with source citations: sub-chunk granularity, every claim traceable to original text
- Self-questioning convergence: system judges if it "knows enough" to explain a topic (KAL-inspired)
- Cross-family LLM validation: extract with one model, verify with another to catch hallucinations
- Freshness tracking + FSRS spaced repetition: automated staleness detection and review scheduling
- AAAK compression: 6x compression for compiled knowledge
- Trust scoring: every knowledge entry has a confidence score (0.0–1.0)
- Lint & contradiction detection: automatic quality checks
- MCP server: expose your vault to any MCP-compatible AI agent mid-conversation
- CLI-first: 20+ commands for full lifecycle management
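The 2-hop graph expansion listed above is, per the README, implemented as a recursive SQL CTE; the traversal itself is a plain bounded breadth-first search. A minimal Python sketch of that idea (the function name and adjacency data are illustrative, not the project's API):

```python
from collections import deque

def expand_graph(edges, start, max_hops=2):
    """Bounded BFS over an adjacency map: collect every node
    reachable from `start` within `max_hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen
```

With a 2-hop limit, retrieval can pull in entities related to a search hit without flooding the context with the whole graph.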
Architecture
```
L0 Identity       → Who the user is                     (injected every conversation)
L1 Core Facts     → Environment & active projects       (injected every conversation)
L2 Context        → Recent decisions & troubleshooting  (auto-updated daily)
L3 Deep Knowledge → Architecture, techniques, lessons   (searched on demand)
```
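A minimal sketch of how an agent could assemble its context from these layers: L0–L2 are always injected, while L3 is consulted only when there is a query. The `build_context` helper and the in-memory vault dict are hypothetical stand-ins for the real database-backed search:

```python
def build_context(vault, query=None):
    """Assemble prompt context from the four layers.
    L0/L1/L2 are always included; L3 entries are added only when a
    query matches (a toy substring match standing in for hybrid search)."""
    parts = [vault["L0"], vault["L1"], vault["L2"]]
    if query:
        parts += [entry for entry in vault["L3"] if query.lower() in entry.lower()]
    return "\n\n".join(parts)
```

This keeps the always-injected footprint small while still letting deep knowledge surface on demand.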
What's New in v0.4.0
| Feature | Description |
|---|---|
| Convergence Check | KAL-inspired self-questioning loop – system asks "Can I explain this?" and keeps learning until it can |
| Cross Validation | Asymmetric LLM verification – extract claims with Model A, verify with Model B |
| Freshness Tracking | Automatic staleness detection + FSRS interval scheduling for knowledge review |
| Atomic Claims | Claims at sub-chunk granularity with source_span citations for precision retrieval |
| Graph Expansion | 2-hop recursive CTE walk through knowledge graph for contextual retrieval |
| MCP Server | Model Context Protocol server – let any chat AI query and inject knowledge mid-conversation |
| Updated CLI | New commands: vault converge, vault cross-validate, vault freshness |
See CHANGELOG.md for full details.
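Cross-family validation reduces to a simple filter once the two models are treated as callables. A hedged sketch (the function and the callable signatures are illustrative, not the project's internals):

```python
def cross_validate(source_text, extract, verify):
    """Asymmetric verification: `extract` (Model A) turns text into
    claim strings; `verify` (Model B, a different model family) checks
    each claim against the source. Only confirmed claims survive."""
    return [claim for claim in extract(source_text) if verify(source_text, claim)]
```

Because the two models come from different families, a hallucination produced by one is unlikely to be independently reproduced by the other.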
Quick Start
```bash
# Install
pip install -e .

# Initialize a project
vault init

# Add knowledge
vault add "My First Entry" --content "Something I learned today"

# Compile (raw → database + compiled)
vault compile

# Search
vault search "my query"

# Health check
vault doctor
```
See INSTALL.md for detailed installation options.
Directory Structure
```
your-project/
├── vault.yaml            ← Project config (auto-generated by `vault init`)
├── L0-identity/          ← Who the user is (injected every conversation)
│   └── identity.md
├── L1-core-facts/        ← Core facts (injected every conversation)
│   └── current-projects.md
├── L2-context/           ← Dynamic context (auto-updated daily)
│   ├── recent-sessions/
│   └── current.md
└── L3-knowledge/         ← Deep knowledge (searched on demand)
    ├── raw/              ← Raw knowledge input (your .md files go here)
    ├── compiled/         ← AAAK compressed backup (auto-generated)
    └── templates/        ← Clean templates for L0/L1/L2
```
AI Integration Guide
Any LLM Agent (Universal)
- Read this README to understand the architecture
- Read `L0-identity/identity.md` to know the user
- Read `L1-core-facts/current-projects.md` for current state
- Use `vault search "query"` for semantic search

Claude Code / Cursor / Any AI IDE
- Copy `CLAUDE.md` (included) into your project root
- For deep knowledge, search `compiled/` or `raw/`
- Use `rg "keyword" raw/ compiled/` for fast lookup
MCP Integration (Chat with your vault)
Connect your vault to any MCP-compatible AI agent:
```bash
# Install MCP dependencies
pip install "vault-for-llm[mcp]"

# Start the server
vault-mcp --project-dir /path/to/your/project
```
Now your AI can search, add, and query knowledge mid-conversation – no manual copy-paste needed.
CLI Reference
| Command | Description |
|---|---|
| `vault init` | Initialize a new project |
| `vault doctor` | Health check |
| `vault add "Title" --content "..."` | Add knowledge entry |
| `vault add "Title" --file notes.md` | Add from file |
| `vault import doc.md` | Import long document (auto-chunked) |
| `vault compile` | Compile `raw/` → database + `compiled/` |
| `vault search "query"` | Search (auto: keyword + semantic) |
| `vault search "query" --graph-expand 2` | Search + 2-hop graph expansion |
| `vault list` | List all entries |
| `vault stats` | Show database statistics |
| `vault lint` | Run quality checks |
| `vault converge` | Self-questioning convergence check |
| `vault cross-validate` | Cross-family LLM validation |
| `vault freshness` | Freshness + review scheduling |
| `vault dedup` | Detect semantic duplicates |
| `vault dedup --dry-run` | Preview merge plan (no changes) |
| `vault dedup --merge` | Auto-merge duplicates (keeps higher trust) |
| `vault graph build` | Build knowledge graph |
| `vault graph show` | Show graph summary |
| `vault graph export --format mermaid` | Export graph as Mermaid diagram |
| `vault graph expand <id>` | Expand from a specific node |
| `vault config set <key> <value>` | Set config (e.g. embedding provider) |
MCP Server (Claude Code / Cursor / OpenClaw)
Expose your vault directly to any MCP-compatible AI agent:
```bash
# Install MCP dependencies
pip install "vault-for-llm[mcp]"

# Start the server (run from your project directory)
vault-mcp

# Or specify the path explicitly
vault-mcp --project-dir /path/to/your/project
```
Add to your Claude Code config (`~/.claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "vault": {
      "command": "vault-mcp",
      "args": ["--project-dir", "/path/to/your/project"]
    }
  }
}
```
Available MCP tools: `vault_search`, `vault_add`, `vault_get`, `vault_list`, `vault_stats`
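Under the hood, these tools are invoked via standard MCP `tools/call` JSON-RPC requests. An MCP-compatible client would send something like the following (the argument shape for `vault_search` is illustrative, not taken from the project's schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "vault_search",
    "arguments": { "query": "sqlite-vec indexing" }
  }
}
```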
Knowledge File Format
All .md files use YAML frontmatter:
```yaml
---
title: "Knowledge Title"
category: "concept|technique|workflow|lesson|error|comparison"
layer: "L0|L1|L2|L3"
tags: ["tag1", "tag2"]
trust: 0.0-1.0
source: "source-description"
created: "YYYY-MM-DD"
---
```
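A complete entry following this schema might look like the file below. The content is hypothetical, shown only to illustrate how the frontmatter and body fit together:

```markdown
---
title: "SQLite WAL Mode Gotchas"
category: "lesson"
layer: "L3"
tags: ["sqlite", "concurrency"]
trust: 0.9
source: "debugging session"
created: "2025-01-15"
---

WAL mode allows concurrent readers during a write, but the -wal file
must live on the same filesystem as the database.
```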
Trust Score Guide
| Range | Meaning |
|---|---|
| 0.9+ | Verified by real experience |
| 0.7–0.8 | High confidence from documentation |
| 0.5–0.6 | General knowledge, not yet verified |
| < 0.3 | Unverified, needs review |
Compiler
```bash
vault compile
```

What it does:
- `raw/` → database (upsert by content hash)
- `raw/` → `compiled/` (AAAK 6x compression)
- Extract atomic claims with `source_span` citations
- Auto L2 update + lint health check + git commit
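Upsert-by-content-hash means re-compiling an unchanged file is a no-op, while an edited file updates its existing row. A minimal sketch with an assumed `entries` schema (not the project's actual table layout):

```python
import hashlib
import sqlite3

def upsert_entry(db, path, body):
    """Insert or update a knowledge entry, skipping unchanged content.
    The SHA-256 of the body serves as the change detector."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    row = db.execute(
        "SELECT content_hash FROM entries WHERE path = ?", (path,)
    ).fetchone()
    if row and row[0] == digest:
        return "unchanged"  # same content hash: nothing to do
    db.execute(
        "INSERT INTO entries (path, body, content_hash) VALUES (?, ?, ?) "
        "ON CONFLICT(path) DO UPDATE SET "
        "body = excluded.body, content_hash = excluded.content_hash",
        (path, body, digest),
    )
    return "upserted"

# Hypothetical schema for the sketch above
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE entries (path TEXT PRIMARY KEY, body TEXT, content_hash TEXT)"
)
```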
Tech Stack
| Component | Technology | Why |
|---|---|---|
| Database | SQLite + sqlite-vec | Zero-config, portable, vector search |
| Embeddings | ONNX Runtime (~150MB) | No PyTorch/GPU needed |
| Search | Hybrid (keyword + vector + graph expansion) | Best of both worlds |
| Graph | SQLite (entities + edges + 2-hop CTE) | Lightweight relationship tracking |
| Compression | AAAK format | 6x size reduction |
| Validation | Cross-family LLM + Convergence check | Catch what single models miss |
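Hybrid search generally blends the two score sources after normalizing each to a common scale. An illustrative sketch of score fusion (the max-normalization and the `alpha` weighting are assumptions, not the project's exact ranking method):

```python
def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Blend max-normalized keyword and vector-similarity scores.
    alpha=1.0 means pure keyword ranking, alpha=0.0 pure vector."""
    def norm(scores):
        top = max(scores.values(), default=0.0) or 1.0
        return {doc: s / top for doc, s in scores.items()}
    kw, vec = norm(keyword_scores), norm(vector_scores)
    docs = set(kw) | set(vec)
    blended = {
        d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
        for d in docs
    }
    return sorted(blended, key=blended.get, reverse=True)
```

A document that scores moderately on both channels can outrank one that dominates only a single channel, which is the "best of both worlds" the table refers to.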
Requirements
- Python 3.10+
- ~150MB for ONNX embedding model (optional)
- No GPU, no Docker, no cloud account needed
License
MIT License – see LICENSE.
Built for developers who want their AI agents to actually remember things.