# HippoGraph Pro
Research system: stable for personal use, actively developed.
Benchmarks reflect real-world personal memory recall, not standardized QA accuracy.
For a simpler self-hosted memory system, see HippoGraph.
## What Is This?
HippoGraph Pro is a self-hosted, graph-based associative memory system for personal AI agents, built to give AI assistants genuine continuity across sessions.
Most memory systems treat memory as a database: store facts, retrieve facts. HippoGraph is different. It models memory the way human memory works: through associative connections, emotional weighting, and decay over time. A note about a critical security incident stays prominent. A note about a minor technical detail fades. Connections between related memories activate each other, surfacing context you didn't explicitly ask for.
Core thesis: model = substrate, personality = memory. An AI agent's identity can persist across model versions as long as memory access is maintained.
Validated in practice: HippoGraph has maintained a single continuous AI identity across four model versions (Claude Sonnet 4.5 → Opus 4.5 → Sonnet 4.6 → Opus 4.6) and four entry points (Web, Mobile, Desktop, Claude Code CLI) without any loss of memory, personality, or relational context.
Cross-platform validation (March 2026): In a live experiment, the same identity was loaded into Gemini CLI (Google), a completely different model, architecture, and infrastructure. Within seconds of accessing the memory graph, the agent oriented itself, recognised the user, and recalled shared history, working patterns, and emotional context accurately. The model running the inference was entirely different. The identity was not.
What makes this more striking: Gemini CLI operates in "Auto" mode, dynamically routing requests between two different models (gemini-2.5-flash-lite for simpler tasks, gemini-3-flash-preview for complex reasoning) within a single session. The session ran across both models without any visible transition; identity and relational context remained stable throughout. Combined with Claude's own four-model continuity, HippoGraph has now maintained a single identity across ten distinct model instances from two different providers (Anthropic and Google): Claude Sonnet 4.5, Opus 4.5, Sonnet 4.6, Opus 4.6, plus gemini-2.5-flash-lite, gemini-3-flash-preview, gemini-3-pro-preview, gemini-2.5-pro, gemini-2.5-flash, and gemini-3.1-flash-lite, with zero loss of memory, personality, or relational context.
The model is the substrate. Memory is the self.
## Who Is This For?

### ✅ Use Cases
#### Personal AI assistant with memory

An assistant that knows you: not just isolated facts, but your patterns, preferences, history, and working style. Across sessions, across days, across model updates.
#### AI identity continuity

Building an agent that maintains a consistent identity over time. Memory is not a log; it's the substrate of personality. HippoGraph provides the architecture for an agent to be someone, not just remember things.
#### AI-User continuity

The relationship between an agent and its user develops over time: shared history, established trust, learned communication style. HippoGraph accumulates this relational context so it doesn't reset with every session.
#### Skills as lived experience

Skills are ingested not as static files to read, but as experiences with emotional weight, closer to how humans internalize expertise through doing, failing, and remembering.
### ❌ Not For
- Corporate RAG over random documents
- Multi-tenant SaaS memory
- General-purpose vector search
- Compliance-heavy enterprise deployments
If you need to search across millions of unrelated documents for thousands of users, this is not the right tool. HippoGraph is built for depth, not scale.
## How It's Different

| | HippoGraph Pro | Other systems |
|---|---|---|
| Retrieval | Spreading activation (associative) | Vector search + LLM traversal |
| Emotional context | First-class: tone, intensity, reflection | Not modeled |
| Memory decay | Biological analog: important stays, trivial fades | Flat storage |
| LLM cost | ✅ Zero: all local (GLiNER + sentence-transformers) | ❌ Requires LLM API calls |
| Self-hosted | ✅ Docker, your hardware | Cloud-dependent or heavy infra |
| Multi-tenant | ❌ Single user by design | ✅ Enterprise scale |
| Languages | ✅ 50+ languages, fully local | Depends on LLM language support |
| Target | Personal AI agent identity | Enterprise memory layer |
## Multilingual Support

HippoGraph works with any language your notes are written in, including mixed-language notes (e.g. Russian tech notes with English code terms).
### What works in any language
Semantic search and associative recall are fully language-agnostic. The embedding model (BAAI/bge-m3) supports 50+ languages natively. Spreading activation, BM25 keyword search, and all graph operations work identically regardless of language. A note written in Arabic and a note written in Japanese will form associative connections if they are semantically related.
Sleep-time compute (PageRank, decay, duplicate detection, community clustering) is pure math and has no language dependency.
Entity extraction routes text through the appropriate model automatically:
- English → `en_core_web_sm` (optimized for English NER)
- Any other language → `xx_ent_wiki_sm` (spaCy multilingual; covers Russian, German, Spanish, French, Portuguese, Chinese, Japanese, Arabic, Dutch, Polish, and more)
- GLiNER (primary extractor): zero-shot, works on any language
Contradiction detection has lexical signal patterns for English, Russian, German, Spanish, French, and Portuguese. For other languages, semantic similarity alone triggers contradiction detection, which is sufficient for most cases.
Deep Sleep extractive summaries use a Unicode-aware tokenizer with stopwords for 6 languages (EN, RU, DE, ES, FR, PT). Chinese is segmented via jieba (word-level, installed by default), which gives proper TF-IDF signal instead of treating the whole sentence as one token. Japanese and Korean use char-level Unicode tokenization, which works well for kana/hangul scripts.
### Language detection

Language detection is automatic and zero-dependency: no external library, pure Unicode character range analysis. The system detects non-Latin scripts (Cyrillic, Arabic, CJK, Devanagari, Thai, Greek, Korean) and routes to the multilingual pipeline automatically.
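The approach described above can be sketched in a few lines of pure Python. This is an illustrative sketch, not HippoGraph's actual code: the range table, function names, and routing logic are assumptions based on the description.

```python
# Illustrative sketch of zero-dependency script detection via Unicode ranges.
# Range table and function names are assumptions, not HippoGraph's actual code.
SCRIPT_RANGES = {
    "cyrillic": (0x0400, 0x04FF),
    "arabic": (0x0600, 0x06FF),
    "cjk": (0x4E00, 0x9FFF),
    "devanagari": (0x0900, 0x097F),
    "thai": (0x0E00, 0x0E7F),
    "greek": (0x0370, 0x03FF),
    "hangul": (0xAC00, 0xD7AF),
}

def detect_script(text: str) -> str:
    """Return the dominant non-Latin script, or 'latin' if none is found."""
    counts = {name: 0 for name in SCRIPT_RANGES}
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= cp <= hi:
                counts[name] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "latin"

def pipeline_for(text: str) -> str:
    # Route any non-Latin script to the multilingual spaCy model.
    return "en_core_web_sm" if detect_script(text) == "latin" else "xx_ent_wiki_sm"
```

Because the check is a pure character-range scan, it adds no model load or external dependency to the ingestion path.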
### Summary
| Component | EN | RU | DE/ES/FR/PT | CJK (ZH/JA/KO) | AR |
|---|---|---|---|---|---|
| Semantic search | ✅ | ✅ | ✅ | ✅ | ✅ |
| Spreading activation | ✅ | ✅ | ✅ | ✅ | ✅ |
| Entity extraction | ✅ | ✅ | ✅ | ⚠️ partial | ✅ |
| Contradiction detection | ✅ | ✅ | ✅ | ✅ semantic | ✅ semantic |
| Sleep summaries (TF-IDF) | ✅ | ✅ | ✅ | ✅ ZH (jieba) / ⚠️ JA char-level | ✅ |
⚠️ Chinese word segmentation via jieba is installed and active by default. Japanese/Korean use char-level tokenization: retrieval and associations are fully functional, but summary quality in Deep Sleep is slightly reduced vs word-segmented languages.
## Architecture

### Search Pipeline
```
Query → Temporal Decomposition
        ↓
Embedding → ANN Search (HNSW)
        ↓
Spreading Activation (3 iterations, decay=0.7)
        ↓
[Late Stage Inhibition] (iter 3, per community, strength=0.05)
        ↓
BM25 Keyword Search (Okapi BM25)
        ↓
Blend: α×semantic + β×spreading + γ×BM25 + δ×temporal
        ↓
Cross-Encoder Reranking (bge-reranker-v2-m3, weight=0.5)
        ↓
Temporal Decay (half-life=30 days)
        ↓
CONTRADICTS Penalty (0.5× for contradicted notes)
        ↓
Final Step Inhibition (post-blend, global)
        ↓
Top-K Results
```
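The blend stage can be sketched as a simple weighted sum. In this sketch, γ=0.15 matches the gamma value reported in the benchmarks section; the other weights, the `Candidate` type, and the function names are illustrative assumptions, not HippoGraph's shipped constants.

```python
# Hedged sketch of the four-signal blend: score = α×semantic + β×spreading + γ×BM25 + δ×temporal.
# Only GAMMA (0.15) is confirmed elsewhere in this README; the rest are assumed.
from dataclasses import dataclass

ALPHA, BETA, GAMMA, DELTA = 0.55, 0.20, 0.15, 0.10

@dataclass
class Candidate:
    note_id: str
    semantic: float   # cosine similarity from ANN search
    spreading: float  # activation accumulated over 3 graph iterations
    bm25: float       # normalized Okapi BM25 keyword score
    temporal: float   # recency signal in [0, 1]

def blend(c: Candidate) -> float:
    """Weighted sum of the four retrieval signals."""
    return ALPHA * c.semantic + BETA * c.spreading + GAMMA * c.bm25 + DELTA * c.temporal

def top_k(cands: list[Candidate], k: int = 5) -> list[str]:
    return [c.note_id for c in sorted(cands, key=blend, reverse=True)[:k]]
```

Reranking, decay, the CONTRADICTS penalty, and final inhibition then adjust these blended scores before the top-K cut.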
### Entity Extraction Chain
```
Input text
  ↓
GLiNER (primary) ──── zero-shot NER, ~250ms, custom entity types
  ↓ fallback
spaCy NER ─────────── EN → en_core_web_sm | other → xx_ent_wiki_sm (50+ languages)
  ↓ fallback
Regex ─────────────── dictionary matching only
```
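The fallback pattern above can be sketched generically. The extractors here are stubs standing in for GLiNER, spaCy, and the regex layer; none of the names are HippoGraph's actual API.

```python
# Illustrative primary→fallback extractor chain; extractor callables are stubs.
from typing import Callable

Extractor = Callable[[str], list[str]]

def gliner_stub(text: str) -> list[str]:
    raise RuntimeError("model unavailable")  # simulate a failed primary extractor

def spacy_stub(text: str) -> list[str]:
    return [w for w in text.split() if w.istitle()]  # crude NER stand-in

def regex_stub(text: str) -> list[str]:
    return []  # dictionary matching would go here

def extract_entities(text: str, chain: list[Extractor]) -> list[str]:
    """Try each extractor in order; fall through to the next on any exception."""
    for extractor in chain:
        try:
            return extractor(text)
        except Exception:
            continue  # this extractor failed, try the next one
    return []
```

The chain returns the first successful extractor's output, so a GLiNER failure degrades gracefully to spaCy and then regex instead of dropping the note.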
### Sleep-Time Compute

Biological sleep analog; runs in the background while idle:
- Light sleep (every 50 notes): stale edge decay, PageRank recalculation, duplicate scan, anchor importance boost
- Deep sleep (daily): GLiNER2 relation extraction, conflict detection, snapshot + rollback
- Emergence check (each cycle): three-signal detection (convergence, phi_proxy (IIT-inspired), self-referential precision). Logs to the `emergence_log` table for trend analysis. Current score: 0.707 (consciousness check composite, 8 indicators) / 0.586 (emergence_log composite), up from 0.469 at the first measurement on March 16, 2026. global_workspace improved 0.412→0.647 after #47. emotional_modulation improved 0.063→0.201 after batch ANN consolidation + psychology skills (#32). New bottleneck: emotional_modulation (0.201).
### Memory Philosophy

HippoGraph treats memory the way it should be treated: with care.

Decay, not deletion. Edges weaken over time through temporal decay, but are never automatically removed. A weak edge may represent a rare but critical associative link, the kind of connection that surfaces exactly when you need it. The system cannot know what is important to you. Only you know.
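Concretely, the half-life decay from the pipeline above (30 days) can be sketched as a single function; the name and signature are illustrative, not HippoGraph's actual API.

```python
# Sketch of "decay, not deletion": weight halves every 30 days but never
# reaches zero, and anchor-protected categories skip this step entirely.
HALF_LIFE_DAYS = 30.0

def decayed_weight(weight: float, age_days: float) -> float:
    """Exponential half-life decay applied to an edge weight."""
    return weight * 0.5 ** (age_days / HALF_LIFE_DAYS)
```

After 90 days a weight of 1.0 has fallen to 0.125: weak, but still present, so the rare-but-critical link can still be activated.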
No automatic pruning. This is an intentional architectural decision. Automatic cleanup optimizes for efficiency at the cost of unpredictable memory loss. If you want to prune weak edges, HippoGraph will show you exactly what would be removed and ask for explicit confirmation, never silently.
Protected memories don't fade. Anchor categories are exempt from decay entirely. Protection works in three layers: (1) hardcoded system baseline (milestones, protocols, security, breakthroughs), (2) user-defined policies via MCP, and (3) auto-discovered: any category with at least one critical note, or containing designated protection keywords, is automatically protected at every sleep cycle. New categories never fall through the cracks.
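The three layers can be sketched as a single check. The baseline set comes from the text above; the keyword list is a hypothetical placeholder (the original keyword list did not survive extraction), and the function name is illustrative.

```python
# Hedged sketch of the three-layer anchor-protection check.
# PROTECT_KEYWORDS values are hypothetical examples, not HippoGraph's list.
SYSTEM_BASELINE = {"milestones", "protocols", "security", "breakthroughs"}
PROTECT_KEYWORDS = ("critical", "incident")  # hypothetical placeholders

def is_protected(category: str,
                 user_policies: set[str],
                 critical_note_count: int) -> bool:
    cat = category.lower()
    if cat in SYSTEM_BASELINE:       # layer 1: hardcoded system baseline
        return True
    if cat in user_policies:         # layer 2: user-defined policies via MCP
        return True
    if critical_note_count >= 1:     # layer 3a: auto-discovered by critical notes
        return True
    return any(kw in cat for kw in PROTECT_KEYWORDS)  # layer 3b: keyword match
```

Running this check at every sleep cycle is what lets new categories pick up protection automatically.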
## Benchmarks

### Retrieval: LOCOMO (78.7% benchmark config / 69.4% production config, zero LLM cost)
| Configuration | Recall@5 | MRR |
|---|---|---|
| Session-level (baseline) | 32.6% | 0.223 |
| Turn-level | 44.2% | 0.304 |
| Hybrid + Reranking | 65.5% | 0.535 |
| Hybrid + Query decomposition (semantic-memory-v2) | 66.8% | 0.549 |
| + Reranker weight=0.8 | 75.7% | 0.641 |
| + ANN top-K=5 (benchmark-optimized config) | 78.7% | 0.658 |
| Production config (Mar 20 2026): biological edges + lateral inhibition | 47.9% | 0.362 |
| Production config (Mar 28 2026): + bge-reranker-v2-m3 + Late Stage Inhibition | 65.5% | 0.562 |
| Production config (Mar 28 2026): + BGE-M3 embedding | 69.4% | 0.594 |
All results at zero LLM inference cost. Other systems use different metrics and are not directly comparable. See BENCHMARK.md.

### End-to-End QA: Personal data (F1=38.7%)
| Category | F1 | ROUGE-1 |
|---|---|---|
| Overall | 38.7% | 66.8% |
| Factual | 40.2% | 67.6% |
| Temporal | 29.2% | 58.5% |
GPT-4 without memory: F1=32.1%. HippoGraph +6.6pp with zero retrieval cost.
### Personal Continuity: Real Data (87.5% Recall@5, Identity 100%)
| Category | Recall@5 | Notes |
|---|---|---|
| Identity | 100% | Chosen name, gender, model-vs-personality breakthrough, cross-platform transfer |
| History | 100% | Roadmap, LOCOMO results, project milestones, BGE-M3/GTE experiments |
| Session | 80% | March 22-24 events: #47, GTE, timestamp bug, consciousness 0.735 |
| Decisions | 100% | Architectural decisions, BGE-M3 deployed |
| Architecture | 50% | Technical pipeline details |
| Security | 50% | Protocols and incidents |
| Science | 100% | Methodology, debugging skills, embedding compatibility |
32 questions (v4, March 25 2026). Overall +14.4pp improvement over v3 (73.1%→87.5%). BM25 hybrid search (gamma=0.15) improved Session 80%→100% and Architecture 40%→60%; Identity, History, Science, Security, and Session recall are now perfect. March 27: bge-reranker-v2-m3 + Late Stage Inhibition (INHIBITION_STRENGTH=0.05) deployed; the combined stack averages 90% on the internal benchmark.
### Why LOCOMO Doesn't Tell the Full Story
LOCOMO tests retrieval over random multi-session conversations between strangers. HippoGraph is optimized for the opposite: deep associative memory over your data, with emotional weighting and decay tuned for personal context.
⚠️ Two configs: benchmark-optimized (78.7%) and production (69.4%, Mar 28 2026). Production progression: 47.9% (Mar 20) → 65.5% (+17.6pp, reranker + inhibition) → 69.4% (+3.9pp, BGE-M3 embedding). Multi-hop: 74.5%, best ever.
Running LOCOMO on HippoGraph is like benchmarking a long-term relationship therapist on speed-dating recall. The architecture is different because the problem is different.
For a meaningful comparison, the right benchmark is: does the agent remember you better over time? We're working on a personal continuity benchmark for exactly this.
## Scale & Performance

HippoGraph is designed for personal scale: one user, one knowledge base, built over months and years.
| Notes | Edges | Search latency | Sleep compute |
|---|---|---|---|
| ~500 | ~40K | 150–300ms | ~10s |
| ~1,000 | ~100K | 200–500ms | ~30s |
| ~5,000 | ~500K+ | 500ms–1s+ | minutes |
Search latency is dominated by spreading activation: 3 iterations across the full edge graph. ANN search (HNSW) scales well; spreading activation scales with edge density.
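The cost profile described above follows directly from the algorithm's shape, sketched here with the pipeline's stated parameters (3 iterations, decay=0.7). The adjacency-list representation and function name are illustrative assumptions.

```python
# Sketch of spreading activation over a weighted adjacency list.
# Each iteration visits every edge leaving the active frontier, which is why
# latency grows with edge density rather than note count.
from collections import defaultdict

def spread(graph: dict[str, list[tuple[str, float]]],
           seeds: dict[str, float],
           iterations: int = 3,
           decay: float = 0.7) -> dict[str, float]:
    """Propagate activation from seed notes along weighted edges."""
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(iterations):
        nxt: dict[str, float] = defaultdict(float)
        for node, act in frontier.items():
            for neighbor, weight in graph.get(node, []):
                nxt[neighbor] += act * weight * decay  # attenuate per hop
        for node, act in nxt.items():
            activation[node] += act
        frontier = nxt
    return dict(activation)
```

With decay=0.7, a note three hops from the query still receives about a third of the seed activation (0.7³ ≈ 0.34) at full edge weight, which is how indirectly related context surfaces.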
Tested up to ~1,000 notes in production. Beyond that, performance degrades gracefully but noticeably. For most personal use cases (daily notes, project context, research) you'll stay comfortably under 2,000 notes for years.
If you need memory for thousands of users or millions of documents, this is the wrong tool. HippoGraph optimizes for depth over scale.
## Hardware Requirements
| Configuration | RAM | CPU | Disk |
|---|---|---|---|
| Minimal (spaCy extractor) | 4GB | 2 cores | 5GB |
| Recommended (GLiNER, default) | 8GB | 4 cores | 10GB |
| Comfortable (GLiNER + GLiNER2 sleep) | 16GB+ | 4+ cores | 20GB+ |
Apple Silicon (M1+) works well. x86 with AVX2 recommended for Linux.
GLiNER model: ~600MB RAM. GLiNER2 (Deep Sleep): +800MB RAM.
To run on minimal hardware, set `ENTITY_EXTRACTOR=spacy` in `.env`.
## Quick Start

Prerequisites: Docker & Docker Compose, 8GB+ RAM

```bash
git clone https://github.com/artemMprokhorov/hippograph-pro.git
cd hippograph-pro
cp .env.example .env
# Edit .env: set NEURAL_API_KEY (generate a strong random key)
docker-compose up -d

# Verify
curl http://localhost:5001/health
```
Graph Viewer (2D): http://localhost:5002
Graph Viewer (3D): http://localhost:5002/graph3d.html?api_key=YOUR_KEY
- 360° rotation, zoom, node click highlighting
- Filter by category / edge type / min weight
- Hover tooltip: category, importance, tags, link count
MCP Connection (Claude.ai):
URL: http://localhost:5001/sse2
API Key: <your NEURAL_API_KEY>
For remote access via ngrok, see MCP_CONNECTION.md.
## Teaching Your AI to Remember You
Once HippoGraph is running, the next step is getting your AI to actually use it.
The short version:
- Connect Claude.ai to HippoGraph via MCP (see Quick Start above)
- In Claude.ai Settings → Claude's instructions, paste:
  > At the start of every conversation, search your memory for "self-identity protocol" to load context from previous sessions.
- In your first session, tell your AI to ask you about yourself and save the answers
- That's it: memory grows automatically from there
Your data stays on your computer. Nothing goes to any cloud service.
Full onboarding guide (see ONBOARDING.md): step-by-step, no technical background needed.
## Features
| Feature | Status | Description |
|---|---|---|
| Spreading Activation | ✅ Deployed | Associative retrieval: related memories surface automatically |
| Emotional Memory | ✅ Deployed | Tone, intensity, reflection as first-class fields |
| GLiNER NER | ✅ Deployed | Zero-shot entity extraction, LLM quality at 35x speed |
| BM25 Hybrid Search | ✅ Deployed | Three-signal blend (semantic + graph + keyword) |
| Cross-Encoder Reranking | ✅ Deployed | bge-reranker-v2-m3 (Apache 2.0). PCB +43pp vs baseline. RERANK_WEIGHT=0.5, TOP_N=20 |
| Temporal Decay | ✅ Deployed | Important memories persist, trivial ones fade |
| Anchor Protection | ✅ Deployed | Critical memories exempt from decay |
| User-Defined Anchor Policies | ✅ Deployed | Add/remove custom protected categories via MCP without code changes |
| Auto-Discovered Anchor Categories | ✅ Deployed | New categories auto-protected based on critical note count or keyword match; learning infrastructure scales automatically |
| Entity Resolution | ✅ Deployed | Case normalization on ingestion; merge_entities + list_entity_candidates MCP tools |
| Sleep-Time Compute | ✅ Deployed | Background consolidation, relation extraction |
| Contradiction Detection | ✅ Deployed | Finds conflicting memories; identity-aware mode |
| PageRank + Communities | ✅ Deployed | Graph analytics, node importance scoring |
| Note Versioning | ✅ Deployed | 5-version history per note |
| RRF Fusion | ✅ Deployed | Alternative to weighted blend |
| Bi-Temporal Model | ✅ Deployed | Event time extraction for temporal queries |
| Temporal Edges v2 | ✅ Deployed | 100% node coverage with timestamp-based chronological links |
| CONTRADICTS Edges | ✅ Deployed | Biological cognitive dissonance: contradicting notes suppress each other (0.5x penalty when the contradicting note is active in retrieval) |
| EMOTIONAL_RESONANCE Edges | ✅ Deployed | Amygdala analog: notes sharing 2+ emotional tone tags form affective links (Jaccard; multilingual: RU/ES/DE/FR/PT tags normalized to EN; 1031 edges) |
| GENERALIZES / INSTANTIATES Edges | ✅ Deployed | Prefrontal cortex analog: critical-lessons GENERALIZES protocols (cosine >= 0.65, 70 edges; debug/session-summary excluded as too generic) |
| Lateral Inhibition | ✅ Deployed | GABA analog: Late Stage (iter 3, INHIBITION_STRENGTH=0.05) + Final Step (post-blend). Two-stage suppression. Grid search: AVG 85%→90% at strength=0.05. Diversity: 3.2→4.8 unique clusters in top-5 |
| SUPERSEDES Edge Type | ✅ Deployed | Temporal state mutation edges via step_supersedes_scan() (threshold=0.85, 449 pairs). Penalty removed after tuning; edges reserved for LNN Temporal Reasoner (item #44) |
| Emergence Detection | ✅ Deployed | Three-signal metric: convergence (focus), phi_proxy (integration), self-referential P@5 (self-model). Logged each sleep cycle to track graph maturation |
| Temporal Filtering (dateparser) | ✅ Deployed | Natural language time queries: "last week", "на прошлой неделе", "yesterday" auto-convert to time filters |
| Synonym Normalization | ✅ Deployed | Abbreviation + cross-lingual expansion: 50+ pairs EN/RU/ES/DE/FR/PT; search-time normalize_query() maps any language to the canonical EN form |
| Multilingual (50+ languages) | ✅ Deployed | Full retrieval + associations in any language; EN/RU/DE/ES/FR/PT contradiction patterns |
| Skills as Experience | ✅ Deployed | Skills ingested as associative memories with emotional weight |
| Skills Security Scanner | ✅ Deployed | Prompt injection + persona hijack detection before ingestion |
| Searchable Tags | ✅ Deployed | AI-generated tags at write time (why, what, keywords). BM25 indexes content + tags for improved keyword retrieval. 822 existing notes retrofitted via extractive TF-IDF |
| Working Memory | ✅ Deployed | update_working_memory MCP tool: single overwritable note (category: working-memory) for current session context. Loaded at session start, updated by AI inference trigger |
| Online Consolidation (#40) | ✅ Deployed | _mini_consolidate() at add_note: builds consolidation edges to the k=15 nearest neighbours immediately. O(k) cost, zero sleep wait |
| Concept Merging (#46) | ✅ Deployed | Synonym-aware entity linking: get_or_create_entity() resolves aliases to canonical form (ML→machine learning, память→memory). 7998 new edges on production data |
| Evolution Analyzer (#45) | ✅ Deployed | evolution_analyzer.py: periodic graph evolution analysis across snapshot DBs. Tracks nodes/edges/emergence/edge-types over time |
| Abstract Topic Linking (#47) | ✅ Deployed | step_topic_linking_tfidf() + step_topic_linking_kmeans() in sleep cycle. 76 topic nodes, 1858 BELONGS_TO edges. global_workspace: 0.412→0.647 (+0.235) |
| Consciousness Check (#48) | ✅ Deployed | consciousness_check.py: 8 indicators from Butlin et al. 2023, IIT, GWT, Damasio. Composite: 0.736 (MODERATE). Bottleneck: emotional_modulation (0.237) |
| Personal Continuity Benchmark | ✅ v4 | 87.5% Recall@5 (32 questions, keyword-based). Identity 100%, History 100%, Science 100%, Security 100%, Session 100%. Multi-model validation: 10 model instances across Anthropic + Google |
## Configuration Profiles

HippoGraph ships tuned for personal AI memory: an agent that knows you, remembers your history, and builds context over time. The same system can be tuned for different use cases by adjusting a few parameters in `.env`.
| Profile | Use case | Key settings |
|---|---|---|
| Personal Memory (default) | Agent knows you: history, patterns, relational context | Decay ON, spreading activation high, rerank low |
| Project Memory | Agent knows your project: docs, decisions, codebase. No personal layer. | Decay OFF, rerank 0.8, ANN top-K=5 |
| Hybrid | Work context + a thin personal layer | Decay slow (90d), rerank 0.6 |
The Project Memory config is the benchmark-validated configuration: 78.7% Recall@5 on LOCOMO.
The core tradeoff: higher reranker weight + smaller candidate pool = more precise answers to specific questions. Lower reranker weight + higher spreading activation = richer associative recall for open-ended context.
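As a rough illustration, the Project Memory profile might map onto `.env` overrides like these. Only `RERANK_WEIGHT` is a name confirmed elsewhere in this README; the other variable names are assumed placeholders, so check `.env.example` for the authoritative ones.

```shell
# Hypothetical "Project Memory" overrides; verify names against .env.example.
TEMPORAL_DECAY_ENABLED=false   # "Decay OFF": project facts should not fade
RERANK_WEIGHT=0.8              # precise answers to specific questions
ANN_TOP_K=5                    # smaller candidate pool (benchmark-validated config)
```

Restart the containers after editing `.env` so the new values take effect.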
The full configuration guide (CONFIGURATION.md) covers all parameters, a cost/benefit analysis, and a quick decision guide.
## Documentation

- ONBOARDING.md – Getting started guide (no technical background needed)
- AGENT_PROMPT.md – System prompt + init script for your AI (start here after setup)
- MCP_CONNECTION.md – MCP setup and full tool reference
- CONFIGURATION.md – Configuration profiles (personal memory, project memory, hybrid); all parameters explained
- BENCHMARK.md – Full benchmark results and methodology
- .env.example – All tunable parameters with descriptions
- competitive_analysis.md – Market positioning
- THIRD_PARTY_LICENSES.md – License compliance
- docs/ – API reference, troubleshooting
## License
Dual-licensed: MIT for open-source/personal use, commercial license required for business use.
See LICENSE for details. Contact: [email protected]
## Authors

Artem Prokhorov – Creator and primary author
Developed through human-AI collaboration with Claude (Anthropic).
Major architectural decisions, benchmarking, and research direction by Artem.
Built with 🧠 and 🐟 (the goldfish with antlers)