hermes-memory-installer
Agent-agnostic memory sidecar for AI coding agents. One install, multi-agent shared recall, production-grade. A persistent memory system for AI agents, with a memory layer shared across agents and production-grade deployment.
Memory Sidecar Installer v3.0
A production-grade sidecar memory system for AI agents.
The Problem
Every AI coding agent session starts blank. Claude Code, Cursor, Codex, Hermes — none have persistent long-term memory out of the box. You close a session and everything it learned about your project, your preferences, your ongoing work — gone.
Running multiple agents on the same project? Each one starts from zero, with no shared context, no institutional memory. The agent frameworks don't fix this because it's not their job. But if you're running agents in production, you hit this wall every day.
What v3.0 Is
Memory Sidecar v3.0 is a sidecar memory system that sits alongside your agent. It does not patch the agent's core. Instead, it captures what the agent learned, indexes it, and makes it available to the next session — and to every other agent on the same server.
- durable session intake and long-term archival
- canonical memory objects with governance indexes
- focused dossiers for important people, projects, and topics
- layered retrieval with intent-aware routing and fusion
- health checks, acceptance checks, and backlog remediation
- optional semantic search via vector embeddings
Multi-agent support: all scripts use the AGENT_HOME environment variable (backward compatible with HERMES_HOME).
Mount the sidecar to any agent by setting AGENT_HOME to the agent's data directory.
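As an illustration of the backward-compatible lookup described above, here is a minimal sketch. The helper name `resolve_agent_home` and the precedence (AGENT_HOME first, then HERMES_HOME) are assumptions for illustration, not the shipped scripts' code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_agent_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the agent data directory, preferring AGENT_HOME over HERMES_HOME.

    Hypothetical helper: the real scripts may resolve the directory differently.
    """
    env = os.environ if env is None else env
    for name in ("AGENT_HOME", "HERMES_HOME"):
        value = env.get(name, "").strip()
        if value:
            return Path(value).expanduser()
    raise RuntimeError("Set AGENT_HOME (or the legacy HERMES_HOME) first")
```

Failing loudly when neither variable is set avoids silently writing memory data into an unexpected directory.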
Use Cases
| Scenario | What the sidecar does |
|---|---|
| Cross-session continuity | Agent remembers project decisions, user preferences, ongoing tasks across restarts |
| Multi-agent team | Hermes + Claude Code + Codex share the same memory layer — no silos |
| Production deployment | Health checks, acceptance test suite, backlog remediation for self-healing |
| Bilingual teams | First-class Chinese + English support from day one, 6 multilingual embedding models |
| Knowledge management | Session archives → governance objects → focused dossiers → tiered retrieval |
Architecture
Agent Core
└─ writes state.db + session JSON
Sidecar Capture Layer
└─ session_to_gbrain.py — incremental session ingestion → gbrain
Sidecar Governance Layer
├─ memory_family_registry.py — query intent classification + focus profiles
├─ memory_governance_rebuild.py — canonical objects, hubs, multi-version status, vector index
└─ memory_guardian.py — capacity monitoring, consolidation drain, stuck-op recovery
Sidecar Recall Layer
└─ tiered_context_injector.py — layered retrieval (L1/L2/L3), RRF fusion, rerank
Sidecar Maintenance + Acceptance
├─ memory_maintenance_cycle.py — orchestrator: archive → rebuild → drain → recall → health
└─ sidecar_acceptance_check.py — production verification suite
See ARCHITECTURE.md for the full technical breakdown.
Quick Start
Prerequisites
- Python 3.9+
- gbrain installed and serving
- Hindsight running (port 8890 by default)
- An agent (Hermes / Claude Code / etc.) already producing sessions
Install
git clone https://github.com/mage0535/hermes-memory-installer.git
cd hermes-memory-installer
python3 installer/install.py
Non-interactive install with explicit embedding model:
python3 installer/install.py --noninteractive --embedding intfloat/multilingual-e5-small
The installer deploys the supported sidecar scripts into $AGENT_HOME/scripts/, patches $AGENT_HOME/config.yaml, and writes install metadata to $AGENT_HOME/memory-sidecar/install-profile.json.
Mount to a Different Agent
export AGENT_HOME=/home/user/.my-agent
python3 installer/install.py --noninteractive
Backward compatible: --hermes-home and HERMES_HOME env var also work.
Run One Maintenance Cycle
export AGENT_HOME=/root/.hermes
python3 "$AGENT_HOME/scripts/memory_maintenance_cycle.py"
Run Acceptance Checks
export AGENT_HOME=/root/.hermes
python3 "$AGENT_HOME/scripts/sidecar_acceptance_check.py"
What Gets Installed
The supported v3.0 sidecar runtime consists of these 7 scripts:
- memory_family_registry.py
- memory_governance_rebuild.py
- memory_guardian.py
- memory_maintenance_cycle.py
- session_to_gbrain.py
- sidecar_acceptance_check.py
- tiered_context_injector.py
These are the scripts used in the validated production deployment.
How the Sidecar Works
1. Session Intake
The agent writes state.db and session JSON files normally.
The sidecar reads them incrementally and tracks progress with a checkpoint.
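Checkpointed incremental intake can be sketched as follows. This is an illustration only: the `messages` columns and the integer-id checkpoint are hypothetical, and an in-memory database stands in for the agent's state.db.

```python
import sqlite3


def read_new_messages(conn, checkpoint):
    """Return rows with id greater than the checkpoint, plus the new checkpoint."""
    rows = conn.execute(
        "SELECT id, session_id, content FROM messages WHERE id > ? ORDER BY id",
        (checkpoint,),
    ).fetchall()
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


# Demo against an in-memory stand-in for state.db (schema is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [("s1", "hello"), ("s1", "use tabs"), ("s2", "deploy on friday")],
)
first_batch, checkpoint = read_new_messages(conn, 0)

# A later run only sees what was written after the stored checkpoint.
conn.execute("INSERT INTO messages (session_id, content) VALUES ('s2', 'done')")
second_batch, checkpoint = read_new_messages(conn, checkpoint)
```

Persisting the checkpoint only after a batch has been processed means an interrupted run re-reads that batch instead of skipping it.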
2. Long-Term Archive
session_to_gbrain.py converts high-value sessions into gbrain pages, applies tags, writes timeline entries, and links sessions to topic hubs.
3. Governance Rebuild
memory_governance_rebuild.py rebuilds:
- session indexes (FTS5)
- hindsight indexes
- memory hubs (topic-based theme aggregators)
- canonical memory objects with multi-version status (active/superseded) and time validity (valid_from/valid_to)
- conflict groups for deduplication
- dossier metadata
- recall metrics
- vector embeddings (when EMBEDDING_API_URL is configured)
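To make multi-version status and time validity concrete, here is a small sketch of superseding a fact. The table name `memory_objects`, its columns, and the helper functions are hypothetical. Only the active/superseded status and the valid_from/valid_to fields come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE memory_objects (
           id INTEGER PRIMARY KEY,
           subject TEXT NOT NULL,
           fact TEXT NOT NULL,
           status TEXT NOT NULL DEFAULT 'active',
           valid_from TEXT NOT NULL,
           valid_to TEXT
       )"""
)


def record_fact(conn, subject, fact, as_of):
    """Close out the current active fact for a subject, then insert the new one."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "UPDATE memory_objects SET status = 'superseded', valid_to = ? "
            "WHERE subject = ? AND status = 'active'",
            (as_of, subject),
        )
        conn.execute(
            "INSERT INTO memory_objects (subject, fact, valid_from) VALUES (?, ?, ?)",
            (subject, fact, as_of),
        )


def active_fact(conn, subject):
    """Return the currently active fact for a subject, or None."""
    row = conn.execute(
        "SELECT fact FROM memory_objects WHERE subject = ? AND status = 'active'",
        (subject,),
    ).fetchone()
    return row[0] if row else None


record_fact(conn, "deploy_target", "staging", "2024-01-01")
record_fact(conn, "deploy_target", "production", "2024-03-01")
```

Keeping superseded rows rather than overwriting them preserves history, so recall can filter on status while audits can still see what was believed earlier.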
It also maintains repair infrastructure:
- orphan_messages: orphan message audit trail
- session_repair_map: message-to-session repair mapping
- session_lineage_repair: session parent-chain repair
- recovered_fragments: unassignable memory fragment archive
- memory_aliases / memory_relations: alias and relation graph
- sessions_effective view: repaired session view layer
4. Layered Retrieval
tiered_context_injector.py classifies the query intent and fuses:
- hub summaries (topic-level)
- canonical objects (fact-level, with multi-version status filtering)
- hindsight cache (pre-indexed hindsight memories)
- live hindsight (when policy says it should be used)
- semantic search (when vector index is available)
- weak fallback layers only when necessary (FTS5 / LIKE / semantics)
5. Health and Remediation
memory_guardian.py reports health, trend data, duplicate counts, sync lag, and consolidation backlog signals.
It includes safe remediation logic for sticky consolidation backlogs and stuck operation detection.
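Stuck-operation detection can be approximated with an age threshold. The sketch below is a guess at the general idea, not the guardian's actual logic. The record fields (`id`, `status`, `started_at`) and the 30-minute threshold are assumptions.

```python
from datetime import datetime, timedelta


def find_stuck_operations(operations, now, max_age=timedelta(minutes=30)):
    """Return ids of operations that have been 'running' longer than max_age."""
    return [
        op["id"]
        for op in operations
        if op["status"] == "running" and now - op["started_at"] > max_age
    ]


now = datetime(2024, 1, 1, 12, 0)
operations = [
    {"id": "op-1", "status": "running", "started_at": now - timedelta(hours=2)},
    {"id": "op-2", "status": "running", "started_at": now - timedelta(minutes=5)},
    {"id": "op-3", "status": "done", "started_at": now - timedelta(hours=3)},
]
stuck = find_stuck_operations(operations, now)
```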
Focused Dossiers
v3.0 introduces the Focused Dossier concept.
A dossier is a first-class memory profile for an important person, relationship, project, event, or topic.
The production deployment includes a validated relationship dossier, and the shared registry supports extending to more dossiers.
Embedding Model Selection
Embedding models enable semantic vector search as an additional retrieval layer in L3 recall.
When EMBEDDING_API_URL is set, the governance rebuild automatically generates 384–1024 dimensional embeddings for each active memory_object and stores them in the canonical_semantic_index table. During recall, tiered_context_injector.py can query this index via cosine similarity alongside keyword-based FTS5 and LIKE paths.
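For readers unfamiliar with cosine similarity search, the following sketch ranks stored vectors against a query vector. It uses plain Python lists and a brute-force scan. How tiered_context_injector.py actually queries canonical_semantic_index is not documented here, so the `(object_id, vector)` layout is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Rank (object_id, vector) pairs by cosine similarity to the query."""
    scored = [(obj_id, cosine_similarity(query_vec, vec)) for obj_id, vec in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional index; real embeddings have 384 to 1024 dimensions.
index = [
    ("obj-a", [1.0, 0.0, 0.0]),
    ("obj-b", [0.7, 0.7, 0.0]),
    ("obj-c", [0.0, 0.0, 1.0]),
]
results = top_k([1.0, 0.1, 0.0], index, k=2)
```

When vectors are already normalized at embedding time, the similarity reduces to a dot product, which is cheaper to compute at scale.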
How it affects retrieval quality
- semantic recall quality: vectors capture meaning beyond keyword overlap
- cross-lingual matching: Chinese queries can match English content and vice versa
- dossier clustering: objects about the same topic are grouped even when wording differs
- fallback frequency: richer semantic index reduces reliance on weak LIKE / FTS5 fallbacks
Deploying an embedding server
The sidecar does not bundle an embedding server. You run one independently and point the sidecar to it via EMBEDDING_API_URL.
Quick start with sentence-transformers (recommended for development):
pip install sentence-transformers flask
Create a minimal server that serves the OpenAI-compatible /v1/embeddings endpoint.
A reference implementation is included in the community scripts:
# embedding_server.py (example; serve with your chosen model)
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small")


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        texts = body.get("input", [])
        if isinstance(texts, str):
            texts = [texts]  # accept a single string as well as a list
        emb = model.encode(texts, normalize_embeddings=True).tolist()
        payload = json.dumps({"data": [{"embedding": e} for e in emb]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


HTTPServer(("127.0.0.1", 8766), Handler).serve_forever()
Then set the environment variable and run a governance rebuild:
export EMBEDDING_API_URL=http://127.0.0.1:8766/v1/embeddings
python3 $AGENT_HOME/scripts/memory_maintenance_cycle.py
When EMBEDDING_API_URL is not set, the sidecar runs entirely without embeddings — all text-based retrieval (FTS5 / LIKE / hindsight / gbrain) continues to work normally.
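A client for such an endpoint might look like the sketch below. It assumes the response shape used by the example server above (`{"data": [{"embedding": [...]}]}`). The function names are hypothetical and are not taken from the sidecar scripts.

```python
import json
import os
import urllib.request


def build_payload(texts):
    """Encode a batch of texts as the JSON body the example server reads."""
    return json.dumps({"input": list(texts)}).encode("utf-8")


def parse_embeddings(raw):
    """Extract one vector per input from a {"data": [{"embedding": [...]}]} reply."""
    return [item["embedding"] for item in json.loads(raw)["data"]]


def embed_texts(texts, timeout=30):
    """Return embeddings, or None when EMBEDDING_API_URL is not configured."""
    url = os.environ.get("EMBEDDING_API_URL")
    if not url:
        return None  # caller falls back to text-based retrieval
    request = urllib.request.Request(
        url,
        data=build_payload(texts),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(response.read())
```

Returning None instead of raising when the variable is unset mirrors the optional nature of the vector layer: callers can skip semantic search and continue with keyword retrieval.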
How model selection works during install
During installation, the installer either:
- prompts for a model interactively, or
- accepts --embedding <model-id>, or
- uses the recommended default in non-interactive mode
The selected model is recorded in install-profile.json as metadata.
It does not automatically deploy the model — you must run the embedding server with the chosen model yourself.
Supported models
| Model | Languages | Dimension | Size | Best for |
|---|---|---|---|---|
| intfloat/multilingual-e5-small | 100+ languages | 384d | ~470MB | Recommended default for mixed Chinese/English deployments |
| BAAI/bge-small-zh-v1.5 | Chinese focused | 512d | ~96MB | Lowest-resource Chinese-first deployment |
| paraphrase-multilingual-MiniLM-L12-v2 | 50+ languages | 384d | ~471MB | Mature multilingual sentence-transformers ecosystem |
| Alibaba-NLP/gte-multilingual-base | 75+ languages | 768d | ~610MB | Higher multilingual recall quality |
| sentence-transformers/LaBSE | 109 languages | 768d | ~471MB | Cross-lingual alignment-heavy workloads |
| BAAI/bge-m3 | 100+ languages | 1024d | ~2GB | Maximum quality when hardware is generous |
Recommended default
intfloat/multilingual-e5-small
Why:
- strong multilingual coverage (100+ languages)
- good enough quality for production memory recall
- moderate resource cost (~470MB RAM)
- safe default for mixed Chinese / English workloads
Use BAAI/bge-small-zh-v1.5 only when the deployment is overwhelmingly Chinese and resource-constrained (96MB).
Choosing Your Retrieval Engine
In v3.0, "retrieval engine" is not a single database choice.
It is the retrieval profile that decides how the sidecar prioritizes evidence layers.
The production profile: Hybrid Sidecar
This repository ships one maintained deployment profile:
- Hybrid Sidecar (recommended)
It combines:
| Layer | Source | Role |
|---|---|---|
| L1: Recent sessions | state.db sessions table | Immediate context |
| L2: FTS5 + LIKE search | state.db messages_fts / messages / sessions | Keyword-based session retrieval |
| L3: Governance objects | memory_governance.db (FTS5) | Canonical long-term memory with multi-version filtering |
| L3: Hindsight cache | memory_governance.db hindsight_index | Pre-indexed Hindsight memories |
| L3: Memory hubs | memory_governance.db memory_hubs | Topic-level theme aggregators |
| L3: Semantic vectors | canonical_semantic_index | Cosine similarity search (when EMBEDDING_API_URL is configured) |
| Live Hindsight API | Hindsight HTTP API | Real-time fact recall (when policy triggers) |
| Fallback: semantics | semantics.db | LIKE-based embedding content search |
| Fallback: archives | state.db archives_fts | FTS5 over archived session summaries |
All layers are fused via RRF (Reciprocal Rank Fusion) with intent-aware re-ranking.
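Reciprocal Rank Fusion itself is simple to state: each layer contributes 1 / (k + rank) for every item it returns, and the contributions are summed. The sketch below shows the standard formula with the commonly used constant k = 60. The sidecar's actual constant, layer weights, and re-ranking step are not shown here, and the object ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from three layers, best match first.
fts_hits = ["obj-2", "obj-1", "obj-3"]
vector_hits = ["obj-1", "obj-4", "obj-2"]
hub_hits = ["obj-1"]
fused = reciprocal_rank_fusion([fts_hits, vector_hits, hub_hits])
```

Because RRF uses only ranks, it can combine layers whose raw scores are not comparable, such as FTS5 relevance and cosine similarity.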
How retrieval adapts to query intent
| Need | Dominant layers |
|---|---|
| Current system / provider state | governance objects + system hub |
| Relationship memory | dossier hub + live hindsight + hindsight cache + semantic |
| Project delivery | canonical project objects + hindsight cache |
| Broad exploration | wider governance/object evidence, limited fallback |
| Cold archive lookup | gbrain session pages + topic hubs |
| Recent conversation | L1 recent sessions + L2 FTS5 |
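The table above can be read as a routing map from intent to preferred layers. A minimal sketch follows. The layer names paraphrase the table, while the intent keys and the keyword-based classifier are hypothetical stand-ins for whatever memory_family_registry.py actually does.

```python
# Layer names follow the table above; intent keys and keyword rules are hypothetical.
INTENT_LAYERS = {
    "system_state": ["governance_objects", "system_hub"],
    "relationship": ["dossier_hub", "live_hindsight", "hindsight_cache", "semantic"],
    "project_delivery": ["canonical_project_objects", "hindsight_cache"],
    "broad_exploration": ["governance_objects", "limited_fallback"],
    "cold_archive": ["gbrain_session_pages", "topic_hubs"],
    "recent_conversation": ["l1_recent_sessions", "l2_fts5"],
}

KEYWORD_RULES = [
    ("system_state", ("provider", "config", "status")),
    ("relationship", ("who is", "relationship", "colleague")),
    ("project_delivery", ("deadline", "milestone", "release")),
    ("cold_archive", ("archive", "last year", "old session")),
    ("recent_conversation", ("just now", "earlier today", "last message")),
]


def classify_intent(query):
    """Return the first intent whose keywords appear in the query."""
    text = query.lower()
    for intent, keywords in KEYWORD_RULES:
        if any(keyword in text for keyword in keywords):
            return intent
    return "broad_exploration"  # default when nothing more specific matches


def layers_for(query):
    """Look up the dominant retrieval layers for a query."""
    return INTENT_LAYERS[classify_intent(query)]
```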
Why "engine swapping" was dropped
Older drafts described the project as if you could freely swap PostgreSQL, Elasticsearch, SQLite, and other engines.
That was not the final production reality. The validated system is:
- sidecar-first
- agent-agnostic (AGENT_HOME-based)
- Hindsight-backed
- gbrain-archived
- governance-indexed
- semantically-enhanced (optional vector index)
This narrower definition makes the repository cleaner, more maintainable, and reliably redeployable.
Operational Workflow
Agent writes new sessions
-> session_to_gbrain.py ingests archive candidates
-> memory_governance_rebuild.py refreshes objects / hubs / metrics / vectors
-> memory_guardian.py checks backlog and health
-> tiered_context_injector.py generates layered recall artifacts
-> Agent consumes the resulting context when needed
Validation Workflow
For production changes:
- develop locally
- compile locally
- back up server scripts
- deploy to $AGENT_HOME/scripts/
- run memory_maintenance_cycle.py
- run sidecar_acceptance_check.py
- confirm live agent regression queries still behave correctly
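The two script-running steps of this checklist can be automated with a small wrapper. The sketch below assumes the scripts signal failure through a non-zero exit code and that AGENT_HOME is already exported for the child processes. Neither is confirmed by this README, and `run_validation` is a hypothetical helper.

```python
import subprocess
import sys
from pathlib import Path

STEPS = ("memory_maintenance_cycle.py", "sidecar_acceptance_check.py")


def run_validation(agent_home, runner=subprocess.run):
    """Run each step in order; return the first failing script name, or None."""
    scripts_dir = Path(agent_home) / "scripts"
    for script in STEPS:
        result = runner([sys.executable, str(scripts_dir / script)])
        if result.returncode != 0:
            return script  # stop at the first failure
    return None
```

The `runner` parameter exists so the sequence can be exercised with a stub in tests, without touching a live deployment.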
Repository Layout
installer/ install entrypoints, config patch helpers, environment checks
scripts/ final sidecar runtime scripts (7 supported scripts)
skills/ agent-side memory skills
templates/ archive / skill templates
tests/ import and smoke validation for the repository
Acknowledgements
Core projects and ecosystems
- Hermes Agent — the original agent that this sidecar was built alongside
- Hindsight — short-to-medium term memory graph
- gbrain — personal knowledge graph engine
- sentence-transformers — embedding model framework
- OpenCode — intelligent coding assistant that guided the design
- PostgreSQL — gbrain backing store
- pgvector — vector extension for PostgreSQL
- SQLite — state.db and governance.db backing store
- FTS5 — full-text search engine for session and object indexes
Embedding model providers
- intfloat/multilingual-e5-small
- BAAI/bge-small-zh-v1.5
- paraphrase-multilingual-MiniLM-L12-v2
- Alibaba-NLP/gte-multilingual-base
- sentence-transformers/LaBSE
- BAAI/bge-m3
Community feedback
Thanks to the users who reported edge cases, memory misses, multilingual recall problems, sticky consolidation signals, and operational issues through:
- GitHub Issues — bugs, feature requests, and architecture discussions
- GitHub Discussions — design reviews and deployment questions
- Reddit — r/LocalLLaMA, r/MachineLearning, and other communities
- V2EX — Chinese-language user feedback and problem reports
- Direct server-side production feedback — Hermes users who shared real-world recall misses and performance data
Those reports materially shaped the final v3.0 sidecar design — from the initial 4-layer architecture through multi-agent support, conflict-group deduplication, multi-version status, time validity, and the optional vector index.
License
This project is provided for reference and deployment use.
See individual dependencies for their respective licenses.