Memorie-AI
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 7 GitHub stars
Code Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
This tool is a local-first semantic memory engine designed for AI agents. It scores and manages past interactions, helping agents learn from mistakes by deciding which memories to trust, reinforce, or forget.
Security Assessment
Overall Risk: Low. The project is written in Rust and stores all data locally in a single `.db` file. It does not require cloud connectivity, Docker containers, or API keys; apart from a one-time download of the all-MiniLM-L6-v2 embedding model from Hugging Face on first run, it works offline. A code scan of 12 files found no dangerous patterns, hardcoded secrets, or dangerous permission requests. The tool does not execute hidden shell commands or make unauthorized external network requests. The primary data-exposure risk is limited to the contents of its local database file, which is fully under your control.
Quality Assessment
The repository is actively maintained, with the last push made today, and it is licensed under the permissive MIT license. The main drawback is low community visibility: with only 7 GitHub stars, the tool has not been battle-tested by a broad user base, so developers should rely on their own testing rather than community validation.
Verdict
Use with caution — the code itself is clean and safe to run locally, but its low community adoption means it lacks extensive real-world testing and support.
Local-first semantic memory engine for AI agents - scores, trusts, and forgets so your agent stops making the same mistake twice.
Memoire
Local-first semantic memory engine for AI coding agents.
It doesn't just remember — it decides what deserves to be remembered, trusted, and forgotten.
The Mistake That Keeps Happening
Task 1: "Implement tax computation for billing."
Agent: amount = float(9.99) # ← float money bug
Tests: FAIL
Task 2: "Implement discount and refund computation."
Agent: amount = float(19.99) # ← same bug, different task
Tests: FAIL
With no memory: the agent has learned nothing.
With Memoire:
Task 1: FAIL → lesson stored.
[RECALL] "Never use float for money. Use Decimal..."
score=0.84 | trust=0.41 | action=HINT
Task 2: Agent receives injected context.
[RESULT] from decimal import Decimal
amount = Decimal('19.99')
Tests: PASS → memory reinforced → trust=0.56
The difference is not retrieval. Every vector store retrieves. The difference is that Memoire scored the lesson as worth keeping, ranked it by trust when recalled, decided the agent should act on it, and reinforced it only because the agent actually used it correctly.
What This Is
Most agent memory systems are retrieval systems with a database behind them. You write in, you read out, you hope the cosine score is good enough.
Memoire is a self-correcting memory layer. Every piece of information that enters has to earn its place — scored on actionability, consequence, novelty, and evidence at ingestion time. Every piece that comes back carries a trust score that tells the agent not just what is similar, but how confident it should be acting on it, alongside an uncertainty value that signals when a memory is contested or under-reinforced. Reinforcement only fires when memory was actually used to produce a successful outcome. Wrong memories are penalized proportionally to how bad the failure was, causing trust to decay organically. Contradicting memories resolve against each other and the loser is archived. The whole thing runs in a single .db file with no cloud, no Docker, no API keys.
If you're building agents that make the same mistakes across sessions, or that confidently act on outdated lessons, this is the missing layer.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ AI Agent (Python / Node.js / Go / Rust) │
│ │
│ m.remember("Never use float for money — billing bug #1337") │
│ results = m.recall("money precision", top_k=5) │
│ m.reinforce_if_used(id, agent_output, task_succeeded=True) │
└────────────────────────────┬────────────────────────────────────┘
│ ctypes / ffi-napi / cgo / native
▼
┌─────────────────────────────────────────────────────────────────┐
│ libmemoire (Rust cdylib) │
│ │
│ ┌─────────────┐ ┌──────────────────┐ ┌─────────────────┐ │
│ │ Chunker │──▶│ Embedder │──▶│ Quality Gate │ │
│ │ │ │ │ │ │ │
│ │ sliding │ │ all-MiniLM-L6-v2 │ │ importance │ │
│ │ window │ │ ONNX · 384-dim │ │ scoring │ │
│ │ 128w / 20w │ │ local inference │ │ contradiction │ │
│ │ overlap │ │ │ │ resolution │ │
│ └─────────────┘ └──────────────────┘ └────────┬────────┘ │
│ │ │
│ ┌───────────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ SQLite Store │ │
│ │ │ │
│ │ Per-memory: importance · confidence · decay weight │ │
│ │ reinforcement count · failure count │ │
│ │ contradiction group · trust EMA │ │
│ │ store state (active / shadow / archived) │ │
│ │ │ │
│ │ At recall: cosine scan → trust score computation │ │
│ │ EMA smoothing · uncertainty computation │ │
│ │ conflict-aware dedup · decay reranking │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
agent_memory.db
What happens at ingestion
- Chunk — sliding window (128 words, 20-word overlap) produces context-preserving fragments
- Fingerprint — exact duplicate guard before any embedding happens
- Embed — all-MiniLM-L6-v2 via ONNX Runtime, fully local, 384-dim
- Score — feature extraction across actionability, consequence, novelty, reusability, evidence → importance score in [0,1]
- Decide — score ≥ 0.50 → Active; else → Shadow (retrieved as backfill, penalized); duplicate claim with conflicting value → contradiction resolution
- Resolve — if a claim key already exists with a different value, the lower-quality memory is archived
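The chunk, fingerprint, and decide steps can be sketched in Python under stated assumptions. This is not Memoire's Rust code: `chunk_words`, `fingerprint`, and `initial_state` are hypothetical names, the SHA-256-over-normalised-text fingerprint is an assumed scheme, and embedding, feature scoring, and contradiction resolution are left out.

```python
import hashlib


def chunk_words(text: str, chunk_size: int = 128, overlap: int = 20) -> list[str]:
    """Sliding-window chunker: windows of `chunk_size` words sharing `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def fingerprint(chunk: str) -> str:
    """Exact-duplicate key (assumed scheme): SHA-256 of lower-cased, whitespace-normalised text."""
    normalised = " ".join(chunk.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def initial_state(importance: float, threshold: float = 0.50) -> str:
    """Quality gate: Active at or above the threshold, Shadow below it."""
    return "active" if importance >= threshold else "shadow"
```

With the defaults, a 300-word input yields three windows starting at words 0, 108, and 216, which lines up with the three-chunk row in the latency table; `chunk_words(text, 64, 10)` mirrors the smaller window shown under Configuration.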
Memory Lifecycle
stateDiagram-v2
direction LR
[*] --> Shadow : score < 0.50\n(low quality at ingestion)
[*] --> Active : score ≥ 0.50\n(quality gate passed)
Shadow --> Active : reinforce_if_used()\ntask_succeeded=true
Shadow --> Archived : penalize_if_used()\nor maintenance_pass()
Active --> Active : reinforce_if_used()\ntrust ↑ rc ↑ EMA updates
Active --> Archived : contradiction resolved\n(lower effective_weight loses)
Active --> Shadow : penalize_if_used()\nrepeated failures
Archived --> [*] : pruned after 7 days
note right of Active
trust ≥ 0.75 → FOLLOW
trust ≥ 0.45 → HINT
trust < 0.45 → IGNORE
end note
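The lifecycle diagram can be read as a small transition table. The sketch below is a hedged illustration, not the crate's logic: the event names are hypothetical labels for the calls in the diagram, `REPEATED_FAILURES = 2` is an assumption (the diagram only says "repeated failures"), maintenance is treated as unconditionally archiving Shadow memories, and the 7-day prune window is assumed to count from the moment of archival.

```python
from datetime import datetime, timedelta

PRUNE_AFTER = timedelta(days=7)
REPEATED_FAILURES = 2  # hypothetical threshold for "repeated failures"


def next_state(state: str, event: str, failure_count: int = 0) -> str:
    """Return the lifecycle state after `event`, following the state diagram."""
    if state == "shadow":
        if event == "reinforce_success":    # reinforce_if_used(), task_succeeded=true
            return "active"
        if event in ("penalize", "maintenance"):
            return "archived"
    elif state == "active":
        if event == "contradiction_lost":   # lower effective_weight loses
            return "archived"
        if event == "penalize" and failure_count >= REPEATED_FAILURES:
            return "shadow"
    return state  # every other (state, event) pair leaves the state unchanged


def should_prune(state: str, archived_at: datetime, now: datetime) -> bool:
    """Archived memories become prunable once they have been archived for 7 days."""
    return state == "archived" and now - archived_at >= PRUNE_AFTER
```

Keeping Active → Active as the default for reinforcement reflects the diagram: reinforcement changes trust and counters, not the store state.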
What happens at recall
- Embed query
- Cosine scan across all active + shadow memories
- Rerank by 0.75×similarity + 0.20×decay_weight + 0.05×recency
- Trust score computed fresh for each result: state weight × (reinforcement + confidence + age + importance + contradiction_survived)
- Conflict dedup — if two memories share a contradiction group, only the higher-trust one surfaces
- Policy decision — FOLLOW (trust ≥ 0.75) / HINT (≥ 0.45) / IGNORE
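The rerank, conflict-dedup, and policy steps of recall can be sketched in Python. The `Candidate` record and the function names are hypothetical; `decay_weight` and `recency` are assumed to be normalised to [0, 1], trust is taken as already computed, and deduplicating before the top-k cut is a choice made here so that a dropped duplicate does not leave an empty slot.

```python
from dataclasses import dataclass
from typing import Optional

FOLLOW_AT = 0.75
HINT_AT = 0.45


@dataclass
class Candidate:
    id: int
    similarity: float    # cosine similarity to the query
    decay_weight: float  # assumed normalised to [0, 1]
    recency: float       # assumed normalised to [0, 1]
    trust: float
    contradiction_group: Optional[int] = None


def rerank_score(c: Candidate) -> float:
    return 0.75 * c.similarity + 0.20 * c.decay_weight + 0.05 * c.recency


def policy_action(trust: float) -> str:
    if trust >= FOLLOW_AT:
        return "FOLLOW"
    if trust >= HINT_AT:
        return "HINT"
    return "IGNORE"


def recall(cands: list[Candidate], top_k: int) -> list[tuple[int, str]]:
    # Conflict dedup: keep only the highest-trust member of each contradiction group
    best: dict[int, Candidate] = {}
    survivors = []
    for c in cands:
        g = c.contradiction_group
        if g is None:
            survivors.append(c)
        elif g not in best or c.trust > best[g].trust:
            best[g] = c
    survivors.extend(best.values())
    # Rerank the survivors, cut to top_k, then attach a policy action to each
    ranked = sorted(survivors, key=rerank_score, reverse=True)[:top_k]
    return [(c.id, policy_action(c.trust)) for c in ranked]
```

Note that dedup keys on trust while ranking keys on the rerank score, so a more similar but less trusted memory can lose its contradiction group and disappear from the results.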
Trust score formula
trust = EMA(
state_weight × (
0.35 × rc / (rc + 3) # reinforcement term: rc/(rc+3) = 0.75 at rc=9, approaches 1 as rc grows
+ 0.25 × confidence # ingestion-time evidence quality
+ 0.20 × exp(-0.02 × age_days) # slower decay than weight decay
+ 0.15 × importance_base # ingestion importance score
+ 0.05 × contradiction_survived # won a contradiction resolution
)
)
EMA = 0.7 × previous_trust + 0.3 × new_trust (per reinforce/penalize event)
state_weight: active=1.0, shadow=0.6, other=0.0
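A brand-new memory (rc=0) typically starts at trust ≈ 0.41–0.48. Because its reinforcement term is zero, even perfect ingestion scores (confidence = importance = contradiction_survived = 1, age 0) cap it at 0.25 + 0.20 + 0.15 + 0.05 = 0.65, below the FOLLOW threshold of 0.75. The EMA prevents sharp trust swings when a memory oscillates between reinforce and penalize cycles.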
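The trust formula and its EMA update translate directly into Python. This is a hedged sketch, not the crate's code: the function names are hypothetical, `contradiction_survived` is treated as 0 or 1, and how the EMA is seeded for a new memory is not specified here. With the curve's example inputs (confidence=0.62, importance=0.71, age≈0, rc=0) the raw value works out to 0.4615, higher than the 0.41 in the chart and demo, so this sketch should not be expected to reproduce those figures exactly.

```python
import math

STATE_WEIGHT = {"active": 1.0, "shadow": 0.6}  # any other state weighs 0.0


def raw_trust(rc: int, confidence: float, age_days: float, importance_base: float,
              contradiction_survived: int, state: str = "active") -> float:
    """Un-smoothed trust for one memory, term by term as in the formula above."""
    w = STATE_WEIGHT.get(state, 0.0)
    return w * (
        0.35 * rc / (rc + 3)                  # reinforcement, saturating in rc
        + 0.25 * confidence                   # ingestion-time evidence quality
        + 0.20 * math.exp(-0.02 * age_days)   # slow age decay
        + 0.15 * importance_base              # ingestion importance score
        + 0.05 * contradiction_survived       # 1 if it won a contradiction
    )


def ema(previous: float, new: float) -> float:
    """EMA = 0.7 × previous_trust + 0.3 × new_trust."""
    return 0.7 * previous + 0.3 * new


def trust_after_event(previous_trust: float, rc: int, confidence: float, age_days: float,
                      importance_base: float, contradiction_survived: int,
                      state: str = "active") -> float:
    """Apply one reinforce/penalize event: blend the fresh raw score into the EMA."""
    fresh = raw_trust(rc, confidence, age_days, importance_base, contradiction_survived, state)
    return ema(previous_trust, fresh)
```

Under these assumptions, one reinforcement starting from 0.41 (rc=1, same inputs) lands at about 0.452 rather than the demo's 0.563, which is another reminder that the sketch only illustrates the shape of the formula.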
Trust Curve: from HINT to FOLLOW
How a single high-quality corrective memory climbs from its starting trust toward the FOLLOW threshold across repeated successful task uses (EMA-smoothed, confidence=0.62, importance=0.71, age≈0):
xychart-beta
title "Trust Score vs. Reinforcement Count (EMA-smoothed)"
x-axis ["rc=0\n(stored)", "rc=1\n(+1 task)", "rc=2\n(+2 tasks)", "rc=3\n(+3 tasks)", "rc=6\n(+6 tasks)", "rc=9\n(+9 tasks)"]
y-axis "Trust Score" 0.0 --> 1.0
line [0.41, 0.56, 0.64, 0.69, 0.75, 0.79]
Reading the curve:
rc=0 is the ingestion baseline (Quality only). Each reinforce_if_used() call adds Experience. The EMA flattens the curve, so a single outlier success or failure cannot spike trust. The memory reaches the FOLLOW threshold (0.75) for the first time at rc=6.
Uncertainty
base_uncertainty = 1 / (1 + reinforcement_count)
oscillation = failure_count / (failure_count + reinforcement_count + 1)
uncertainty = 0.5 × base_uncertainty + 0.5 × oscillation ∈ [0, 1]
High uncertainty means: few confirmations, or the memory has been both reinforced and penalized (oscillating signal). Agents can use this to decide whether to ask for confirmation rather than acting blindly.
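The uncertainty formula is small enough to check by hand. A minimal Python transcription (the function name is hypothetical) shows how the track record moves the value: a fresh memory sits at 0.5, three clean confirmations bring it down to 0.125, and three confirmations mixed with three failures keep it near 0.339.

```python
def uncertainty(reinforcement_count: int, failure_count: int) -> float:
    """0.5 × base uncertainty (few confirmations) + 0.5 × oscillation (mixed signal)."""
    base = 1 / (1 + reinforcement_count)
    oscillation = failure_count / (failure_count + reinforcement_count + 1)
    return 0.5 * base + 0.5 * oscillation


for rc, fc in [(0, 0), (3, 0), (3, 3)]:
    print(rc, fc, round(uncertainty(rc, fc), 3))
```

The oscillation term is what separates "rarely tested" from "contested": both raise uncertainty, but only failures mixed with successes keep it high after several reinforcements.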
Three-Signal Mental Model
Memoire tracks six fields per memory internally. For reasoning about system behavior — and for judging whether to act — collapse them into three signals:
| Signal | Backed by | What it answers |
|---|---|---|
| Quality | confidence + importance_base | Was this memory good at ingestion? |
| Experience | reinforcement_count + failure_count | What is its track record across tasks? |
| Stability | trust EMA + uncertainty | Is the signal converging or still oscillating? |
A brand-new memory has Quality only — it may have been well-written, but it has no history. After one successful use it gains Experience. After several consistent uses (or consistent failures) it gains Stability. FOLLOW requires all three to be high. IGNORE fires when any one of them is critically low.
This framing is not an abstraction over the implementation — it is a reading guide for the trust score output. When trust=0.41 you are seeing a memory with decent Quality but zero Experience. When trust=0.76 you are seeing a memory that has Quality, survived at least one task, and whose EMA has stabilised.
Quick Start
Prerequisites
rustup update stable # Rust 1.75+
# C linker: standard on Linux/macOS; MSVC toolchain on Windows
# First run downloads all-MiniLM-L6-v2 (~23 MB, cached after that)
Build
git clone https://github.com/tazwaryayyyy/Memorie-AI
cd Memorie-AI
cargo build --release
# Linux: target/release/libmemoire.so
# macOS: target/release/libmemoire.dylib
# Windows: target/release/memoire.dll
As a Rust crate
use memoire::Memoire;
fn main() -> anyhow::Result<()> {
let m = Memoire::new("agent.db")?;
m.remember("Replaced bcrypt with Argon2id — CVE-2023-xxxx affected bcrypt under load")?;
m.remember("JWT issuer validation was disabled in staging — re-enabled 2024-03-12")?;
m.remember("Rate limit: /api/reset-password capped at 5 req/hr/IP")?;
let results = m.recall("what security changes did we make?", 3)?;
for r in &results {
println!("[score={:.3} trust={:.3} state={}] {}", r.score, r.trust, r.state, r.content);
}
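// Placeholders: in a real agent these come from your task runner
let agent_output = String::from("Switched password hashing to Argon2id");
let task_succeeded = true;
// Only reinforce if the agent actually used this memory correctly
if let Some(top) = results.first() {
m.reinforce_if_used(top.id, &agent_output, task_succeeded)?;
}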
Ok(())
}
From Python
pip install -e bindings/python
from memoire import Memoire, MemoryPolicy
policy = MemoryPolicy()
with Memoire("agent.db") as m:
m.remember("Never use float for money. Use Decimal — billing bug #1337.")
memories = m.recall("money precision for billing", top_k=5)
decisions = policy.evaluate(memories)
for d in decisions:
print(f"{d.action.upper():6} trust={d.memory.trust:.2f} {d.memory.content[:60]}")
# FOLLOW trust=0.76 Never use float for money. Use Decimal...
# HINT trust=0.51 Billing module uses 2 decimal places by...
# IGNORE trust=0.18 floats are fine for most calculations...
context = policy.inject_context(decisions)
# "[MEMORY - HIGH TRUST]: Never use float for money..."
# "[MEMORY - HINT ONLY, verify before acting]: Billing module..."
# (low-trust memories are not injected at all)
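The comments above show what `inject_context` produces. A minimal sketch of that formatting step follows, assuming a hypothetical `Decision` record with a lowercase `action` string (consistent with the `d.action.upper()` call above) and newline-separated output; the real `MemoryPolicy.inject_context` may lay the string out differently.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str   # "follow", "hint" or "ignore" (assumed lowercase)
    content: str


PREFIXES = {
    "follow": "[MEMORY - HIGH TRUST]: ",
    "hint": "[MEMORY - HINT ONLY, verify before acting]: ",
}


def inject_context(decisions: list[Decision]) -> str:
    # "ignore" has no prefix, so low-trust memories never reach the prompt
    lines = [PREFIXES[d.action] + d.content for d in decisions if d.action in PREFIXES]
    return "\n".join(lines)
```

Dropping IGNORE entries at this step, rather than labelling them, keeps untrusted text out of the system prompt entirely.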
The Brutal Demo
Run it yourself — it shows the full trust + policy loop in ~30 seconds:
cargo build --release
python examples/brutal_moment_demo.py
Expected output:
============================================================
Memoire · Trust Score Demo
"It doesn't just remember — it decides what to trust."
============================================================
────────────────────────────────────────────────────────────
ARM 1 · No Memory
────────────────────────────────────────────────────────────
Task 1: Implement tax computation for billing.
Code : amount = float(9.99)
Tests : FAIL
Task 2: Implement discount and refund computation for billing.
Code : amount = float(19.99)
Tests : FAIL
★ JUDGE MOMENT: same float mistake repeated. No memory = no learning.
────────────────────────────────────────────────────────────
ARM 2 · Memoire + MQCL + Trust Score
────────────────────────────────────────────────────────────
Task 1: Implement tax computation for billing.
Code : amount = float(9.99)
Tests : FAIL
→ Failure detected. Stored corrective memory (id=1).
→ Memory trust right after store: 0.410 (rc=0, state=active)
Task 2: Implement discount and refund computation for billing.
[RECALL] 1 result(s)
→ "Never use float for money. Use Decimal with ex…" | score=0.84 | trust=0.41 | action=HINT
reason: trust=0.41 active low-confidence
[AGENT DECISION]
→ Treating 1 memory/memories as soft hint.
[RESULT]
Code : from decimal import Decimal
amount = Decimal('19.99')
Tests : PASS
→ Memory reinforced. Trust updated to 0.563 (rc now=1).
★ JUDGE MOMENT: agent followed high-trust memory → mistake avoided.
Agent Behavior Benchmark
The benchmark runs three arms against six paired tasks across three mistake categories (float money, bad retry, issuer validation):
python scripts/agent_behavior_benchmark.py
# Output → benchmark_outputs/agent_behavior_report.json
| Arm | Repeated Mistakes | Completion Rate |
|---|---|---|
| No memory | 100% of learnable failures | baseline |
| Raw memory (no quality filter) | ~40% reduction | moderate |
| Memoire MQCL + Trust | ~80% reduction | highest |
The quality filter matters. Without it, shadow memories and stale contradicted facts pollute retrieval and the agent picks up the wrong lesson as readily as the right one.
Latency (Apple M2, release build)
| Operation | p50 | p99 |
|---|---|---|
| remember() — single chunk | ~14 ms | ~18 ms |
| remember() — 300-word input (3 chunks) | ~38 ms | ~52 ms |
| recall() — 1 k memories, top-5 | ~6 ms | ~9 ms |
| recall() — 10 k memories, top-5 | ~48 ms | ~65 ms |
All latency is local. No network, no serialization overhead beyond the FFI boundary.
cargo bench # runs Criterion benchmarks in benches/
API Reference
Rust
// Lifecycle
let m = Memoire::new("path.db")?; // persistent
let m = Memoire::in_memory()?; // ephemeral, for tests
// Write
let ids: Vec<i64> = m.remember(text)?;
let ids: Vec<i64> = m.remember_with_source(text, "user")?;
// Read
let mems: Vec<Memory> = m.recall(query, top_k)?;
let mems: Vec<Memory> = m.recall_with_min_score(query, top_k, 0.55)?;
// Memory { id, content, score, trust, uncertainty, state, created_at }
// Reinforce (conditional — fires only on task success + token overlap)
let reinforced: bool = m.reinforce_if_used(id, agent_output, task_succeeded)?;
// Penalize (conditional — call only for memories that influenced the decision)
// failure_severity ∈ [0.0, 1.0]: 1.0 = direct failure, 0.5 = partial miss
let outcomes: Vec<PenaltyOutcome> = m.penalize_if_used(&[id], failure_severity)?;
// PenaltyOutcome { id, trust_before, trust_after, uncertainty_after }
// Maintain
m.forget(id)?;
m.clear()?;
m.maintenance_pass()?; // archive superseded, prune stale low-weight memories
Python
from memoire import Memoire, Memory, MemoryPolicy, PolicyDecision, MemoireError
with Memoire("agent.db") as m:
n: int = m.remember(text)
mems: list = m.recall(query, top_k=5)
mems: list = m.recall_with_min_score(query, top_k=5, min_score=0.55)
# Memory: .id .content .score .trust .uncertainty .state .created_at
ok: bool = m.reinforce_if_used(id, agent_output, task_succeeded)
outcomes: list = m.penalize_if_used([id], failure_severity=1.0)
# [{"id": int, "trust_before": float, "trust_after": float, "uncertainty_after": float}]
deleted: bool = m.forget(id)
count: int = m.count()
m.clear()
policy = MemoryPolicy() # FOLLOW≥0.75, HINT≥0.45
decisions = policy.evaluate(mems) # list[PolicyDecision]
context = policy.inject_context(decisions) # str, ready for system prompt
C FFI
#include "memoire.h"
MemoireHandle* h = memoire_new("agent.db"); // or ":memory:"
memoire_remember(h, "content");
char* json = memoire_recall(h, "query", 5);
// [{"id":1,"content":"...","score":0.84,"trust":0.56,"uncertainty":0.22,"state":"active","created_at":...}]
memoire_free_string(json); // caller must free
memoire_reinforce_if_used(h, id, agent_output, 1 /*succeeded*/);
// failure_severity: 1.0=full failure, 0.5=partial miss
char* pen = memoire_penalize_if_used(h, &ids[0], ids_len, 1.0f);
memoire_free_string(pen);
memoire_forget(h, id);
memoire_count(h);
memoire_clear(h);
memoire_free(h);
Multi-Language Bindings
| Language | Mechanism | Path |
|---|---|---|
| Python | ctypes | bindings/python/ |
| Node.js | ffi-napi | bindings/node/ |
| Go | cgo | bindings/go/ |
| Any | C FFI | include/memoire.h |
# Python
pip install -e bindings/python
# Node.js
cd bindings/node && npm install && node demo.js
# Go
cd bindings/go/demo && go run main.go
Configuration
use memoire::{Memoire, chunker::ChunkerConfig};
let m = Memoire::new("agent.db")?
.with_chunker_config(ChunkerConfig {
chunk_size: 64, // words per chunk (default: 128)
overlap: 10, // word overlap (default: 20)
});
Memory quality thresholds, scoring weights, and decay curves are intentionally not exposed as config: they are fixed constants, frozen after calibration against a held-out task suite. This is deliberate. Without a fixed model, benchmarks are not reproducible and trust scores lose meaning across runs. If a judge asks "why this weight?", the answer is that it is fixed for reproducibility. If you need a different threshold, fork the quality module.
Ecosystem Integrations
MCP (Claude Desktop / any MCP-compatible host)
The bundled MCP server exposes two trust-aware tools — save_lesson and get_lessons — plus four low-level passthrough tools.
pip install mcp
python examples/mcp_server.py
Add to your claude_desktop_config.json:
{
"mcpServers": {
"memoire": {
"command": "python",
"args": ["examples/mcp_server.py"]
}
}
}
| Tool | Purpose |
|---|---|
| save_lesson | Store a lesson; trust starts low and grows with use |
| get_lessons | Recall top-k, apply MemoryPolicy, return FOLLOW/HINT context |
| memoire_remember | Low-level: raw store with no policy |
| memoire_recall | Low-level: raw recall with no policy |
| memoire_forget | Delete by id |
| memoire_status | DB stats |
LangChain
from memoire.adapters import MemoireRetriever
retriever = MemoireRetriever(db_path="agent.db", top_k=5)
# Use anywhere LangChain expects a retriever
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(), retriever=retriever)
answer = qa.invoke({"query": "how do we handle billing precision?"})
MemoryPolicy is applied internally — only FOLLOW/HINT memories reach the chain. IGNORE-ranked memories are filtered before the LLM sees them.
Install: pip install langchain langchain-core langchain-openai
LlamaIndex
from memoire.adapters import MemoireIndex
index = MemoireIndex(db_path="agent.db")
# As a query engine
engine = index.as_query_engine()
response = engine.query("what patterns caused billing regressions?")
# As a bare retriever inside a pipeline
nodes = index.as_retriever(top_k=5).retrieve("billing precision bug")
Install: pip install llama-index-core
Roadmap
This is a research agenda, not a feature checklist. Each item is a thesis:
Active ingestion
Right now Memoire scores at write time. The next step is scoring at read time too — penalizing memories that are retrieved frequently but never reinforced. Retrieval without reinforcement is a signal of low utility, not high relevance.
Cross-session contradiction tracking
The current contradiction resolver operates within a single claim key. The harder problem is cross-key contradiction: "always validate JWT issuer" conflicts with "disabled issuer validation for performance" even though the keys differ. This requires claim embedding, not claim string matching.
Agent-specific memory namespacing
In multi-agent systems, what one agent learned is not necessarily what another should trust. Memory needs provenance — who stored it, under what task context, and whether that agent's track record justifies trust propagation.
Confidence calibration from outcomes
The current trust formula treats every reinforcement the same, regardless of the task. A better model would weight by the difficulty of the task the memory helped with — easy tasks reinforce less than hard ones.
Streaming ingestion
For long coding sessions, waiting until the session ends to write memory means losing the most recent context. Streaming ingestion with in-flight dedup would let agents write continuously without blocking.
Contributing
See CONTRIBUTING.md. The quality module (src/quality.rs) is where most of the interesting decisions live — that's the right place to start if you want to understand or challenge the scoring model.
Offline / Airgapped Environments
On first run, fastembed downloads all-MiniLM-L6-v2 from Hugging Face and caches it at ~/.cache/huggingface/hub/. For airgapped machines:
# On a machine with internet — pre-download the model
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('sentence-transformers/all-MiniLM-L6-v2')
"
# Copy ~/.cache/huggingface/ to the offline machine, then:
export HF_HOME=/path/to/local/huggingface/cache
Running Tests
# Unit tests (fast — in-memory, no model needed for store/chunker tests)
cargo test --lib
# Full integration tests (downloads model on first run)
cargo test
# With logs
RUST_LOG=debug cargo test -- --nocapture
👤 Author
Tazwar Ahnaf
- GitHub: @tazwaryayyyy
- X (Twitter): @TazwarEnan
License
MIT. See LICENSE.
Built with 🦀 Rust. Your agent's memories stay on your machine.