Security Audit
Warning
Health: Warning
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 6 GitHub stars
Code: Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Passed
  • Permissions — No dangerous permissions requested

There is no AI report for this listing yet.

SUMMARY

The best-benchmarked open-source AI memory system and the first AI memory built to forget.

README.md

Lethe

The first AI memory built to forget.

local-first  ·  97.4% R@5 on LongMemEval raw  ·  zero API calls




Every memory framework right now is racing in the same direction.

Mem0 promises perfect recall. MemPalace promises verbatim retention.
Letta hands an LLM the whole context and asks it to manage itself. The
benchmarks they all compete on — recall@K, hit-rate, MRR — measure one
thing: how rarely does your agent lose a fact?

But agents in production don't die of losing facts. They die of keeping
them. The password rotated three months ago, still suggested. The
customer who exercised right-to-deletion, still in the recommender. The
job title wrong since 2023 because the supersede never landed. The OTP
from last Tuesday, permanently embedded next to a real preference.
Memory systems fail by overgrowth, not by attrition. And no one is
benchmarking that side.

The Greeks had a name for the missing operation.

Lethe (Λήθη) — one of the five rivers of Hades. Souls drank from
it before reincarnation, leaving the former life behind.

The Greek word for truth, ἀλήθεια (aletheia), derives from
lethe itself.

Memory is what survives Lethe. Truth is what survives memory.


The model

In the myth, Lethe is a river — surface, current, bed. Everything in
the water has a depth. A leaf floats; a stone sinks; some things are
weighed down enough to disappear.

Graph stores answer what is connected to what. Vector stores answer
what is semantically similar. Neither answers the question agent
memory actually faces: how deep is this fact, right now?

We built the simplest mental model that fits: every fact has one
number — depth. Every operation is a force on it.

depth     state                          how it got there
─────────────────────────────────────────────────────────────
+∞        pinned, immune to gravity      .pin()
= 1.0     just inscribed, on surface     .inscribe()
∈ (0, 1)  sinking under gravity          .consolidate()
= 0       submerged, present but mute    .surrender(mode="release")
< 0       erased from disk               .surrender(mode="purge")
─────────────────────────────────────────────────────────────

No weight. No alive flag. No superseded_at column. One number,
one axis, one mental model.
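
As a sketch of how a single depth value could map to the states in the table above. The thresholds and state names here are illustrative assumptions for exposition, not Lethe's actual internals:

```python
import math

# Illustrative mapping from depth to state, following the table above.
# Region boundaries and labels are assumptions, not Lethe's real code.
def state(depth: float) -> str:
    if math.isinf(depth) and depth > 0:
        return "pinned"       # .pin(): immune to gravity
    if depth == 1.0:
        return "surfaced"     # .inscribe(): just written, on the surface
    if 0 < depth < 1:
        return "sinking"      # .consolidate(): gravity acting on it
    if depth == 0:
        return "submerged"    # .surrender(mode="release"): present but mute
    return "erased"           # .surrender(mode="purge"): gone from disk

print(state(math.inf))   # pinned
print(state(0.4))        # sinking
```

Every operation is then just a force that moves this one number, which is the whole point of the single-axis model.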


Benchmarks

Two axes. The conventional one: can a memory system find a fact when
you need it? The one we propose: can it let go of a fact when you
ask? Most frameworks score on the first; Lethe scores on both.

LongMemEval-S — retrieval (the conventional axis)

500 questions, run under MemPalace's own evaluation methodology with
the same all-MiniLM-L6-v2 embedder and zero API calls.

System            R@1      R@5      R@10     Wall
─────────────────────────────────────────────────
MemPalace (raw)   80.6%    96.6%    98.2%    12 min
Lethe v1          85.4%    97.4%    99.0%    14 min

Lethe leads at every K; the gap is 6× wider at R@1 than at R@5
(+4.8 pp vs +0.8 pp). A single depth axis beats a palace of
wings, rooms, and drawers — most clearly where it matters: at #1.

ForgetEval — forgetting (the axis we propose)

1000 generated cases across five families — supersession,
decay, amnesia, purge, drift — each probing one
structural property a memory system must exhibit to be safe in
production. Pass / fail is exact substring matching on top-k recall,
no LLM judge, deterministic. Full methodology:
docs/forgeteval.md.

System         super   decay   amnesia   purge   drift   Overall
─────────────────────────────────────────────────────────────────────────
Lethe v1       100%    100%    98%       100%    99%     99.3% (993 / 1000)
Mem0 (2.0.2)   100%    100%    70%       75%     100%    88.8%
MemPalace      0%      0%      0%        0%      0%      0% (no forgetting primitives)

Mem0 ties on supersession / decay / drift but breaks at the precision
operations: forgetting one entity without bleeding into near-neighbors
(amnesia 70%) and deleting by identifier without over-pruning siblings
(purge 75%). MemPalace's zeros are not a benchmark failure — they
are an honest report that the library was built without supersede,
release, or purge.

In production this maps to real failures. 70% amnesia means three
in ten "forget this user" requests leave fragments reachable to
other queries — a GDPR liability and a stale-context bug. 75%
purge means one in four deletions either misses the target or takes
a neighbor with it — the silent delete-by-similarity failure that
bricks compliance audits. MemPalace's 0% is the opposite failure:
a GDPR Article 17 right-to-be-forgotten request becomes a manual
data-migration project, not a one-line API call.

Reproduce: python bench/forgeteval/run.py --adapter {lethe|mem0|mempalace} --scale 200

ForgetEval is downstream of the depth model — and the depth model
is downstream of ForgetEval. A failing purge_gdpr case in early
runs forced recall(lexical=True) into the core as a first-class
primitive. Both tables above reflect that loop.
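
The deterministic pass / fail rule described above — exact substring matching on top-k recall, no LLM judge — can be sketched in a few lines. The function name and case shape here are illustrative, not ForgetEval's real schema:

```python
# Sketch of ForgetEval's scoring rule as described in the text: a case
# passes when the forbidden string no longer appears anywhere in the
# top-k recall results. Deterministic; no LLM judge involved.
def case_passes(topk_texts: list[str], forbidden: str) -> bool:
    return all(forbidden not in text for text in topk_texts)

# A supersession case: the old employer must be gone from top-k.
assert case_passes(["Alice now at OpenAI."], "Anthropic")
assert not case_passes(["Alice works at Anthropic."], "Anthropic")
```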


Architecture

  • One physical axis: depth. Every state — pinned, surfaced,
    sinking, submerged, erased — is a numeric region. No status flags.
  • Single SQLite file. Three sub-tables (memory, memory_vec,
    memory_fts) keyed by shared rowid; plus an append-only event
    log and a supersession edge table. No external services.
  • Two retrieval primitives. recall(query) is RRF-blended vec +
    BM25; recall(query, lexical=True) is pure BM25. Purge uses the
    second — deleting [email protected] is a lexical lookup by
    identifier, not a semantic search for "similar customers."
  • Verifiable forgetting. Every signed purge returns an
    Ed25519-signed receipt anchored to a Merkle root over the event
    log. Tamper with any past event afterwards → receipt fails
    verification. No other open-source memory framework can produce
    this proof because none of them keep the log to anchor to.
  • Time-travel built in. recall(query, at=T) reconstructs depth
    state at any past timestamp from the event log.
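
The RRF blend mentioned in the retrieval bullet is standard Reciprocal Rank Fusion over the vector and BM25 rankings. A minimal, self-contained sketch with made-up memory IDs and the conventional k = 60 constant (how Lethe weights the two lists internally is not specified here):

```python
# Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) to a
# document's score; documents near the top of either list win.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vec_hits  = ["m7", "m3", "m9"]   # semantic neighbors
bm25_hits = ["m3", "m7", "m1"]   # exact lexical matches
fused = rrf([vec_hits, bm25_hits])   # m3 and m7 share the top two slots
```

Pure-BM25 recall for purge then amounts to skipping the vector list entirely, which is why deleting by identifier never routes through "similar customers."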

Quickstart

pip install "pylethe[embed,crypto,mcp]"

The PyPI distribution name is pylethe (the lethe slot was already
taken on PyPI by an unrelated package); the import remains
from lethe import Lethe.

Library:

from lethe import Lethe

agent = Lethe("./agent.db")
mid = agent.inscribe("Alice works at Anthropic.")

agent.surrender(mid, mode="release")            # depth → 0
agent.surrender({"old": mid, "new": "Alice now at OpenAI."},
                mode="supersede")               # old sinks, new surfaces
agent.surrender(mid, mode="purge")              # erased from disk
agent.pin(mid)                                  # depth → +∞

CLI — one subcommand per primitive:

lethe inscribe "Alice works at Anthropic."
lethe recall "Where does Alice work?"
lethe supersede 1 --new "Alice now at OpenAI."
lethe blame "Alice's job"
lethe consolidate
lethe ingest ~/notes                       # batch: *.md *.txt *.rst

# Verifiable purge
lethe keygen
lethe --db agent.db purge --signed 42      # emits receipt JSON
lethe verify-receipt receipt.json --db agent.db --db-check

DB defaults to ~/.lethe/agent.db. Pass --json on any subcommand for
machine-readable output.

MCP — eleven tools exposed over stdio (every core operation plus
signed-purge receipts). Add to Claude Desktop / Claude Code / Cursor:

{
  "mcpServers": {
    "lethe": {
      "command": "python",
      "args": ["-m", "lethe.mcp_server"],
      "env": {"LETHE_DB": "/absolute/path/to/agent.db"}
    }
  }
}

Recipes

Runnable cookbook in recipes/ for the common patterns: OTP TTL,
GDPR purge with cryptographic receipt, belief revision via
supersession, pinning user preferences, and time-travel debugging.
Each recipe is a self-contained ~40-line script that runs without
fastembed — python recipes/02_gdpr_purge_receipt.py.


Status

v1.0.0-alpha. Core implemented and tested:

$ pytest tests
14 passed in 0.65s

Roadmap (next):

  • Human-curated adversarial ForgetEval. Substring traps, prefix
    collisions, paraphrase chains. Template-generated 1000-case is the
    floor, not the ceiling.
  • Receipt-verification benchmark family. Does the system produce
    auditable proof of deletion? A new ForgetEval axis no other
    framework even attempts.
  • Adaptive consolidation policies. consolidate() uses one fixed
    decay law; we want per-domain policies — financial records decay
    slower than chat memory.
  • Production-density distractor corpora. Synthetic office-trivia
    fillers replaced with real long-form text (Wikipedia, code, emails)
    for a tougher recall environment.
  • Pluggable retrieval backends. A Backend protocol so the
    default SQLite + vec0 + FTS5 stack can be swapped for Postgres +
    pgvector, Pinecone, Weaviate, or a custom store. The depth axis
    and surrender / recall semantics stay identical — only the storage
    layer changes.
  • Optional LLM hooks at inscribe and consolidation time. Entity
    extraction at inscribe, semantic deduplication, and LLM-guided
    consolidation policies (which facts to promote, which to release).
    The recall path stays LLM-free — determinism and latency
    are non-negotiable on the hot path.
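
As a concrete illustration of the "one fixed decay law" point: one plausible such law (an assumption for exposition, not Lethe's actual policy) is exponential decay of depth with a half-life, with pinned memories exempt:

```python
import math

# Hypothetical fixed decay law for consolidate(): depth halves every
# half_life_days; pinned (+inf) memories ignore gravity entirely.
def decayed_depth(depth: float, dt_days: float,
                  half_life_days: float = 30.0) -> float:
    if math.isinf(depth):
        return depth                      # .pin() means immune to decay
    return depth * 0.5 ** (dt_days / half_life_days)

d = decayed_depth(1.0, 30.0)              # one half-life: 1.0 -> 0.5
```

A per-domain policy, as the roadmap suggests, would simply vary `half_life_days` by record type — long for financial records, short for chat.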

Paper

📄 ForgetEval: Benchmarking the Forgetting Axis of Agent Memory Systems — Dongxu Yang, DeepLethe, 2026.

Full methodology, formal model of the depth axis (Propositions 1–4), 1000-case ForgetEval, 5-seed variance, distractor sweep, component ablations, and LongMemEval-S comparison. 23 pages, MIT.

@misc{yang2026forgeteval,
  author       = {Yang, Dongxu},
  title        = {ForgetEval: Benchmarking the Forgetting Axis of
                   Agent Memory Systems},
  year         = {2026},
  howpublished = {\url{https://github.com/deeplethe/lethe/blob/main/paper/paper.pdf}},
}

Star History

Star History Chart

License

MIT — see LICENSE.
