Kimetsu

Give your coding agent a memory that gets sharper every run.

Kimetsu (鬼滅), "demon slayer." It slays the demon every agent fights: amnesia.

Kimetsu demo: one-command setup, selftest, record a lesson, retrieve it by meaning

Why Kimetsu

LLM coding agents are brilliant and forgetful. Every session starts from zero:
the same wrong turns, the same re-explaining of your conventions, the same
expensive exploration you already paid for last week.

Kimetsu fixes the forgetting. It's a sidecar brain, a single Rust binary that
runs next to your host agent over MCP (Claude Code, Codex, Pi, OpenClaw, Cursor,
Gemini CLI) or as its own terminal chat. It learns which memories the model
actually used to win, and lets that knowledge compound across runs.

- It remembers. Project conventions, failure patterns, the exact command
  that regenerates your schema. Captured once, retrieved automatically.
- It learns what helps. Memories the model cites before solving a problem
  get promoted. Silent passengers and stale advice decay and get pruned.
- It never explores twice. A session-start digest and an episodic resume
  mean the agent's first turn already knows the repo and what you were doing
  last time. No re-deriving the basics, no "where was I."
- It answers, not just injects. kimetsu ask composes a grounded, cited
  answer from memory using a local model: zero frontier tokens, works offline.
  Lessons cited often enough graduate into runnable skills.
- It's cheap to be right. On a recorded 16-task Terminal-Bench slice, runs
  with Kimetsu cost about 13x less per win than the no-brain baseline ($0.19 vs
  $2.47), and the ROI ledger shows the token savings on your own work.
- It gets smarter, not just bigger. Semantic retrieval finds the right
  memory even when you used different words, and it self-tunes retrieval against
  your own query history.
- It's yours, on your machine. The whole brain is one SQLite file per
  project. No external vector DB, no cloud, no telemetry. Back it up with cp.

How it works

  Host agent (Claude / Codex / Pi / OpenClaw / kimetsu chat)
       │  asks for context                    ▲ cites what helped
       ▼                                      │
  MCP tools ──► Broker ──► top memories ──► agent run
                  │  scores candidates by relevance ×
                  │  usefulness × freshness × scope
                  ▼
  brain.db: one SQLite file, FTS5 + semantic ANN (usearch HNSW)

- Before a task, the broker walks your project brain and your
  cross-project user brain, scores every candidate, and injects the top few
  inside an adaptive token budget. The semantic build matches by meaning
  (O(log N) ANN, scaling to ~1M memories in ~3 GB RAM, sub-2s retrieval).
- While it works, Kimetsu surfaces known pitfalls before the first
  attempt, and the model cites the memories that actually help.
- After the task, cited memories get promoted, unused advice decays on a
  half-life curve, and non-trivial sessions auto-harvest their lessons.
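
The scoring and decay described above can be illustrated with a small sketch. This is not Kimetsu's implementation: the `Memory` fields, the 30-day half-life, the scope weights, and the fixed (rather than adaptive) token budget are all assumptions chosen to show how a multiplicative score and a half-life curve interact.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    relevance: float   # 0..1 match against the query (hypothetical scale)
    usefulness: float  # 0..1, assumed to rise when the model cites the memory
    age_days: float    # days since the memory was last cited
    scope: float       # assumed weight: 1.0 project brain, lower for user brain
    tokens: int        # cost of injecting this memory into the prompt


HALF_LIFE_DAYS = 30.0  # assumed value, not a documented Kimetsu setting


def freshness(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Half-life decay: the factor halves every `half_life` days."""
    return 0.5 ** (age_days / half_life)


def score(m: Memory) -> float:
    """Multiplicative score: a weak factor drags the whole candidate down."""
    return m.relevance * m.usefulness * freshness(m.age_days) * m.scope


def select(candidates: list[Memory], budget_tokens: int) -> list[Memory]:
    """Greedily take the highest-scoring memories that fit the token budget."""
    chosen, used = [], 0
    for m in sorted(candidates, key=score, reverse=True):
        if used + m.tokens <= budget_tokens:
            chosen.append(m)
            used += m.tokens
    return chosen


# Made-up example memories, for illustration only.
candidates = [
    Memory("run the schema generator before tests", 0.9, 0.8, 0.0, 1.0, 40),
    Memory("old advice about a removed module", 0.9, 0.8, 90.0, 1.0, 40),
    Memory("user-wide style preference", 0.6, 0.5, 0.0, 0.8, 30),
]
top = select(candidates, budget_tokens=80)
for m in top:
    print(f"{score(m):.2f}  {m.text}")
```

In this toy data the stale memory has the same relevance and usefulness as the fresh one, but three half-lives of decay push it below the user-wide preference, so it is the one left out when the budget runs short.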

The full mechanics (scoring, citations, decay, conflict detection, and the
daemon) are in How Kimetsu Works.

Benchmarks

Every number is reproducible with kimetsu brain bench and the
kimetsu brain roi ledger.

Metric	Result	How it's measured
Cost per win	$0.19 vs $2.47 (~13x cheaper)	16-task Terminal-Bench slice, Kimetsu vs no-brain baseline
Retrieval quality	recall@4 0.949, MRR 0.914 at ~138 ms (default), up to 0.975 / 0.933	`kimetsu brain bench`, 100-memory / 210-case dataset, jina-v2-base-code + cross-encoder rerank
Scale	~1M memories in ~3 GB RAM, sub-2s retrieval	usearch HNSW ANN, O(log N)
Footprint	one SQLite file per project, no cloud, no telemetry	back it up with `cp`

The semantic build retrieves with jina-v2-base-code and a cross-encoder
reranker, tuned with kimetsu brain bench on a 100-memory / 210-case dataset of
real exported memories. The latency-optimized default (ms-marco-tinybert-l-2-v2)
lands recall@4 0.949, MRR 0.914 at ~138 ms; the quality-best rerankers reach
recall@4 0.975, MRR 0.933. Swap the embedder and the reranker with one config
key each and re-judge on your own corpus. The full grid is in
How Kimetsu Works.
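
For readers unfamiliar with the two retrieval metrics, here is a minimal sketch of how recall@k and MRR are commonly defined. It does not reproduce `kimetsu brain bench`; the case format (a ranked list of ids plus a set of relevant ids) and the ids themselves are assumptions made for illustration.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases, k=4):
    """Average both metrics over (ranked_ids, relevant_ids) cases."""
    n = len(cases)
    recall = sum(recall_at_k(r, rel, k) for r, rel in cases) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in cases) / n
    return recall, mrr


# Toy cases with made-up memory ids.
cases = [
    (["m1", "m7", "m3", "m9"], {"m1"}),        # relevant memory at rank 1
    (["m4", "m2", "m8", "m5"], {"m2"}),        # relevant memory at rank 2
    (["m6", "m3", "m9", "m1", "m5"], {"m5"}),  # rank 5, outside the top 4
]
recall, mrr = evaluate(cases, k=4)
print(f"recall@4={recall:.3f}  MRR={mrr:.3f}")
```

Recall@4 only asks whether the right memory made the cut, while MRR also rewards placing it near the top, which is why the two numbers are reported together.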

Quickstart

npm install -g kimetsu-ai
kimetsu npm-flavor embeddings        # one-time: enable semantic retrieval
cd /your/project
kimetsu setup --host claude-code     # or: codex | openclaw | pi
kimetsu doctor --selftest            # records a memory and retrieves it

Other install paths (cargo, prebuilt archives) and host-wiring details are in
docs/INSTALL.md.

Command reference

Command	What it does
`kimetsu setup --host <h>`	Wire the brain into a host agent (init + install + selftest)
`kimetsu chat`	Standalone terminal coding assistant with the same brain
`kimetsu brain memory add`	Record a durable lesson by hand
`kimetsu brain context "<q>"`	Broker-ranked context bundle for a query
`kimetsu ask "<q>"`	Grounded, cited answer from memory (local model)
`kimetsu resume` / `kimetsu checkpoint`	Pick up where the last session left off
`kimetsu brain skills`	Turn often-cited lessons into runnable skills
`kimetsu brain insights` / `kimetsu brain roi`	Is the brain helping, and did it pay for itself
`kimetsu brain tune`	Self-tune retrieval against your own query history
`kimetsu brain sync`	Replicate your brain across machines, no server
`kimetsu brain bench`	Benchmark retrieval on your own corpus

The full command surface, configuration keys, and maintenance commands are in
How Kimetsu Works and
docs/INSTALL.md.

Kimetsu Remote (beta)

Share one brain per repository from a server over HTTP MCP, for a team or for
yourself across machines:

# server
kimetsu-remote serve --addr 0.0.0.0:8787 --data /srv/kimetsu-brains --token <secret>
# each client
kimetsu plugin install claude-code --remote https://kimetsu.example.com:8787

Remote includes bearer auth, per-repo brains, an optional shared org-brain,
server-side repo ingest, TLS, Prometheus metrics, and a server-side reranker.
Full setup is in docs/REMOTE.md.

What's in the box

Component	What it is
The brain	Durable project + user memory in one auto-migrating SQLite file: FTS + semantic retrieval, citations, decay, conflict detection, self-tuning, and effectiveness analytics.
`kimetsu ask` + warm-start	Grounded answers from memory, and a session-start digest plus episodic resume so the first turn already knows your work.
`kimetsu chat`	A full terminal coding assistant running against your workspace.
MCP sidecar	`kimetsu mcp serve` exposes the brain to any MCP host as `kimetsu_*` tools.
Kimetsu Remote (beta)	The brain over HTTP MCP, one per repository, shared from a server.

Built as a small Rust workspace. Lint and tests run clean on every change.

Docs

- Install & host wiring: every install path, host wiring, auto-harvest and
  distiller setup, maintenance commands.
- How Kimetsu Works: the brain, the broker, citations, decay, conflict
  detection, the MCP surface, retrieval models and benchmarking,
  configuration, the bridge, and doctor.
- Local models: run fully local with Ollama.
- Kimetsu Remote: server setup, org brain, TLS, clients.
- CHANGELOG: what shipped in each release.

License

Dual-licensed under MIT or Apache-2.0,
your choice.