mcp-sophon
Health Warn
- License — MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Warn
- fs module — File system access in .github/workflows/ci.yml
- fs module — File system access in .github/workflows/mcp-registry.yml
- fs module — File system access in .github/workflows/release.yml
Permissions Pass
- Permissions — No dangerous permissions requested
This is a deterministic context compressor for MCP agents. It compresses prompts, memory, and code digests locally to save tokens and money when using AI models, operating without machine learning models or API keys at query time.
Security Assessment
Overall risk: Low. The tool does not request dangerous permissions and no hardcoded secrets were detected. The file system access warnings are strictly limited to automated GitHub Actions workflows (CI, release, and registry deployment), which is standard and safe behavior. Because it is a standalone Rust binary designed to run locally without a GPU or API keys, it inherently minimizes data exposure. It does not appear to execute unexpected shell commands or make unauthorized network requests during standard operation.
Quality Assessment
The project is highly maintained, with its last push occurring today. It is transparent about its performance metrics, linking to reproducible benchmark scripts, and is backed by a permissive MIT license. It boasts a solid test suite (387 Rust + 4 Python tests). However, community visibility and trust are currently very low. With only 5 GitHub stars, the tool has not yet been broadly vetted by the open-source community, meaning users are relying primarily on the developer's own testing and claims.
Verdict
Use with caution — the tool appears secure, efficient, and well-engineered with a safe CI setup, but its extremely low community adoption means it has not yet been widely battle-tested.
Deterministic context compressor for MCP agents. Slots in front of prompt caching, mem0, Letta, or Claude Code. Single Rust binary, zero ML at query time. +24% tokens / +49% $ saved on top of Anthropic prompt caching.
Sophon
Honest token economics for MCP agents. One Rust binary. Zero ML at query time. Reproducible benchmarks.
Sophon is a deterministic context layer for agents speaking the Model
Context Protocol. It compresses prompts, conversation memory, code
digests, file deltas, and shell output — without an embedding model at
query time, without a GPU, and without API keys. 7.2 MB default
Rust binary (25 MB with the optional 11-language tree-sitter AST
backend, 34 MB with BGE embedder), MCP-native, cl100k_base-accurate.
Every number below links to the reproducible benchmark script that
produced it. Every caveat is in BENCHMARK.md. Version
history + deprecated numbers live in CHANGELOG.md.
TL;DR — v0.5.0
Sophon is a deterministic context compressor that slots in front
of whatever memory / cache / code-nav layer you already use — not
instead of them. v0.5.0 is a positioning re-scope: we stopped chasing
LOCOMO conversational recall (mem0's territory) and doubled down on
pure compression. Full rationale in
CHANGELOG § 0.5.0.
New in v0.5.0 — orthogonal-stack economics
| Stack | Additional saved by Sophon | Benchmark |
|---|---|---|
| Sophon + Anthropic prompt caching | +24 % tokens / +49 % $ on a 25-turn Claude-3.5-Sonnet session | sophon_plus_prompt_caching.py |
| Sophon + mem0 | Depends on mem0 output size; the bench flags overhead on short dumps directly | sophon_plus_mem0.py |
New in v0.5.0 — single-binary efficiency
Four numbers every Python-based context layer would struggle to match. All measured against the v0.5.0 release binary on macOS arm64.
| Metric | Value | Benchmark |
|---|---|---|
| Binary on disk | 8.7 MB (release) | stat on the release target |
| Cold start → ready | 10.6 ms p50, 25 ms p99 | cold_start_and_footprint.py |
| RSS after initialize | 12.5 MB | idem |
| Session scaling (1 → 200 turns) | update_memory 0.1 ms p50, flat; compress_history 4.2 ms p50 / 50 ms p99 | session_scaling_curve.py |
| compress_output coverage | 81.6 % weighted aggregate across 15 command families (git, cargo, docker, pytest, npm, kubectl, curl, tail, grep, …) | compress_output_per_command.py |
Pass --include-python-baseline to cold_start_and_footprint.py to contrast against python -c "import mem0" / sentence_transformers / langchain on your machine.
Carried over (still on-thesis, measured at v0.4.0 and unchanged)
| Use case | Metric | Compared to |
|---|---|---|
| Agent session token economics | 68.1 % tokens saved across 25-turn coding session (§ 1) | Baseline: raw tokens |
| Prompt compression | 70.2 % mean saved, 36 ms mean latency, 22 prompt shapes (§ 2) | LLMLingua-2: +8.9 pt at 35× lower latency (§ 6.1) |
| Code retrieval (repo QA) | recall@3 = 70 % on "where is X?" questions (§ 4) | grep: 10 % ; FULL context: 20 % |
| Latency + reliability | p99 < 87 ms on 5/7 ops, 100 % ok_rate on 190 runs (§ 3) | Sub-second guaranteed |
Protocol + DX changes in v0.5.0
- MCP protocol 2025-06-18 — adds notifications/cancelled, structured
  JSON-RPC error codes (-32000..-32099 reserved range for Sophon server
  errors), and an infallible dispatcher so a single malformed request
  can no longer kill the stdio loop.
- sophon doctor — read-only installation diagnostic: binary + resolved
  config + every SOPHON_* flag in use + path writability + LLM-command
  PATH probe + MCP-client config hints. Also surfaces deprecated
  recall-chasing flags.
- Observability — 18 eprintln! calls replaced with tracing; filter via
  RUST_LOG=sophon=debug.
- Tests — workspace count 303 (v0.4.0) → 405+.
What stopped being a goal
Long-form conversational recall above ~40 % on LOCOMO is now
explicitly out of scope. mem0 hits 91 % with neural retrieval; we
sit in front of mem0 instead. The v0.4.0 recall-chasing flags
(SOPHON_HYDE, SOPHON_FACT_CARDS, SOPHON_ENTITY_GRAPH,
SOPHON_LLM_RERANK, SOPHON_ADAPTIVE, SOPHON_TAIL_SUMMARY,
SOPHON_REACT, SOPHON_GRAPH_MEMORY, SOPHON_MULTIHOP_LLM) stay
functional but are flagged by sophon doctor as deprecated and
will be removed.
The three pillars
1. Measured economies, not promised ones
- 68.1 % session tokens saved over a 25-turn coding session (§ 1)
- 70.2 % overall savings on compress_prompt across 22 shapes (§ 2)
- 98.0 % savings on re-reads via read_file_delta
- 94.4 % savings on targeted edits via write_file_delta
- 95.4 % savings on Claude-Code-sized system prompts
2. Determinism + speed first
- p99 ≤ 87 ms on 5 of 7 ops: count_tokens, compress_prompt,
  compress_output, read_file_delta, navigate_codebase
- 100 % ok_rate across 190 bench runs (zero crashes, zero malformed
  payloads)
- Zero ML at query time on the default build. Haiku is shell-out only,
  opt-in per feature flag.
3. Honest about what it isn't
- LOCOMO conversational recall plateaus around 40 % on the V032 full
  stack. mem0 / HippoRAG hit 80-90 % with neural embeddings at query
  time — we chose determinism + speed instead.
- Adversarial questions: V032 loses some ground (HyDE surfaces
  tangential chunks the LLM then hallucinates over). V030 default
  stays at 83 % on adversarial, V032 drops to 67 %.
- Per-type, not global. Our +17 pt gains on multi-hop / single-hop /
  temporal are directionally real at N=30 but CIs overlap — we flag
  that explicitly in § 5.1.
What's in the binary
11 MCP tools, all stdio:
| Tool | What it does |
|---|---|
| compress_prompt | Keep query-relevant sections of a long prompt |
| compress_history | Summary + facts + recent + optional retrieval over the conversation |
| compress_output | Strip noise from command stdout/stderr (20+ domain filters) |
| navigate_codebase | tree-sitter / regex digest of a repo, PageRanked by query |
| update_memory | Append messages to the session store (JSONL persist + graph ingest) |
| read_file_delta | Version/hash-aware file read, unchanged → minimal payload |
| write_file_delta | Send edits as diffs, not full files |
| encode_fragments | Detect repeated boilerplate, replace with tokens |
| decode_fragments | Reverse the encoding |
| count_tokens | cl100k_base-accurate token count |
| get_token_stats | Session-level savings rollup |
Binary sizes:
- 7.2 MB default (regex extractors, HashEmbedder)
- 25 MB with tree-sitter (11 languages: Rust, Python, JS, TS, TSX, Go, Ruby, Java, C/C++, PHP, Kotlin, Swift)
- 34 MB with BGE-small semantic embedder
- 42 MB with all features
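The encode_fragments / decode_fragments pair from the tool table can be modelled as a dictionary coder over repeated text. A minimal sketch, assuming line-level granularity (Sophon's real detector works on token windows, not whole lines):

```python
from collections import Counter

def encode_fragments(text, min_len=20, min_count=3):
    """Toy fragment coder: lines that repeat verbatim at least
    min_count times are replaced by short placeholder tokens."""
    lines = text.splitlines()
    counts = Counter(l for l in lines if len(l) >= min_len)
    table = {}   # original line -> placeholder token
    out = []
    for line in lines:
        if counts[line] >= min_count:
            token = table.setdefault(line, f"\u00a7F{len(table)}\u00a7")
            out.append(token)
        else:
            out.append(line)
    return "\n".join(out), table

def decode_fragments(encoded, table):
    """Reverse the encoding exactly."""
    rev = {tok: line for line, tok in table.items()}
    return "\n".join(rev.get(l, l) for l in encoded.splitlines())
```

The round trip is lossless, which is the property that lets the agent substitute tokens freely without corrupting context.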
Feature flags
Run sophon doctor to see every SOPHON_* env var currently set,
with validation warnings and a note for deprecated recall-chasing
flags. The full catalogue (24 flags, grouped by scope) lives in runtime_flags.rs.
On-thesis, still recommended:
| Flag | What it adds | Cost |
|---|---|---|
| SOPHON_RETRIEVER_PATH=/dir | Activate the semantic retriever (chunk store on disk). | ~0 |
| SOPHON_MEMORY_PATH=/file.jsonl | Persistent conversation memory across sophon serve runs. | ~0 |
| SOPHON_HYBRID=1 | BM25 sparse-lexical + HashEmbedder fused via RRF. | ~1 ms |
| SOPHON_CHUNK_TARGET=500 | Bigger chunks preserve cross-sentence context. | ~0 |
| SOPHON_EMBEDDER=bge | Swap HashEmbedder for BGE-small (needs --features bge). | +model load at startup |
| SOPHON_NO_LLM_SUMMARY=1 | Opt-out from Haiku summary; heuristic only. | Speed (bench utility) |
| SOPHON_DEBUG_LLM=1 | Richer tracing warnings for LLM subprocess failures. | — |
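SOPHON_HYBRID=1 fuses the BM25 and HashEmbedder rankings via reciprocal rank fusion. A generic RRF sketch (the standard score = Σ 1/(k + rank) formula with the conventional k = 60; Sophon's internals may weight differently):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes
    1/(k + rank) to a document's score; sort by fused score."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: a chunk ranked well by *both* retrievers wins overall.
bm25 = ["chunk_a", "chunk_b", "chunk_c"]
dense = ["chunk_b", "chunk_d", "chunk_a"]
fused = rrf_fuse([bm25, dense])  # chunk_b first: top-2 in both lists
```

RRF needs only ranks, not comparable scores, which is why it fuses a sparse lexical list and a hash-embedding list cleanly at ~1 ms cost.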
Deprecated (v0.4.0 recall-chasing experiments, scheduled for removal):
SOPHON_HYDE, SOPHON_FACT_CARDS, SOPHON_ENTITY_GRAPH,
SOPHON_ADAPTIVE, SOPHON_LLM_RERANK, SOPHON_TAIL_SUMMARY,
SOPHON_REACT, SOPHON_GRAPH_MEMORY, SOPHON_MULTIHOP_LLM —
these chase LOCOMO recall, an axis we no longer optimise. Still
functional but sophon doctor flags them. See
CHANGELOG.md § 0.5.0 Positioning re-scope.
If you need neural recall, pipe mem0 / Letta in front of Sophon
instead (see When to use
below).
When to use it — Sophon in front of X
Sophon is not a memory platform, a recall system, an OCR stack,
or a replacement for provider-side caching. It's a deterministic
context compressor that slots in front of whatever memory /
cache / code-nav layer you already use, and attacks the tokens those
layers can't.
The v0.5.0 positioning is explicit: Sophon stops chasing LOCOMO
recall (mem0's territory) and doubles down on pure compression —
tokens saved %, latency p99, binary size, canary preservation, MCP
compliance. See CHANGELOG.md for the re-scope
note.
Sophon in front of Anthropic / OpenAI prompt caching
Provider caching handles the static half of a request — system
prompt, tool definitions, reused documents. It doesn't touch the
dynamic half (growing conversation history, tool outputs). Sophon
compresses exactly that half. The two stack cleanly.
Reproducible measurement:
benchmarks/sophon_plus_prompt_caching.py
simulates a 25-turn agent session with a 6600-token cacheable
static block and claude-3.5-sonnet pricing. Sophon saves an
additional 23.8 % tokens / ~49 % $ on top of prompt caching —
because the uncached dynamic block is billed at 10× the cached
rate, so every dynamic-token Sophon removes is worth ~10 cached
tokens in dollars.
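The "~10 cached tokens per dynamic token" claim is simple rate arithmetic. A sketch under assumed per-Mtok prices (the rates below are illustrative placeholders; the real numbers come from benchmarks/sophon_plus_prompt_caching.py):

```python
# Assumed rates, $/Mtok: cache reads billed at ~0.1x the uncached input rate.
BASE_RATE = 3.00     # uncached (dynamic) input tokens
CACHED_RATE = 0.30   # cache-read (static) input tokens

def session_cost(static_tokens, dynamic_tokens, turns):
    """Cost of a session: the static block is a cache read every turn,
    the dynamic block (history, tool outputs) is billed at full rate."""
    static_cost = turns * static_tokens * CACHED_RATE / 1_000_000
    dynamic_cost = turns * dynamic_tokens * BASE_RATE / 1_000_000
    return static_cost + dynamic_cost

baseline = session_cost(static_tokens=6600, dynamic_tokens=8000, turns=25)
# Suppose Sophon strips ~60 % of the dynamic block:
with_sophon = session_cost(static_tokens=6600, dynamic_tokens=3200, turns=25)
```

Because BASE_RATE / CACHED_RATE = 10, each dynamic token removed saves as much money as 10 cached tokens would, so even moderate compression of the dynamic half dominates the bill.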
Sophon in front of mem0 / Letta / Zep / Graphiti
mem0 and friends retrieve the right memories. Sophon shrinks what
gets sent to the LLM after retrieval. If mem0 returns 2 kB of
raw memories, compress_prompt keeps only the sections the query
actually references.
Reproducible measurement:
benchmarks/sophon_plus_mem0.py
runs against a surrogate mem0 retriever by default
(no API keys needed) or the real mem0ai package with --real-mem0.
It reports Sophon's additional savings + the
proper-noun / date / number preservation rate. Honest caveat
built-in: on very short mem0 outputs (< ~200 tokens) Sophon adds
overhead from its own wrapper — only pipe larger dumps through it.
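That caveat translates into a one-line guard in calling code. A sketch, assuming a crude whitespace token estimate (the function and threshold are illustrative; Sophon itself counts with cl100k_base):

```python
def maybe_compress(text, compress, min_tokens=200):
    """Skip compression on small inputs, where Sophon's XML-tagged
    section scaffolding would cost more tokens than it saves."""
    approx_tokens = len(text.split())  # crude estimate, not cl100k_base
    if approx_tokens < min_tokens:
        return text  # pass through raw
    return compress(text)
```

Only dumps above the threshold get piped through the compressor; short mem0 outputs go to the LLM untouched.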
Sophon in front of Claude Code / Cursor / Cline
This is the primary use case. Every repeat file read becomes a read_file_delta; every shell command output goes through compress_output; every repeated boilerplate block gets swapped for
a fragment_cache token. A 25-turn session drops from ~15 k
tokens/turn to ~9 k tokens/turn.
Reproducible measurement:
benchmarks/session_token_economics.py
— 68.1 % session tokens saved
(§ 1).
Install with sophon hook install --agent claude --global.
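The "failures only, ~90 % smaller" behaviour of compress_output on test runners can be illustrated with a naive filter. A sketch only — the real tool applies 20+ domain-specific rule sets, not a keyword grep:

```python
def failures_only(output):
    """Naive sketch of a compress_output-style filter for test-runner
    output: keep failing lines and the summary, drop passing noise."""
    kept = []
    for line in output.splitlines():
        if "FAILED" in line or "error" in line.lower() or "test result" in line:
            kept.append(line)
    return "\n".join(kept)

raw = """\
test parse::roundtrip ... ok
test parse::unicode ... ok
test delta::rewrite ... FAILED
test result: FAILED. 2 passed; 1 failed
"""
```

On a real cargo test run with hundreds of passing tests, dropping the `... ok` lines is where nearly all the savings come from.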
Sophon in front of a RAG pipeline
navigate_codebase produces a PageRanked repo digest that a RAG
retriever would otherwise spend expensive embedding calls to build.
Sophon emits it deterministically, with tree-sitter / regex symbol
extraction over 11 languages, in under a second.
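The "PageRanked" part is ordinary power-iteration PageRank over the symbol reference graph. A minimal sketch (edge list and convergence handling are illustrative; Sophon's query-biased ranking is more involved):

```python
def pagerank(edges, n, damping=0.85, iters=50):
    """Power-iteration PageRank over a symbol graph: edge (i, j) means
    symbol i references symbol j. Returns a rank per symbol."""
    out_deg = [0] * n
    for i, _ in edges:
        out_deg[i] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1 - damping) / n] * n
        for i, j in edges:
            nxt[j] += damping * rank[i] / out_deg[i]
        # dangling nodes (no outgoing references) distribute evenly
        dangling = sum(rank[i] for i in range(n) if out_deg[i] == 0)
        rank = [r + damping * dangling / n for r in nxt]
    return rank

# main(0) calls helper(1) and util(2); helper(1) calls util(2):
# the most-referenced symbol (util) ranks highest in the digest.
ranks = pagerank([(0, 1), (0, 2), (1, 2)], 3)
```

Ranking by incoming references is what lets the digest surface load-bearing symbols without any embedding calls.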
When NOT to pipe Sophon in front of something
- Long-form conversational recall above 80 % — Sophon caps at ~40 %
  LOCOMO and we don't chase it. Run mem0 / Letta / Zep for recall, then
  optionally pipe their output through Sophon (see above).
- Multi-hop reasoning on massive documents — that's HippoRAG or
  GraphRAG's job.
- OCR / PDF layout analysis — out of scope. Use Docling, Marker, or
  Unstructured upstream of Sophon.
- Very small inputs (< ~200 tokens) — Sophon's XML-tagged section
  scaffolding can cost more than it saves. Pass through raw.
Install
Via npm (wraps the native binary)
npm install -g mcp-sophon
sophon --version
The postinstall script downloads the right prebuilt binary for your
platform from the GitHub Releases page. Supported: macOS arm64/x64,
Linux arm64/x64, Windows x64.
Prebuilt binary
Grab the archive for your platform from the
Releases page
and put sophon on your PATH.
Build from source
git clone https://github.com/lacausecrypto/mcp-sophon
cd mcp-sophon/sophon
cargo build --release -p mcp-integration
# default build at target/release/sophon (~7.2 MB, regex extractors only)
# opt into 11-language AST extraction (~25 MB):
cargo build --release -p mcp-integration --features codebase-navigator/tree-sitter
# opt into BGE-small semantic embedder (~34 MB):
cargo build --release -p mcp-integration --features bge
# activate at runtime: SOPHON_EMBEDDER=bge SOPHON_RETRIEVER_PATH=~/.sophon/retriever
# all features (~42 MB):
cargo build --release -p mcp-integration --features "codebase-navigator/tree-sitter,bge"
Requires Rust 1.75+.
Quick start
As an MCP server
{
"mcpServers": {
"sophon": {
"command": "sophon",
"args": ["serve"]
}
}
}
CLI
sophon compress-prompt --prompt ./system.txt --query "how do I handle errors in Rust" --max-tokens 500
sophon compress-history --input ./history.json
sophon stats --period session
sophon serve # MCP stdio server
# Output compression + CLI hooks
sophon exec -- git status # run + compress output
sophon exec -- cargo test # failures only, ~90 % smaller
sophon compress-output --cmd "git diff" --input diff.txt
# Transparent hook installation for Claude Code
sophon hook install --agent claude --global
sophon hook status # show the 20 rewrite rules
sophon hook uninstall --agent claude --global
Programmatic (one-shot JSON-RPC)
echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"compress_prompt","arguments":{"prompt":"<rust>use Result and the ? operator</rust><web>fetch()</web>","query":"rust errors","max_tokens":500}}}' \
| sophon serve
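The same one-shot call can be driven from a script. A sketch that builds the request shown above and only shells out when the binary is actually installed (the subprocess handling is an assumption about how a caller would wire it, not part of Sophon):

```python
import json
import shutil
import subprocess

# Same tools/call request as the echo one-liner above.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "compress_prompt",
        "arguments": {
            "prompt": "<rust>use Result and the ? operator</rust><web>fetch()</web>",
            "query": "rust errors",
            "max_tokens": 500,
        },
    },
}
payload = json.dumps(request)

if shutil.which("sophon"):  # guard: only run when sophon is on PATH
    proc = subprocess.run(["sophon", "serve"], input=payload,
                          capture_output=True, text=True, timeout=30)
    print(proc.stdout)
```

Closing stdin after one request ends the stdio session, matching the echo-pipe pattern above.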
Typical v0.5.0 setup
# Default: zero-ML compression with BM25+Hash hybrid retrieval on
# (on-thesis, deterministic, sub-ms overhead).
export SOPHON_RETRIEVER_PATH=~/.sophon/retriever
export SOPHON_HYBRID=1
export SOPHON_MEMORY_PATH=~/.sophon/memory.jsonl
sophon serve
# Diagnose your install before wiring it into an MCP client
sophon doctor
The v0.4.0 recall-chasing flags (SOPHON_HYDE, SOPHON_FACT_CARDS, SOPHON_ENTITY_GRAPH, SOPHON_GRAPH_MEMORY,
…) still parse but sophon doctor flags them as deprecated — see
CHANGELOG § 0.5.0.
Workspace layout
.
├── README.md ← you are here
├── BENCHMARK.md ← current v0.4.0 numbers, per-section
├── CHANGELOG.md ← version history + corrections + honest findings
├── LICENSE ← MIT
├── server.json ← MCP registry manifest
├── .github/workflows/ ← CI + release automation
├── benchmarks/ ← reproducible scripts for every number
├── npm/ ← npm wrapper package
└── sophon/ ← Rust workspace (11 crates)
├── Cargo.toml
├── sophon.toml ← default runtime config
└── crates/
├── sophon-core/ shared types, token/hash helpers
├── prompt-compressor/ compress_prompt
├── memory-manager/ compress_history, update_memory, graph memory (v0.4.0)
├── delta-streamer/ read_file_delta, write_file_delta
├── fragment-cache/ encode_fragments, decode_fragments
├── semantic-retriever/ chunker + HashEmbedder + BM25 + entity graph (v0.4.0)
├── sophon-storage/ SQLite persistence (WAL, embeddings cache)
├── output-compressor/ command-aware stdout/stderr compression
├── cli-hooks/ transparent command rewriter + agent installer
├── codebase-navigator/ tree-sitter/regex + PageRank + digest
└── mcp-integration/ stdio server, tool schemas, CLI
Configuration
Runtime defaults live in sophon/sophon.toml.
See the full feature flag table above for
env-var-gated features. Baseline env vars:
- SOPHON_MEMORY_PATH — JSONL persistence for session memory
- SOPHON_RETRIEVER_PATH — directory for the semantic retriever store
  (enables the query parameter on compress_history)
- SOPHON_EMBEDDER — hash (default) or bge (needs a --features bge build)
- SOPHON_FRAGMENT_MAX_WINDOW — override the fragment detector window
- SOPHON_CONFIG — path to a sophon.toml config file
Per-call overrides are available on every MCP tool argument set
(max_tokens, recent_window, include_index, …).
Honest limitations
The full list is in BENCHMARK.md § 8.
Headlines:
- LOCOMO caps at ~40 %. mem0 / HippoRAG sit at 80-90 % with neural
  retrieval — we don't match that. We chose determinism.
- Multi-hop is hard. V032 brings 0 → 17 % on LOCOMO multi-hop
  stratified. FULL ceiling is 83 %. The gap is structural.
- V032 latency is heavy. ~42 s p50 on long conversations when the
  full flag stack is on. Pick features a la carte.
- HashEmbedder is keyword-bound. "favorite food" ↔ "weakness for
  ginger snaps" doesn't match without HyDE.
- No multimodal ingestion. Images / PDFs / audio are out of scope —
  run Docling / Marker / Unstructured upstream.
Contributing
See CONTRIBUTING.md. PRs especially welcome for:
- TypeScript bindings (Python bindings ship in sophon-py/)
- TOML-based extractor plugins for new languages (see
  crates/codebase-navigator/plugins/haskell.toml for the format)
- More grammars for navigate_codebase
- Running the real mem0 library on LOCOMO to replace the mem0-lite
  surrogate in § 6.2
- Multi-seed LOCOMO re-runs to tighten the V032 CI
Run the full test suite with:
cd sophon && cargo test --workspace # 303 tests
cd sophon && cargo test --features codebase-navigator/tree-sitter # +15 AST tests
cd sophon && cargo test -p semantic-retriever --features bge -- --ignored # 5 BGE tests (needs model)
cd sophon-py && .venv/bin/pytest tests/ # 4 Python tests
Every benchmark claim above is reproducible — pointers to the
scripts live in BENCHMARK.md. Open an issue if any
number doesn't reproduce on your machine.
License
MIT. See LICENSE.