mcp-sophon

mcp
Security Audit
Warning
Health Warning
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Warning
  • fs module — File system access in .github/workflows/ci.yml
  • fs module — File system access in .github/workflows/mcp-registry.yml
  • fs module — File system access in .github/workflows/release.yml
Permissions Passed
  • Permissions — No dangerous permissions requested
Purpose

This is a deterministic context compressor for MCP agents. It compresses prompts, memory, and code digests locally to save tokens and money when using AI models, operating without machine learning models or API keys at query time.

Security Assessment

Overall risk: Low. The tool does not request dangerous permissions and no hardcoded secrets were detected. The file system access warnings are strictly limited to automated GitHub Actions workflows (CI, release, and registry deployment), which is standard and safe behavior. Because it is a standalone Rust binary designed to run locally without a GPU or API keys, it inherently minimizes data exposure. It does not appear to execute unexpected shell commands or make unauthorized network requests during standard operation.

Quality Assessment

The project is highly maintained, with its last push occurring today. It is transparent about its performance metrics, linking to reproducible benchmark scripts, and is backed by a permissive MIT license. It boasts a solid test suite (387 Rust + 4 Python tests). However, community visibility and trust are currently very low. With only 5 GitHub stars, the tool has not yet been broadly vetted by the open-source community, meaning users are relying primarily on the developer's own testing and claims.

Verdict

Use with caution — the tool appears secure, efficient, and well-engineered with a safe CI setup, but its extremely low community adoption means it has not yet been widely battle-tested.
SUMMARY

Deterministic context compressor for MCP agents. Slots in front of prompt caching, mem0, Letta, or Claude Code. Single Rust binary, zero ML at query time. +24% tokens / +49% $ saved on top of Anthropic prompt caching.

README.md

Sophon

Honest token economics for MCP agents. One Rust binary. Zero ML at query time. Reproducible benchmarks.

Sophon is a deterministic context layer for agents speaking the Model
Context Protocol. It compresses prompts, conversation memory, code
digests, file deltas, and shell output — without an embedding model at
query time, without a GPU, and without API keys. 7.2 MB default
Rust binary (25 MB with the optional 11-language tree-sitter AST
backend, 34 MB with BGE embedder), MCP-native, cl100k_base-accurate.

Every number below links to the reproducible benchmark script that
produced it. Every caveat is in BENCHMARK.md. Version
history + deprecated numbers live in CHANGELOG.md.


TL;DR — v0.5.0

Sophon is a deterministic context compressor that slots in front
of whatever memory / cache / code-nav layer you already use — not
instead of them. v0.5.0 is a positioning re-scope: we stopped chasing
LOCOMO conversational recall (mem0's territory) and doubled down on
pure compression. Full rationale in
CHANGELOG § 0.5.0.

New in v0.5.0 — orthogonal-stack economics

| Stack | Additional saved by Sophon | Benchmark |
|---|---|---|
| Sophon + Anthropic prompt caching | +24 % tokens / +49 % $ on a 25-turn Claude-3.5-Sonnet session | sophon_plus_prompt_caching.py |
| Sophon + mem0 | Depends on mem0 output size; the bench flags overhead on short dumps directly | sophon_plus_mem0.py |

New in v0.5.0 — single-binary efficiency

Four rows every Python-based context layer would struggle to match. All measured against the v0.5.0 release binary on macOS arm64.

| Metric | Value | Benchmark |
|---|---|---|
| Binary on disk | 8.7 MB (release) | stat on the release target |
| Cold start → ready | 10.6 ms p50, 25 ms p99 | cold_start_and_footprint.py |
| RSS after initialize | 12.5 MB | idem |
| Session scaling (1 → 200 turns) | update_memory 0.1 ms p50, flat; compress_history 4.2 ms p50 / 50 ms p99 | session_scaling_curve.py |
| compress_output coverage | 81.6 % weighted aggregate across 15 command families (git, cargo, docker, pytest, npm, kubectl, curl, tail, grep, …) | compress_output_per_command.py |

Pass --include-python-baseline to cold_start_and_footprint.py to contrast against python -c "import mem0" / sentence_transformers / langchain on your machine.

Carried over (still on-thesis, measured at v0.4.0 and unchanged)

| Use case | Metric | Compared to |
|---|---|---|
| Agent session token economics | 68.1 % tokens saved across 25-turn coding session (§ 1) | Baseline: raw tokens |
| Prompt compression | 70.2 % mean saved, 36 ms mean latency, 22 prompt shapes (§ 2) | LLMLingua-2: +8.9 pt at 35× lower latency (§ 6.1) |
| Code retrieval (repo QA) | recall@3 = 70 % on "where is X?" questions (§ 4) | grep: 10 %; FULL context: 20 % |
| Latency + reliability | p99 < 87 ms on 5/7 ops, 100 % ok_rate on 190 runs (§ 3) | Sub-second guaranteed |

Protocol + DX changes in v0.5.0

  • MCP protocol 2025-06-18 — adds notifications/cancelled,
    structured JSON-RPC error codes (-32000..-32099 reserved range
    for Sophon server errors), and an infallible dispatcher so a
    single malformed request can no longer kill the stdio loop.
  • sophon doctor — read-only installation diagnostic: binary +
    resolved config + every SOPHON_* flag in use + path
    writability + LLM-command PATH probe + MCP-client config
    hints. Also surfaces deprecated recall-chasing flags.
  • Observability — 18 eprintln! replaced with tracing;
    filter via RUST_LOG=sophon=debug.
  • Tests — workspace count 303 (v0.4.0) → 405+.
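
The structured error codes in the first bullet ride the standard
JSON-RPC 2.0 error envelope. A minimal illustration in Python (only
the reserved code range comes from the release notes; the id and the
message text are generic placeholders from the spec, not Sophon's
exact output):

# Standard JSON-RPC 2.0 error object; -32000..-32099 is the range
# the release notes reserve for Sophon server errors.
error_response = {
    "jsonrpc": "2.0",
    "id": 7,
    "error": {
        "code": -32000,                     # Sophon-reserved server error
        "message": "example server error",  # invented for illustration
    },
}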

What stopped being a goal

Long-form conversational recall above ~40 % on LOCOMO is now
explicitly out of scope. mem0 hits 91 % with neural retrieval; we
sit in front of mem0 instead. The v0.4.0 recall-chasing flags
(SOPHON_HYDE, SOPHON_FACT_CARDS, SOPHON_ENTITY_GRAPH,
SOPHON_LLM_RERANK, SOPHON_ADAPTIVE, SOPHON_TAIL_SUMMARY,
SOPHON_REACT, SOPHON_GRAPH_MEMORY, SOPHON_MULTIHOP_LLM) stay
functional but are flagged by sophon doctor as deprecated and
will be removed.


The three pillars

1. Measured economies, not promised ones

  • 68.1 % session tokens saved over a 25-turn coding session
    (§ 1)
  • 70.2 % overall savings on compress_prompt across 22 shapes
    (§ 2)
  • 98.0 % savings on re-reads via read_file_delta
  • 94.4 % savings on targeted edits via write_file_delta
  • 95.4 % savings on Claude-Code-sized system prompts

2. Determinism + speed first

  • p99 ≤ 87 ms on 5 of 7 ops: count_tokens, compress_prompt,
    compress_output, read_file_delta, navigate_codebase
  • 100 % ok_rate across 190 bench runs (zero crashes, zero
    malformed payloads)
  • Zero ML at query time on the default build. Haiku is shell-out
    only, opt-in per feature flag.

3. Honest about what it isn't

  • LOCOMO conversational recall plateaus around 40 % on V032
    full stack. mem0 / HippoRAG hit 80-90 % with neural embeddings at
    query time — we chose determinism + speed instead.
  • Adversarial questions: V032 loses some ground (HyDE surfaces
    tangential chunks the LLM then hallucinates over). V030 default
    stays at 83 % on adversarial, V032 drops to 67 %.
  • Per-type, not global. Our +17 pt gains on multi-hop /
    single-hop / temporal are directionally real at N=30 but CIs
    overlap — we flag that explicitly in
    § 5.1.

What's in the binary

11 MCP tools, all stdio:

| Tool | What it does |
|---|---|
| compress_prompt | Keep query-relevant sections of a long prompt |
| compress_history | Summary + facts + recent + optional retrieval over the conversation |
| compress_output | Strip noise from command stdout/stderr (20+ domain filters) |
| navigate_codebase | tree-sitter / regex digest of a repo, PageRanked by query |
| update_memory | Append messages to the session store (JSONL persist + graph ingest) |
| read_file_delta | Version/hash-aware file read, unchanged → minimal payload |
| write_file_delta | Send edits as diffs, not full files |
| encode_fragments | Detect repeated boilerplate, replace with tokens |
| decode_fragments | Reverse the encoding |
| count_tokens | cl100k_base-accurate token count |
| get_token_stats | Session-level savings rollup |

Binary sizes:

  • 7.2 MB default (regex extractors, HashEmbedder)
  • 25 MB with tree-sitter (11 languages: Rust, Python, JS, TS, TSX, Go, Ruby, Java, C/C++, PHP, Kotlin, Swift)
  • 34 MB with BGE-small semantic embedder
  • 42 MB with all features

Feature flags

Run sophon doctor to see every SOPHON_* env var currently set,
with validation warnings and a note for deprecated recall-chasing
flags. The full catalogue (24 flags, grouped by scope) lives in
runtime_flags.rs.

On-thesis, still recommended:

| Flag | What it adds | Cost |
|---|---|---|
| SOPHON_RETRIEVER_PATH=/dir | Activate the semantic retriever (chunk store on disk). | ~0 |
| SOPHON_MEMORY_PATH=/file.jsonl | Persistent conversation memory across sophon serve runs. | ~0 |
| SOPHON_HYBRID=1 | BM25 sparse-lexical + HashEmbedder fused via RRF. | ~1 ms |
| SOPHON_CHUNK_TARGET=500 | Bigger chunks preserve cross-sentence context. | ~0 |
| SOPHON_EMBEDDER=bge | Swap HashEmbedder for BGE-small (needs --features bge). | +model load at startup |
| SOPHON_NO_LLM_SUMMARY=1 | Opt out of the Haiku summary; heuristic only. | Speed (bench utility) |
| SOPHON_DEBUG_LLM=1 | Richer tracing warnings for LLM subprocess failures. | |
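
The SOPHON_HYBRID row fuses the BM25 and HashEmbedder rankings with
Reciprocal Rank Fusion. For readers unfamiliar with RRF, a generic
sketch in Python (this is the textbook algorithm with the
conventional k=60 constant, not Sophon's actual code):

from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in rankings:                  # e.g. [bm25_ids, hash_embed_ids]
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf([["a", "b", "c"], ["c", "a", "d"]])
print(fused)  # documents ranked high by either retriever bubble up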

Deprecated (v0.4.0 recall-chasing experiments, scheduled for removal):

SOPHON_HYDE, SOPHON_FACT_CARDS, SOPHON_ENTITY_GRAPH,
SOPHON_ADAPTIVE, SOPHON_LLM_RERANK, SOPHON_TAIL_SUMMARY,
SOPHON_REACT, SOPHON_GRAPH_MEMORY, SOPHON_MULTIHOP_LLM:
these chase LOCOMO recall, an axis we no longer optimise. Still
functional, but sophon doctor flags them. See
CHANGELOG.md § 0.5.0 Positioning re-scope.
If you need neural recall, pipe mem0 / Letta in front of Sophon
instead (see When to use it below).


When to use it — Sophon in front of X

Sophon is not a memory platform, a recall system, an OCR stack,
or a replacement for provider-side caching. It's a deterministic
context compressor that slots in front of whatever memory /
cache / code-nav layer you already use, and attacks the tokens those
layers can't.

The v0.5.0 positioning is explicit: Sophon stops chasing LOCOMO
recall (mem0's territory) and doubles down on pure compression —
tokens saved %, latency p99, binary size, canary preservation, MCP
compliance. See CHANGELOG.md for the re-scope
note.

Sophon in front of Anthropic / OpenAI prompt caching

Provider caching handles the static half of a request — system
prompt, tool definitions, reused documents. It doesn't touch the
dynamic half (growing conversation history, tool outputs). Sophon
compresses exactly that half. The two stack cleanly.

Reproducible measurement:
benchmarks/sophon_plus_prompt_caching.py
simulates a 25-turn agent session with a 6600-token cacheable
static block and claude-3.5-sonnet pricing. Sophon saves an
additional 23.8 % tokens / ~49 % $ on top of prompt caching —
because the uncached dynamic block is billed at 10× the cached
rate, so every dynamic token Sophon removes is worth ~10 cached
tokens in dollars.
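
A back-of-the-envelope version of that arithmetic (all numbers are
illustrative: the 10 % cache-read rate follows Anthropic's published
prompt-caching discount, the static block size comes from the
benchmark setup above, and the dynamic-token count is a stand-in
chosen so the two headline figures line up):

# Illustrative cost model for one turn: cached static block + uncached
# dynamic block. Cache reads assumed to bill at 10 % of the base rate.
BASE = 3.00 / 1_000_000        # $/input token, claude-3.5-sonnet class
CACHED = 0.1 * BASE            # cache-read rate

static_tok = 6_600             # cacheable block (benchmark setup)
dynamic_tok = 4_950            # uncached history + tool output (stand-in)
removed = 0.238 * (static_tok + dynamic_tok)   # Sophon trims 23.8 % of
                                               # total tokens, all dynamic
cost_before = static_tok * CACHED + dynamic_tok * BASE
cost_after = static_tok * CACHED + (dynamic_tok - removed) * BASE

print(f"${cost_before:.5f} -> ${cost_after:.5f} per turn "
      f"({1 - cost_after / cost_before:.0%} cheaper)")
# -> ~49 % cheaper: a removed dynamic token is worth ~10 cached ones.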

Sophon in front of mem0 / Letta / Zep / Graphiti

mem0 and friends retrieve the right memories. Sophon shrinks what
gets sent to the LLM after retrieval. If mem0 returns 2 kB of
raw memories, compress_prompt keeps only the sections the query
actually references.

Reproducible measurement:
benchmarks/sophon_plus_mem0.py
runs against a surrogate mem0 retriever by default
(no API keys needed) or the real mem0ai package with
--real-mem0. It reports Sophon's additional savings + the
proper-noun / date / number preservation rate. Honest caveat
built-in: on very short mem0 outputs (< ~200 tokens) Sophon adds
overhead from its own wrapper — only pipe larger dumps through it.
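
A minimal sketch of that post-retrieval step, using the documented
compress-prompt CLI instead of the benchmark script. Assumptions are
flagged inline: the guard threshold mirrors the ~200-token caveat
above, the 4-chars-per-token heuristic is a rough stand-in for a real
count_tokens call, and the sketch assumes the CLI writes the
compressed prompt to stdout:

import subprocess, tempfile

retrieved = "...2 kB of raw memories returned by mem0 / Letta..."
query = "what did the user decide about error handling?"

# Below ~200 tokens Sophon's scaffolding costs more than it saves,
# so pass tiny dumps through raw (rough 4-chars-per-token heuristic).
if len(retrieved) / 4 < 200:
    context = retrieved
else:
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(retrieved)
    result = subprocess.run(
        ["sophon", "compress-prompt", "--prompt", f.name,
         "--query", query, "--max-tokens", "500"],
        capture_output=True, text=True, check=True,
    )
    context = result.stdout    # compressed memories for the LLM call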

Sophon in front of Claude Code / Cursor / Cline

This is the primary use case. Every repeat file read becomes a
read_file_delta; every shell command output goes through
compress_output; every repeated boilerplate block gets swapped for
a fragment_cache token. A 25-turn session drops from ~15 k
tokens/turn to ~9 k tokens/turn.

Reproducible measurement:
benchmarks/session_token_economics.py reproduces the
68.1 % session tokens saved (§ 1).
Install with sophon hook install --agent claude --global.

Sophon in front of a RAG pipeline

navigate_codebase produces a PageRanked repo digest that a RAG
retriever would otherwise spend expensive embedding calls to build.
Sophon emits it deterministically, with tree-sitter / regex symbol
extraction over 11 languages, in under a second.
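
To make the ranking idea concrete, here is plain power-iteration
PageRank over a toy symbol-reference graph (the textbook algorithm
with its usual 0.85 damping factor, not Sophon's implementation; the
graph is invented):

def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Plain power-iteration PageRank over a directed graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            targets = outs or nodes          # dangling node: spread evenly
            for m in targets:
                nxt[m] += damping * rank[n] / len(targets)
        rank = nxt
    return rank

# Toy call graph: which symbols does each symbol reference?
graph = {"main": ["parse", "run"], "run": ["parse"], "parse": []}
print(pagerank(graph))                       # "parse" ranks highest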


When NOT to pipe Sophon in front of something

  • Long-form conversational recall above 80 % — Sophon caps at
    ~40 % LOCOMO and we don't chase it. Run
    mem0 /
    Letta /
    Zep for recall, then optionally
    pipe their output through Sophon (see above).
  • Multi-hop reasoning on massive documents — that's
    HippoRAG or
    GraphRAG's job.
  • OCR / PDF layout analysis — out of scope. Use
    Docling, Marker, or
    Unstructured upstream of Sophon.
  • Very small inputs (< ~200 tokens) — Sophon's XML-tagged
    section scaffolding can cost more than it saves. Pass through raw.

Install

Via npm (wraps the native binary)

npm install -g mcp-sophon
sophon --version

The postinstall script downloads the right prebuilt binary for your
platform from the GitHub Releases page. Supported: macOS arm64/x64,
Linux arm64/x64, Windows x64.

Prebuilt binary

Grab the archive for your platform from the
Releases page
and put sophon on your PATH.

Build from source

git clone https://github.com/lacausecrypto/mcp-sophon
cd mcp-sophon/sophon
cargo build --release -p mcp-integration
# default build at target/release/sophon (~7.2 MB, regex extractors only)

# opt into 11-language AST extraction (~25 MB):
cargo build --release -p mcp-integration --features codebase-navigator/tree-sitter

# opt into BGE-small semantic embedder (~34 MB):
cargo build --release -p mcp-integration --features bge
# activate at runtime: SOPHON_EMBEDDER=bge SOPHON_RETRIEVER_PATH=~/.sophon/retriever

# all features (~42 MB):
cargo build --release -p mcp-integration --features "codebase-navigator/tree-sitter,bge"

Requires Rust 1.75+.


Quick start

As an MCP server

{
  "mcpServers": {
    "sophon": {
      "command": "sophon",
      "args": ["serve"]
    }
  }
}

CLI

sophon compress-prompt --prompt ./system.txt --query "how do I handle errors in Rust" --max-tokens 500
sophon compress-history --input ./history.json
sophon stats --period session
sophon serve                                    # MCP stdio server

# Output compression + CLI hooks
sophon exec -- git status                       # run + compress output
sophon exec -- cargo test                       # failures only, ~90 % smaller
sophon compress-output --cmd "git diff" --input diff.txt

# Transparent hook installation for Claude Code
sophon hook install --agent claude --global
sophon hook status                              # show the 20 rewrite rules
sophon hook uninstall --agent claude --global

Programmatic (one-shot JSON-RPC)

echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"compress_prompt","arguments":{"prompt":"<rust>use Result and the ? operator</rust><web>fetch()</web>","query":"rust errors","max_tokens":500}}}' \
  | sophon serve
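
The same one-shot call from Python, built on nothing beyond the stdio
contract shown above. A minimal sketch: the arguments mirror the echo
example, and taking the last stdout line is a defensive assumption in
case the server emits anything before the response:

import json, subprocess

def call_tool(name: str, arguments: dict) -> dict:
    """One-shot JSON-RPC 2.0 tools/call against `sophon serve` over stdio."""
    request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
    proc = subprocess.run(["sophon", "serve"], input=json.dumps(request),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout.strip().splitlines()[-1])

resp = call_tool("compress_prompt", {
    "prompt": "<rust>use Result and the ? operator</rust><web>fetch()</web>",
    "query": "rust errors",
    "max_tokens": 500,
})
print(resp)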

Typical v0.5.0 setup

# Default: zero-ML compression with BM25+Hash hybrid retrieval on
# (on-thesis, deterministic, sub-ms overhead).
export SOPHON_RETRIEVER_PATH=~/.sophon/retriever
export SOPHON_HYBRID=1
export SOPHON_MEMORY_PATH=~/.sophon/memory.jsonl
sophon serve

# Diagnose your install before wiring it into an MCP client
sophon doctor

The v0.4.0 recall-chasing flags (SOPHON_HYDE,
SOPHON_FACT_CARDS, SOPHON_ENTITY_GRAPH, SOPHON_GRAPH_MEMORY,
…) still parse but sophon doctor flags them as deprecated — see
CHANGELOG § 0.5.0.


Workspace layout

.
├── README.md           ← you are here
├── BENCHMARK.md        ← current v0.4.0 numbers, per-section
├── CHANGELOG.md        ← version history + corrections + honest findings
├── LICENSE             ← MIT
├── server.json         ← MCP registry manifest
├── .github/workflows/  ← CI + release automation
├── benchmarks/         ← reproducible scripts for every number
├── npm/                ← npm wrapper package
└── sophon/             ← Rust workspace (11 crates)
    ├── Cargo.toml
    ├── sophon.toml     ← default runtime config
    └── crates/
        ├── sophon-core/          shared types, token/hash helpers
        ├── prompt-compressor/    compress_prompt
        ├── memory-manager/       compress_history, update_memory, graph memory (v0.4.0)
        ├── delta-streamer/       read_file_delta, write_file_delta
        ├── fragment-cache/       encode_fragments, decode_fragments
        ├── semantic-retriever/   chunker + HashEmbedder + BM25 + entity graph (v0.4.0)
        ├── sophon-storage/       SQLite persistence (WAL, embeddings cache)
        ├── output-compressor/    command-aware stdout/stderr compression
        ├── cli-hooks/            transparent command rewriter + agent installer
        ├── codebase-navigator/   tree-sitter/regex + PageRank + digest
        └── mcp-integration/      stdio server, tool schemas, CLI

Configuration

Runtime defaults live in sophon/sophon.toml.
See the full feature flag table above for
env-var-gated features. Baseline env vars:

  • SOPHON_MEMORY_PATH — JSONL persistence for session memory
  • SOPHON_RETRIEVER_PATH — directory for the semantic retriever store
    (enables the query parameter on compress_history)
  • SOPHON_EMBEDDER — hash (default) or bge (needs a --features bge build)
  • SOPHON_FRAGMENT_MAX_WINDOW — override the fragment detector window
  • SOPHON_CONFIG — path to a sophon.toml config file

Per-call overrides are available on every MCP tool argument set
(max_tokens, recent_window, include_index, …).
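
As an illustration, a compress_history call overriding its budget and
window might carry arguments like these. Only max_tokens,
recent_window, and include_index are named in this README; the
messages field is a hypothetical stand-in, so check the schemas
sophon serve advertises:

# Hypothetical argument shape; verify field names against the tool
# schemas advertised by `sophon serve`. Only the three overrides named
# above appear in this README.
arguments = {
    "messages": [],          # conversation to compress (field name assumed)
    "max_tokens": 800,       # per-call token budget override
    "recent_window": 6,      # keep the last 6 turns verbatim
    "include_index": False,  # skip the retrieval index for this call
}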


Honest limitations

The full list is in BENCHMARK.md § 8.
Headlines:

  1. LOCOMO caps at ~40 %. Mem0 / HippoRAG sit at 80-90 % with
    neural retrieval — we don't match that. We chose determinism.
  2. Multi-hop is hard. V032 brings 0 → 17 % on LOCOMO multi-hop
    stratified. FULL ceiling is 83 %. The gap is structural.
  3. V032 latency is heavy. ~42 s p50 on long conversations when
    the full flag stack is on. Pick features a la carte.
  4. HashEmbedder is keyword-bound. "favorite food" ↔ "weakness
    for ginger snaps" doesn't match without HyDE.
  5. No multimodal ingestion. Images / PDFs / audio are out of
    scope — run Docling / Marker / Unstructured upstream.

Contributing

See CONTRIBUTING.md. PRs especially welcome for:

  • TypeScript bindings (Python bindings ship in sophon-py/)
  • TOML-based extractor plugins for new languages (see
    crates/codebase-navigator/plugins/haskell.toml for the format)
  • More grammars for navigate_codebase
  • Running the real mem0 library on LOCOMO to replace the
    mem0-lite surrogate in § 6.2
  • Multi-seed LOCOMO re-runs to tighten the V032 CI

Run the full test suite with:

cd sophon && cargo test --workspace                                  # 303 tests
cd sophon && cargo test --features codebase-navigator/tree-sitter    # +15 AST tests
cd sophon && cargo test -p semantic-retriever --features bge -- --ignored  # 5 BGE tests (needs model)
cd sophon-py && .venv/bin/pytest tests/                              # 4 Python tests

Every benchmark claim above is reproducible — pointers to the
scripts live in BENCHMARK.md. Open an issue if any
number doesn't reproduce on your machine.


License

MIT. See LICENSE.
