basemind

mcp
Security Audit
Pass
Health Pass
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 16 GitHub stars
Code Pass
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
  • Permissions — No dangerous permissions requested


basemind

Full AI context layer for coding agents — code-map, document RAG, shared memory, web crawl,
git history. 300+ languages, one MCP server.



The four pillars

Code — Tree-sitter outlines, symbol search, reference + caller + implementation graphs,
call chains, git history per symbol, blame at symbol-level resolution.

Documents — Ingest + semantic search over PDFs, Office (Word/Excel/iWork), HTML, email,
archives. Built-in OCR, layout detection, keyword + NER extraction, cross-encoder reranking.
All ONNX models are bundled, so no system install is needed.

Memory — Per-repo scoped key-value + semantic vector storage. Clones of the same git
origin automatically share memory; unrelated repos stay isolated.
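The README does not describe how clones are matched to a shared scope. One plausible approach, sketched here under that assumption, is to normalize the origin URL so that SSH, ssh://, and HTTPS clones of the same repository produce the same key. `memory_scope` and the `repo:` prefix are hypothetical names, not basemind's API.

```rust
/// Hypothetical: derive a shared-memory scope key from a git origin URL, so
/// scp-style SSH, ssh:// and HTTPS clones of one repository map to one key.
fn memory_scope(origin: &str) -> String {
    let url = origin.trim();
    // Split off a scheme such as https:// or ssh://; scp-style URLs have none.
    let (has_scheme, rest) = match url.find("://") {
        Some(i) => (true, &url[i + 3..]),
        None => (false, url),
    };
    // Drop a user prefix such as `git@`.
    let rest = rest.split_once('@').map_or(rest, |(_, host_and_path)| host_and_path);
    // scp-style `host:owner/repo` becomes `host/owner/repo`.
    let path = if has_scheme {
        rest.to_string()
    } else {
        rest.replacen(':', "/", 1)
    };
    // Ignore a trailing slash and a `.git` suffix.
    let path = path.trim_end_matches('/');
    let path = path.strip_suffix(".git").unwrap_or(path);
    // Assumption: lowercasing treats owner/repo case-insensitively, as GitHub
    // does; not every git host behaves this way.
    format!("repo:{}", path.to_lowercase())
}

fn main() {
    let ssh = memory_scope("git@github.com:Goldziher/basemind.git");
    let https = memory_scope("https://github.com/Goldziher/basemind");
    assert_eq!(ssh, https);
    println!("{ssh}"); // repo:github.com/goldziher/basemind
}
```

Keying on the normalized origin rather than the checkout path is what lets two working copies of one repository see the same memory, while a repository with a different origin gets a different key.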

Web — On-demand HTTP scrape + follow-link crawl. Pages chunk, embed, and land in the
documents store under scope web:<host> for unified search.
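As a rough illustration of the crawl path, the sketch below derives the `web:<host>` scope from a page URL and splits page text into overlapping fixed-size chunks; embedding and the LanceDB write are omitted. The function names, the character-based chunking, and the size/overlap parameters are assumptions, since the README does not specify how the documents pipeline chunks text.

```rust
/// Hypothetical: documents-store scope for a crawled page, e.g. `web:docs.rs`.
fn web_scope(url: &str) -> Option<String> {
    // Everything after the scheme, up to the first '/', '?' or '#', is the authority.
    let rest = url.split_once("://")?.1;
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop any `user@` prefix and `:port` suffix to keep the bare host.
    let host = authority.rsplit('@').next()?.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(format!("web:{}", host.to_lowercase()))
    }
}

/// Hypothetical fixed-window chunker: `size` chars per chunk, with `overlap`
/// chars repeated between neighbours so text near a boundary lands in both.
fn chunk(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than the chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let scope = web_scope("https://Docs.rs/tokio/latest/tokio/?search=spawn");
    println!("{scope:?}"); // Some("web:docs.rs")
    // Prints abcd, defg, ghij on separate lines.
    for c in chunk("abcdefghij", 4, 1) {
        println!("{c}");
    }
}
```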


Feature table

Pillar | What it does | MCP tools | Backend
Code intelligence | Outlines, symbol search, refs/callers/callees, call graphs, impl lookup, dependents, in-tree regex | outline, search_symbols, workspace_grep, find_references, find_callers, call_graph, find_implementations, dependents, list_files, status, repo_info | tree-sitter × 300+ langs · Fjall LSM index · content-addressed blob store
Git intelligence | Symbol-level history, blame, churn, recent changes, structural diffs across revs | symbol_history, blame_file, blame_symbol, hot_files, recent_changes, commits_touching, find_commits_by_path, diff_outline, diff_file, working_tree_status | gix + sha-keyed disk cache
Document RAG | Ingest + semantic search over PDFs, Office (Excel/Word/HWP/iWork), HTML, XML, email, archives. Adds OCR (Tesseract + PaddleOCR), cross-encoder reranker, keyword extraction (YAKE/RAKE), NER (gline-rs ONNX + LLM), extractive + abstractive summarization, layout detection, page auto-rotate, redaction, language detection. All ONNX models are bundled; no system install needed. | search_documents | kreuzberg + LanceDB
Shared memory | Per-repo scoped key-value + semantic memory. Clones of the same git origin URL automatically share memory; unrelated repos stay isolated. | memory_put, memory_get, memory_list, memory_search, memory_delete | LanceDB + Fjall, scope-keyed
Web crawl | On-demand HTTP scrape + link-following crawl. Crawled pages route through the documents pipeline (chunk → embed → LanceDB) under scope web:<host>. | web_scrape, web_crawl, web_map | kreuzcrawl (native HTTP, no chromium)
Admin | Live rescan + telemetry dashboard | rescan, telemetry_summary |

Quickstart

Claude Code

/plugin marketplace add Goldziher/basemind
/plugin install basemind@basemind

Restart the session. Optional: add a live statusline to ~/.claude/settings.json:

{
  "statusLine": {
    "type": "command",
    "command": "$HOME/.claude/plugins/basemind/.claude-plugin/statusline.sh",
    "refreshInterval": 5
  }
}

Example output: ▲ basemind 144 files · scanned 2d ago ● 0 calls · 0 tok saved. The freshness dot is
green (< 1 h), yellow (1–24 h), or red (> 1 day).
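The color thresholds above amount to a small mapping from index age to color. The sketch below encodes them in a hypothetical `freshness_dot` function; the README does not say which color applies at exactly 1 h or exactly 24 h, so those boundaries are an assumption.

```rust
use std::time::Duration;

/// Hypothetical: map index age to the statusline freshness color.
/// Assumption: exactly 1 h counts as yellow and exactly 24 h is still yellow.
fn freshness_dot(age: Duration) -> &'static str {
    const HOUR: u64 = 60 * 60;
    match age.as_secs() {
        s if s < HOUR => "green",
        s if s <= 24 * HOUR => "yellow",
        _ => "red",
    }
}

fn main() {
    // "scanned 2d ago" is past the one-day threshold, so this prints "red".
    println!("{}", freshness_dot(Duration::from_secs(2 * 24 * 60 * 60)));
}
```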

Any MCP client

cargo install basemind --features full --locked

Then add to your MCP config:

{
  "mcpServers": {
    "basemind": {
      "command": "basemind",
      "args": ["serve"]
    }
  }
}

Supported harnesses: Claude Code · Cursor · Codex (CLI + App) · Gemini · OpenCode · Factory Droid ·
GitHub Copilot CLI · Continue · Cline. Install commands are listed in the Harness-specific setup
section below; harnesses without an entry there can use the generic MCP config above.

CLI only

basemind scan                        # index the working tree
basemind query outline path/file.rs  # inspect structure
basemind query symbol "parseQuery"   # find by name
basemind watch                       # live re-index on file change

Why basemind, specifically

vs grep / ripgrep

What ripgrep does well: blazing-fast line matching. What it misses:

  • A search for a common name can return 50+ hits across docs, tests, comments, and variable names, so the agent wastes context filtering noise.
  • No scope awareness: a parseQuery() call and a "parseQuery" string literal both match, so the semantic signal is lost.
  • Every query re-scans the disk; there are no precomputed structures to leverage.

basemind: semantic-quality answers at grep speed via tree-sitter + indexed call sites.

vs vector-only RAG (LangChain / LlamaIndex DIY stacks)

What vector RAG does well: fuzzy document semantic search. What it misses:

  • Pure embeddings lose exact structure — which function calls which, which class implements which interface.
  • No line/column resolution — agent can't map vector hits back to code symbols.
  • No git history integration — "what changed recently?" and "who wrote this?" require separate systems.

basemind: code structure + git history + vector memory + document search all in one, unified scope.

vs context7 / openai-codex / Aider's repo-map

What these do well: generate code-map summaries. What they miss:

  • Static snapshots — stale after the first edit.
  • No semantic indexing — every lookup re-parses or re-scans.
  • Human-focused output (markdown) instead of agent-facing structure (JSON tools).

basemind: live-updated index with sub-millisecond MCP tools, built for agents, not humans.

vs GitHub native search

What GitHub does well: repository-wide fuzzy text search. What it misses:

  • Cloud-only — your code leaves the machine, latency is network-bound.
  • No local-editor integration — agent can't query in-progress edits before commit.
  • No cross-language polyglot support — each language's search tuned separately.

basemind: local-only, always-fresh index of your working tree, 300+ languages in one sweep.


Performance

Measured on Apple Silicon, release build, --features full, default eager_l2 = true. Cold
filesystem cache adds ~50% to first scan; numbers below are warm steady-state.

Scan throughput

Repo | Files | Language mix | Time
tokio | 859 | Rust | 0.2 s
react | 7 061 | TS / JSX | 2.2 s
django | 7 061 | Python | 2.5 s
requests | 2 195 | Python | 0.7 s
gin | 1 217 | Go | 1.0 s
ripgrep | 12 851 | Rust | 4.0 s
ripgrep-shallow | 12 851 | Rust | 0.16 s
TypeScript compiler | 81 324 | TS / JS / JSON | ~22 s

The TypeScript compiler is the worst case — 81k files scanned in 22 seconds. Most real repos sit
between tokio and ripgrep. Re-scans skip unchanged content hashes, so warm rescans on edited
working trees are typically dominated by the changed-set size, not repo size.
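The incremental-rescan behavior described above can be sketched as a diff between two path-to-hash maps: only paths whose content hash is new or different need re-parsing, and paths that disappeared are dropped. `diff_scans` and `ChangeSet` are hypothetical illustrations, not basemind's scanner API, and the content hashes are taken as given.

```rust
use std::collections::HashMap;

/// Hypothetical result of comparing two scans.
#[derive(Debug, Default, PartialEq)]
struct ChangeSet {
    changed: Vec<String>, // new or modified paths: must be re-parsed
    removed: Vec<String>, // paths gone from the working tree: drop from the index
}

/// Compare path -> content-hash maps from the previous and current scan.
fn diff_scans(prev: &HashMap<String, u64>, curr: &HashMap<String, u64>) -> ChangeSet {
    let mut out = ChangeSet::default();
    for (path, hash) in curr {
        // Same hash as last time: skip, the existing blobs are still valid.
        if prev.get(path) != Some(hash) {
            out.changed.push(path.clone());
        }
    }
    for path in prev.keys() {
        if !curr.contains_key(path) {
            out.removed.push(path.clone());
        }
    }
    // HashMap iteration order is unspecified; sort for stable output.
    out.changed.sort();
    out.removed.sort();
    out
}

fn main() {
    let prev: HashMap<String, u64> =
        HashMap::from([("a.rs".to_string(), 1), ("b.rs".to_string(), 2), ("c.rs".to_string(), 3)]);
    let curr: HashMap<String, u64> =
        HashMap::from([("a.rs".to_string(), 1), ("b.rs".to_string(), 20), ("d.rs".to_string(), 4)]);
    // a.rs is skipped; only b.rs and d.rs are re-parsed, and c.rs is removed.
    println!("{:?}", diff_scans(&prev, &curr));
}
```

The work done is proportional to the number of changed entries, which is why rescan cost tracks the changed-set size rather than repository size.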

Per-tool MCP latency

Against the 81k-file TypeScript index:

Latency | Tools
< 1 ms | outline, list_files, find_references, find_callers, find_implementations, hot_files, repo_info
3–6 ms | search_symbols, call_graph
4–10 ms | recent_changes, commits_touching, find_commits_by_path, symbol_history, diff_outline, diff_file
20–25 ms | status
30–40 ms | blame_file, blame_symbol
40–200 ms | workspace_grep
~200 ms | search_documents
350–600 ms | working_tree_status

basemind preloads L1 outlines into RAM on serve start, so code-map queries hit no disk. The Fjall
LSM inverted index handles ref/caller/impl lookups without scanning blobs. Git tools track gix
walk cost; Fjall-backed tools dominate only on enormous histories.
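Fjall is an on-disk LSM keyspace; purely to illustrate why an inverted index answers reference lookups without scanning blobs, the sketch below substitutes an in-memory `BTreeMap` from symbol name to locations. Because keys are sorted, a prefix search is a range scan that stops at the first non-matching key. `SymbolIndex`, `Loc`, and the method names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical location of one symbol occurrence.
#[derive(Debug, PartialEq)]
struct Loc {
    path: String,
    line: u32,
}

#[derive(Default)]
struct SymbolIndex {
    // Sorted map: symbol name -> every place it is referenced.
    refs: BTreeMap<String, Vec<Loc>>,
}

impl SymbolIndex {
    fn add_ref(&mut self, name: &str, path: &str, line: u32) {
        self.refs
            .entry(name.to_string())
            .or_default()
            .push(Loc { path: path.to_string(), line });
    }

    /// Exact lookup: one map probe, no file or blob scanning.
    fn find_references(&self, name: &str) -> &[Loc] {
        self.refs.get(name).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Prefix search: walk sorted keys from `prefix`, stop at the first miss.
    fn names_with_prefix<'a>(&'a self, prefix: &'a str) -> impl Iterator<Item = &'a str> + 'a {
        self.refs
            .range(prefix.to_string()..)
            .map(|(name, _)| name.as_str())
            .take_while(move |name| name.starts_with(prefix))
    }
}

fn main() {
    let mut idx = SymbolIndex::default();
    idx.add_ref("parseQuery", "src/query.ts", 12);
    idx.add_ref("parseQuery", "src/cli.ts", 40);
    idx.add_ref("parseQueryString", "src/url.ts", 7);
    idx.add_ref("render", "src/view.ts", 3);
    println!("{:?}", idx.find_references("parseQuery"));
    let names: Vec<&str> = idx.names_with_prefix("parse").collect();
    println!("{names:?}"); // ["parseQuery", "parseQueryString"]
}
```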


Configuration

The full config schema lives at schema/basemind-config-v1.schema.json. Minimal example:

# .basemind/basemind.toml
file_watch_glob = "**/*.{rs,ts,tsx,py,go}"
eager_l2 = true

[documents]
enabled = true

Per-query MCP overrides:

{
  "query": "what does kreuzberg do?",
  "reranker_enabled": true,
  "reranker_preset": "bge-reranker-base"
}

Environment variables map mechanically: --llm-api-key → BASEMIND_LLM_API_KEY. Every MCP tool
accepts per-query overrides that win over file/env/CLI layers.
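The two rules in this paragraph can be sketched together: a mechanical flag-to-environment-variable mapping, and a resolver that returns the highest-priority layer that sets a value along with where it came from. The precedence order follows the provenance-ledger bullet under Differentiators (MCP > CLI > env > TOML > defaults); the `Source` enum and the `flag_to_env` and `resolve` functions are hypothetical, not basemind's ConfigSource API.

```rust
/// Mechanical mapping: `--llm-api-key` -> `BASEMIND_LLM_API_KEY`.
fn flag_to_env(flag: &str) -> String {
    let name = flag.trim_start_matches('-').replace('-', "_").to_uppercase();
    format!("BASEMIND_{name}")
}

/// Hypothetical provenance tag for a resolved value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Mcp,
    Cli,
    Env,
    Toml,
    Default,
}

/// Return the value from the first layer that sets it, plus where it came from.
/// `layers` must be ordered from highest to lowest priority.
fn resolve<T: Clone>(layers: &[(Source, Option<T>)]) -> Option<(T, Source)> {
    layers
        .iter()
        .find_map(|(src, value)| value.clone().map(|v| (v, *src)))
}

fn main() {
    println!("{}", flag_to_env("--llm-api-key")); // BASEMIND_LLM_API_KEY
    // reranker_enabled: a per-query MCP override beats the env and default layers.
    let reranker = resolve(&[
        (Source::Mcp, Some(true)),
        (Source::Cli, None),
        (Source::Env, Some(false)),
        (Source::Toml, None),
        (Source::Default, Some(false)),
    ]);
    println!("{reranker:?}"); // Some((true, Mcp))
}
```

Returning the source alongside the value is the essence of a provenance ledger: when a setting looks wrong, the tag says which layer to inspect.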


Architecture

source files
  → tree-sitter parsers (300+ langs, pack name dispatch)
  → L1 outlines + L2 calls + L3 structural hash blobs (content-addressed)
  → Fjall LSM inverted index (symbols / calls / imports / impls)
  → MCP server (rmcp) + documents pipeline (kreuzberg) → LanceDB
  → 32 MCP tools across 8 coding-agent harnesses
  • Scanner (src/scanner.rs) — rayon-parallel walker over the gitignore-aware file set.
    Extracts L1 (symbols + imports + implementations), L2 (calls + docs), L3 (structural hashes)
    per file.
  • Content-addressed blobs (src/store.rs) — msgpack at
    .basemind/blobs/<blake3>.{l1,l2,l3}.msgpack. Two files with identical content share the
    same blob.
  • Inverted index (src/index/) — Fjall LSM keyspace at
    .basemind/views/<view>/index.fjall/. Nine partitions drive symbol search, references,
    implementations, and dependents.
  • MCP surface (src/mcp/) — stdio JSON-RPC via rmcp. Tool descriptions serve as the routing
    surface for agents, so each states its semantics plainly (substring vs prefix match,
    scope-aware vs name-only, whether results are capped).
  • Git layer (src/git.rs, src/git_cache.rs) — gix-backed blame, log, diff, status.
    Sha-keyed disk cache makes warm queries free.
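A minimal sketch of the content-addressing idea behind the blob store: the blob path depends only on the file's bytes, so two files with identical content resolve to the same blob. basemind uses BLAKE3; to stay dependency-free this sketch substitutes 64-bit FNV-1a, which is fine for illustration but far too collision-prone for a real store. The `blob_path` helper is hypothetical.

```rust
use std::path::PathBuf;

/// 64-bit FNV-1a: a dependency-free stand-in for BLAKE3 in this sketch only.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Hypothetical: where the blob for one layer (l1, l2, l3) of some content lives.
fn blob_path(content: &[u8], layer: &str) -> PathBuf {
    PathBuf::from(".basemind/blobs").join(format!("{:016x}.{layer}.msgpack", fnv1a64(content)))
}

fn main() {
    // Two different source files with identical bytes...
    let a = blob_path(b"fn main() {}\n", "l1");
    let b = blob_path(b"fn main() {}\n", "l1");
    // ...share one blob, so the second parse and write can be skipped.
    assert_eq!(a, b);
    println!("{}", a.display());
}
```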

Installation

Channel | Command | Platforms | Features
Homebrew | brew install Goldziher/tap/basemind | macOS, Linux | base
npm | npm install -g basemind | any Node 14+ platform | base
pip | pip install basemind | any Python 3.8+ platform | base
cargo | cargo install basemind --locked | any Rust platform | base
cargo (full) | cargo install basemind --features full --locked | any Rust platform | documents + memory + crawl
GH releases | Download binary from releases | macOS · Linux · Windows | base

Harness-specific setup

Harness | Install command
Claude Code | /plugin marketplace add Goldziher/basemind, then /plugin install basemind@basemind
Cursor | See the Cursor docs for the plugin install flow; the basemind manifest is at .cursor-plugin/plugin.json
Codex CLI | /plugins, then search for basemind
Codex App | Plugins panel → Coding category → basemind → +
Gemini CLI | gemini extensions install https://github.com/Goldziher/basemind
OpenCode | Add { "plugin": ["basemind-opencode@latest"] } to opencode.json
Factory Droid | droid plugin --help (manifest at .claude-plugin/marketplace.json)
GitHub Copilot CLI | copilot plugin --help (same manifest)
Generic MCP | See the "Any MCP client" section above

Differentiators

  • Content-addressed dedup — Blake3-hashed L1/L2/L3 blobs deduplicated across files and
    views. Edit a file, rescan, skip unchanged hashes.
  • Secret-masking SecretString — api_key fields redacted in Debug/Display/Serialize.
    Tracing spans and panic messages never leak the value.
  • Provenance ledger — every config value's origin tracked via ConfigSource (MCP > CLI >
    env > TOML > defaults). Audit trail for debugging.
  • Schema-driven config — Rust types in src/config/ drive
    schema/basemind-config-v1.schema.json via schemars; snapshot is asserted byte-equal.
    Config is code.
  • Zero-system-dep ONNX: the ort runtime is bundled into the binary. No
    apt install onnxruntime, no system complexity.
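The README does not show SecretString's definition. A minimal sketch of the idea, assuming a newtype around String whose Debug and Display print a fixed placeholder: the placeholder text and the `expose` accessor are hypothetical, and serde serialization is left out to keep the block dependency-free.

```rust
use std::fmt;

/// Hypothetical secret wrapper: the inner value is only reachable via `expose`.
struct SecretString(String);

impl SecretString {
    fn new(s: impl Into<String>) -> Self {
        SecretString(s.into())
    }

    /// Explicit, greppable access point for code that really needs the value.
    fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for SecretString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SecretString(\"[REDACTED]\")")
    }
}

impl fmt::Display for SecretString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical config struct holding an API key.
#[derive(Debug)]
struct LlmConfig {
    model: String,
    api_key: SecretString,
}

fn main() {
    let cfg = LlmConfig { model: "example-model".into(), api_key: SecretString::new("sk-123") };
    // Derived Debug on the parent delegates to SecretString's Debug, so
    // `{:?}` logging (and panics that format the struct) never print the key.
    println!("{cfg:?}");
    assert_eq!(cfg.api_key.expose(), "sk-123");
}
```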

Project state

  • Real-OSS hardening: tests/harden.rs runs the full tool sweep against 8 upstream repos
    (ripgrep, tokio, TypeScript, React, Django, requests, gin, ripgrep-shallow) on every release.
    Canary assertions catch regressions.
  • CHANGELOG.md — release history and migration notes.
  • Contributing guide — development workflow: task setup, task check,
    task build. Pre-commit hooks via prek.
  • License: MIT
