swarmvault

mcp
Security Audit
Failed
Health Warning
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Failed
  • fs module — File system access in .github/workflows/live-smoke.yml
  • rm -rf — Recursive force deletion command in packages/engine/package.json
Permissions Passed
  • Permissions — No dangerous permissions requested
Purpose
This MCP server acts as a local-first knowledge base for AI agents. It compiles raw files, URLs, and PDFs into a persistent markdown wiki and a structured local search index, allowing AI coding assistants to retain and query research over time.

Security Assessment
Overall risk: Medium. The tool reads and writes the local file system to ingest documents and build its database, which is expected for its purpose. However, its package scripts contain a recursive force deletion command (`rm -rf`), which could delete unintended files if a path variable is ever mishandled. No hardcoded secrets or overly broad permissions were found. Because it integrates with multiple external AI providers (OpenAI, Anthropic, Ollama, etc.), it makes network requests, so your ingested data and queries may leave your local machine, depending on the configured provider.

Quality Assessment
The project is actively maintained, with its most recent code pushed today. It uses the highly permissive MIT license and is excellently documented. However, community trust and visibility are currently very low. With only 5 GitHub stars, the tool is effectively in its early stages and has not yet been widely tested or vetted by the broader developer community.

Verdict
Use with caution. The tool is active and well-documented, but low community adoption and the `rm -rf` command in its package scripts mean you should review the code and test it in an isolated environment before integrating it into critical workflows.
SUMMARY

Local-first LLM knowledge base compiler (Claude Code, Codex, Ollama). Turn raw research into a persistent markdown wiki, knowledge graph, and search index that compound over time.

README.md

SwarmVault

A local-first knowledge compiler for AI agents.

SwarmVault turns raw files, URLs, screenshots, PDFs, saved answers, and code into a durable working vault. Instead of losing useful work inside chat history, you get a reviewable markdown wiki, a structured graph, a local search index, session logs, and saved outputs that stay on disk.

It is built for the compounding loop most coding agents still miss:

  1. ingest source material into a local workspace
  2. compile it into a schema-shaped wiki and graph
  3. query or explore the vault
  4. save the useful results back into the vault
  5. review changes instead of trusting silent rewrites

Every vault carries a user-editable swarmvault.schema.md file, so the compiler and query layer can learn how that specific vault should be organized.

swarmvault init --obsidian
swarmvault add https://arxiv.org/abs/2401.12345
swarmvault ingest ./notes
swarmvault compile
swarmvault benchmark
swarmvault query "What is the auth flow?"
swarmvault graph query "How does auth connect to billing?"
swarmvault watch status
swarmvault watch --repo --once
swarmvault hook install
swarmvault graph serve

my-vault/
├── swarmvault.schema.md
├── raw/                   immutable source files and localized assets
├── wiki/                  compiled source, concept, entity, code, output, and graph pages
├── state/                 graph.json, search.sqlite, sessions, approvals, schedules
├── .obsidian/             optional local workspace config
└── agent/                 generated agent-facing helpers

What You Get

  • A markdown-first wiki that stays usable in Obsidian or plain Git
  • A structured graph artifact with provenance, freshness, projects, and saved outputs
  • Graph-first report pages plus deterministic local graph query, path, explain, and god-node tools
  • Benchmark artifacts plus worked examples for measuring graph-guided context reduction
  • Save-first query and explore workflows, including report, slides, chart, and image outputs
  • Reviewable approval and candidate queues instead of silent page mutation
  • Local full-text search and a graph workspace for graph, search, preview, and review
  • Project-aware schemas and rollups for larger multi-root vaults
  • Repo-aware code ingestion with parser-backed module analysis and local import resolution
  • Repo-aware watch mode and local git hooks for post-commit and post-checkout refreshes
  • Pending semantic refresh tracking for non-code repo changes, surfaced in watch status and the local graph workspace
  • Human-only wiki/insights/ pages that SwarmVault can read but does not rewrite
  • Session artifacts for compile, query, explore, lint, watch, and schedule runs
  • CLI, MCP, and installable agent instructions for Codex, Claude Code, Cursor, Goose, Pi, Gemini CLI, and OpenCode
  • Pluggable providers including OpenAI, Anthropic, Gemini, Ollama, OpenRouter, Groq, Together, xAI, Cerebras, generic OpenAI-compatible APIs, and custom adapters

How It Works

SwarmVault is not a “chat with your docs” wrapper. The vault itself is the product.

  • Ingest stores raw inputs immutably and localizes remote assets when needed.
  • Compile turns those inputs into durable source, concept, entity, code, and output pages.
  • Query and explore write useful results back into wiki/outputs/ by default.
  • Review and candidate queues keep generated changes inspectable before promotion.
  • Search, graph serving, scheduling, watch mode, and MCP expose the same local artifacts instead of creating a second hidden system.

The extraction layer is intentionally split:

  • deterministic parsing and source analysis where the runtime can do it locally
  • provider-backed synthesis where a vault-specific schema, cross-source reasoning, or advisory linting actually benefits from a model

That keeps the durable artifacts inspectable and lets the vault improve over time instead of resetting every session.

Install

SwarmVault requires Node >=24.

npm install -g @swarmvaultai/cli

This installs the swarmvault command. The vault alias is also available for compatibility.

Quickstart

mkdir my-vault
cd my-vault
swarmvault init --obsidian
sed -n '1,120p' swarmvault.schema.md
swarmvault ingest ./notes.md
swarmvault ingest https://example.com/article
swarmvault add https://arxiv.org/abs/2401.12345
swarmvault compile
swarmvault benchmark
swarmvault query "What are the main ideas?"
swarmvault query "Turn this into slides" --format slides
swarmvault query "Show this as a chart" --format chart
swarmvault explore "What should I investigate next?" --steps 3
swarmvault lint --deep
swarmvault schedule list
swarmvault review list
swarmvault candidate list
swarmvault graph query "Which nodes bridge the largest communities?"
swarmvault graph path "module:src/auth.ts" "concept:billing"
swarmvault graph explain "concept:billing"
swarmvault graph god-nodes
swarmvault graph serve
swarmvault graph export --html ./exports/graph.html
swarmvault graph export --graphml ./exports/graph.graphml

You can also use the capture and automation loop:

swarmvault inbox import
swarmvault watch status
swarmvault watch --lint --repo
swarmvault hook install

And you can expose the vault to compatible agents over MCP:

swarmvault mcp
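
Many MCP clients register a stdio server through a JSON map from a server name to a command and its arguments. The sketch below builds such an entry for `swarmvault mcp`. It assumes the client reads the common `mcpServers` shape; the `buildMcpConfig` helper and the server name are illustrative, so check your client's documentation for the exact file and keys.

```typescript
// Sketch: build an MCP client entry that launches `swarmvault mcp` over stdio.
// Assumption: the client accepts the common `mcpServers` JSON shape.
interface StdioServerEntry {
  command: string;
  args: string[];
}

interface McpClientConfig {
  mcpServers: Record<string, StdioServerEntry>;
}

// `serverName` is a label chosen by the user, not something SwarmVault defines.
function buildMcpConfig(serverName: string): McpClientConfig {
  return {
    mcpServers: {
      [serverName]: {
        command: "swarmvault", // CLI installed by @swarmvaultai/cli
        args: ["mcp"],         // documented subcommand: local MCP server over stdio
      },
    },
  };
}

const mcpConfig = buildMcpConfig("swarmvault");
console.log(JSON.stringify(mcpConfig, null, 2));
```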

Platform Support

Agent         Install target
Codex         swarmvault install --agent codex
Claude Code   swarmvault install --agent claude
Cursor        swarmvault install --agent cursor
Goose         swarmvault install --agent goose
Pi            swarmvault install --agent pi
Gemini CLI    swarmvault install --agent gemini
OpenCode      swarmvault install --agent opencode

Codex, Goose, Pi, and OpenCode share the same canonical AGENTS.md managed block. Claude Code uses CLAUDE.md, Gemini CLI uses GEMINI.md, and Cursor writes .cursor/rules/swarmvault.mdc.
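
A "managed block" usually means a delimited region that an installer can rewrite without touching the rest of the file. The sketch below shows one way such an update could stay idempotent. The marker strings and the `upsertManagedBlock` helper are hypothetical; SwarmVault's actual delimiters are not documented here.

```typescript
// Hypothetical markers; the real delimiters used by SwarmVault may differ.
const BEGIN = "<!-- swarmvault:begin -->";
const END = "<!-- swarmvault:end -->";

// Insert or replace the managed block, leaving all other content untouched.
function upsertManagedBlock(existing: string, body: string): string {
  const block = `${BEGIN}\n${body.trim()}\n${END}`;
  const start = existing.indexOf(BEGIN);
  const end = existing.indexOf(END);
  if (start !== -1 && end > start) {
    // Replace only the span between (and including) the two markers.
    return existing.slice(0, start) + block + existing.slice(end + END.length);
  }
  // No block yet: append it, separated by a blank line if the file has content.
  const prefix = existing.trim().length > 0 ? existing.trimEnd() + "\n\n" : "";
  return prefix + block + "\n";
}

const firstInstall = upsertManagedBlock("# My rules\n", "Use the vault first.");
const reinstall = upsertManagedBlock(firstInstall, "Use the vault graph first.");
console.log(reinstall);
```

Running the function twice with the same body returns the same text, which is the property that makes repeated installs safe for hand-written content around the block.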

If you want Claude Code to bias toward SwarmVault's graph-orientation pages before broad file search, install it with:

swarmvault install --agent claude --hook

Workspace Layout

After swarmvault init, the workspace looks like this:

my-vault/
|-- swarmvault.config.json
|-- swarmvault.schema.md
|-- inbox/
|-- raw/
|   |-- sources/
|   `-- assets/
|-- wiki/
|   |-- index.md
|   |-- graph/
|   |-- log.md
|   |-- candidates/
|   |-- code/
|   |-- insights/
|   |-- projects/
|   |-- sources/
|   |-- concepts/
|   |-- entities/
|   `-- outputs/
|       |-- assets/
|       `-- index.md
|-- state/
|   |-- manifests/
|   |-- extracts/
|   |-- analyses/
|   |-- code-index.json
|   |-- graph.json
|   |-- search.sqlite
|   |-- sessions/
|   |-- approvals/
|   |-- schedules/
|   `-- jobs.ndjson
|-- .obsidian/
`-- agent/

Schema Layer

Every vault carries a root schema file:

swarmvault.schema.md

This is a markdown instruction layer, not a separate DSL. SwarmVault reads that file during compile and query so each vault can define its own:

  • naming rules
  • concept and entity categories
  • relationship expectations
  • grounding and citation requirements
  • exclusions and scope boundaries

Generated pages include a schema_hash in frontmatter, which lets lint mark pages stale when the schema changes.
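
The staleness check can be pictured as comparing a digest of the current schema with the value stored in each page. The sketch below is a minimal illustration, assuming the hash is a SHA-256 hex digest of the schema text and that frontmatter is a `---` delimited block with LF line endings; the real hashing scheme is not stated in this README, and the helper names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Assumption: the hash is a SHA-256 hex digest of the schema file's text.
function schemaHash(schemaText: string): string {
  return createHash("sha256").update(schemaText, "utf8").digest("hex");
}

// Read `schema_hash` from a page's frontmatter with a simple line scan
// (no YAML parser, LF line endings only).
function readSchemaHash(page: string): string | undefined {
  const match = /^---\n([\s\S]*?)\n---/.exec(page);
  if (!match) return undefined;
  const frontmatter = match[1] ?? "";
  const line = frontmatter.split("\n").find((l) => l.startsWith("schema_hash:"));
  return line?.slice("schema_hash:".length).trim().replace(/^["']|["']$/g, "");
}

// A page is stale when its recorded hash no longer matches the current schema.
function isStale(page: string, currentSchema: string): boolean {
  return readSchemaHash(page) !== schemaHash(currentSchema);
}

const schemaV1 = "# Schema\n- concepts use kebab-case\n";
const schemaV2 = schemaV1 + "- cite every claim\n";
const page = `---\nschema_hash: ${schemaHash(schemaV1)}\nstatus: active\n---\n# Page\n`;

console.log(isStale(page, schemaV1)); // schema unchanged
console.log(isStale(page, schemaV2)); // schema edited after the page was generated
```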

Generated source, concept, entity, output, and index pages also carry lifecycle fields such as status, created_at, updated_at, compiled_from, and managed_by.

Core Commands

  • swarmvault init [--obsidian]: create a workspace, default config, default schema file, and optional .obsidian/ config
  • swarmvault ingest <input> [--repo-root <path>] [--include <glob...>] [--exclude <glob...>] [--max-files <n>] [--no-gitignore] [--no-include-assets] [--max-asset-size <bytes>]: ingest a local file path, directory path, or URL, and localize remote image references by default when the input is a URL
  • swarmvault add <url> [--author <name>] [--contributor <name>]: capture arXiv/X URLs into normalized markdown, or fall back to generic URL ingest
  • swarmvault inbox import [dir]: import browser-clipper style bundles and inbox captures
  • swarmvault compile [--approve]: build wiki pages, graph data, and the search index using the vault schema as guidance, or stage a review bundle before applying changes
  • swarmvault benchmark [--question "<text>" ...]: measure graph-guided context reduction and write state/benchmark.json
  • swarmvault query "<question>" [--no-save] [--format markdown|report|slides|chart|image]: answer questions against the compiled vault and save the result by default
  • swarmvault explore "<question>" [--steps <n>] [--format markdown|report|slides|chart|image]: run a save-first multi-step research loop and write a hub page plus step outputs
  • swarmvault lint [--deep] [--web]: run structural lint, optional LLM-powered deep lint, and optional web-augmented evidence gathering
  • swarmvault schedule list|run|serve: run configured recurring jobs for compile, lint, query, and explore
  • swarmvault watch [--lint] [--repo] [--once]: watch the inbox, optionally refresh tracked repo roots, or run a one-shot refresh cycle
  • swarmvault watch status: show watched repo roots plus pending semantic refresh entries for tracked non-code changes
  • swarmvault hook install|uninstall|status: manage local git hooks that run repo-aware one-shot refreshes after checkout and commit
  • swarmvault mcp: start a local MCP server over stdio
  • swarmvault review list|show|accept|reject: inspect and resolve staged approval bundles
  • swarmvault candidate list|promote|archive: inspect and resolve staged concept and entity candidates
  • swarmvault graph query "<question>" [--dfs] [--budget <n>]: run a deterministic local graph traversal seeded from local search
  • swarmvault graph path <from> <to>: return the shortest high-confidence path between two graph targets
  • swarmvault graph explain <target>: inspect graph metadata, community membership, neighbors, and provenance for a node or page
  • swarmvault graph god-nodes [--limit <n>]: list the most connected bridge-heavy nodes in the current graph
  • swarmvault graph serve: open the local graph workspace with graph, search, and page preview
  • swarmvault graph export --html|--svg|--graphml|--cypher <output>: export the graph workspace as HTML, SVG, GraphML, or Cypher
  • swarmvault install --agent codex|claude|cursor|goose|pi|gemini|opencode: install agent-specific rules
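
As a rough illustration of what `graph path` describes, the sketch below runs a breadth-first search over edges that meet a confidence threshold. This is one plausible reading of "shortest high-confidence path", not SwarmVault's implementation: the `Edge` shape, the 0.7 threshold, the undirected treatment, and the sample nodes are all assumptions, and the real state/graph.json layout may differ.

```typescript
// Hypothetical edge shape; the real graph artifact may use different fields.
interface Edge {
  from: string;
  to: string;
  confidence: number;
}

// Breadth-first search over edges at or above `minConfidence`, treated as undirected.
function shortestPath(edges: Edge[], from: string, to: string, minConfidence = 0.7): string[] | null {
  const neighbors = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbors.get(a) ?? [];
    list.push(b);
    neighbors.set(a, list);
  };
  for (const e of edges) {
    if (e.confidence < minConfidence) continue; // drop low-confidence edges
    link(e.from, e.to);
    link(e.to, e.from);
  }

  const previous = new Map<string, string | null>([[from, null]]);
  const queue: string[] = [from];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    if (node === to) {
      // Walk predecessor links back to the start to rebuild the path.
      const path: string[] = [];
      let cursor: string | null = node;
      while (cursor !== null) {
        path.unshift(cursor);
        cursor = previous.get(cursor) ?? null;
      }
      return path;
    }
    for (const next of neighbors.get(node) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, node);
        queue.push(next);
      }
    }
  }
  return null; // no path through sufficiently confident edges
}

const sampleEdges: Edge[] = [
  { from: "module:src/auth.ts", to: "concept:billing", confidence: 0.3 },
  { from: "module:src/auth.ts", to: "concept:sessions", confidence: 0.9 },
  { from: "concept:sessions", to: "concept:billing", confidence: 0.8 },
];
console.log(shortestPath(sampleEdges, "module:src/auth.ts", "concept:billing"));
```

In the sample data the direct edge is too weak to use, so the search returns the two-hop route through the intermediate node.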

Human-authored insight pages placed in wiki/insights/ are indexed into search and exposed to query, but SwarmVault does not rewrite them after initialization.

When ingest targets a remote HTML or markdown URL, SwarmVault downloads referenced remote images into raw/assets/<sourceId>/, rewrites the stored markdown to local relative links, and records those files as manifest attachments. Use --no-include-assets to keep remote image references untouched, or --max-asset-size to cap the bytes fetched for a single remote asset.
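
The link-rewriting step can be sketched as a pass over markdown image references that swaps each remote URL for a local path and records what needs downloading. This is only an illustration: `localizeImages`, the regular expression, and the `relativeAssetDir` argument are assumptions, the sketch does not fetch anything, and it does not handle filename collisions or image titles.

```typescript
interface Download {
  url: string;
  localPath: string;
}

interface LocalizedMarkdown {
  markdown: string;
  downloads: Download[];
}

// Rewrite `![alt](https://...)` references to local relative paths and
// collect the list of assets that would need to be fetched.
function localizeImages(markdown: string, relativeAssetDir: string): LocalizedMarkdown {
  const downloads: Download[] = [];
  const seen = new Map<string, string>(); // remote URL -> local path
  const rewritten = markdown.replace(
    /!\[([^\]]*)\]\((https?:\/\/[^\s)]+)\)/g,
    (_match: string, alt: string, url: string): string => {
      let localPath = seen.get(url);
      if (localPath === undefined) {
        // Use the last URL path segment as the filename, with a fallback name.
        const name = new URL(url).pathname.split("/").pop() || `asset-${seen.size + 1}`;
        localPath = `${relativeAssetDir}/${name}`;
        seen.set(url, localPath);
        downloads.push({ url, localPath });
      }
      return `![${alt}](${localPath})`;
    },
  );
  return { markdown: rewritten, downloads };
}

const localized = localizeImages(
  "Intro ![diagram](https://example.com/img/flow.png) and ![](https://example.com/img/flow.png)",
  "../assets/src-1", // hypothetical relative path to raw/assets/<sourceId>/
);
console.log(localized.markdown);
console.log(localized.downloads);
```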

When ingest targets a local directory, SwarmVault walks the tree recursively, respects .gitignore by default, records repoRelativePath on matching manifests, and later writes state/code-index.json during compile so local imports can resolve across the code graph.

Code-aware ingestion currently ships for JavaScript, TypeScript, Python, Go, Rust, Java, C#, C, C++, PHP, Ruby, and PowerShell. JavaScript and TypeScript use the TypeScript compiler API; the other shipped languages use parser-backed local analyzers that emit the same module-page and graph model.
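
The split between the two analyzer families can be pictured as a lookup from file extension to language. The sketch below is illustrative only: the extension list is a conventional guess rather than SwarmVault's actual mapping, and `pickAnalyzer` and the analyzer labels are hypothetical names.

```typescript
import { extname } from "node:path";

type Analyzer = "typescript-compiler-api" | "parser-backed" | "none";

// Conventional extensions for the shipped languages (an assumption, not from the README).
const LANGUAGE_BY_EXTENSION: Record<string, string> = {
  ".js": "JavaScript",
  ".ts": "TypeScript",
  ".py": "Python",
  ".go": "Go",
  ".rs": "Rust",
  ".java": "Java",
  ".cs": "C#",
  ".c": "C",
  ".cpp": "C++",
  ".php": "PHP",
  ".rb": "Ruby",
  ".ps1": "PowerShell",
};

// JavaScript and TypeScript go to the TypeScript compiler API;
// every other shipped language goes to a parser-backed analyzer.
function pickAnalyzer(filePath: string): { language: string | null; analyzer: Analyzer } {
  const language = LANGUAGE_BY_EXTENSION[extname(filePath).toLowerCase()] ?? null;
  if (language === null) return { language, analyzer: "none" };
  const usesTsCompiler = language === "JavaScript" || language === "TypeScript";
  return { language, analyzer: usesTsCompiler ? "typescript-compiler-api" : "parser-backed" };
}

for (const file of ["src/auth.ts", "tools/build.ps1", "notes.md"]) {
  console.log(file, pickAnalyzer(file));
}
```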

Compounding Loop

SwarmVault is designed so useful work compounds:

  • query writes output pages into wiki/outputs/ by default
  • query --no-save keeps the answer ephemeral
  • saved outputs are indexed immediately into search and the graph page registry
  • saved outputs immediately refresh related source, concept, and entity pages
  • chart and image saves also write local assets into wiki/outputs/assets/<slug>/
  • compile also writes wiki/graph/report.md, wiki/graph/index.md, and per-community graph summary pages
  • new concept and entity pages land in wiki/candidates/ first, then are promoted on the next matching compile
  • review turns compile --approve bundles into a local accept/reject workflow instead of a dead-end staging directory
  • candidate lets you promote or archive staged concept and entity pages without waiting for another compile
  • explore chains several saved queries together and writes a hub page you can revisit
  • scheduled query and explore jobs stage saved output pages through approvals instead of activating them immediately
  • lint --deep can suggest missing citations, coverage gaps, candidate pages, and follow-up questions without mutating the vault
  • orchestration roles can add audit, safety, context, and research feedback without bypassing the approval flow
  • compile, query, explore, lint, and watch each write a session artifact to state/sessions/
  • ingest and inbox import also append to the canonical wiki/log.md activity log

Why This Exists

Most "chat with your docs" tools answer a question and then throw away the work. SwarmVault treats the vault itself as the product. The markdown pages, saved outputs, graph edges, manifests, schema rules, and freshness state are durable artifacts you can inspect, diff, and keep improving.

Providers

SwarmVault routes by capability, not by brand name.

Built-in provider types:

  • heuristic
  • openai
  • anthropic
  • gemini
  • ollama
  • openrouter
  • groq
  • together
  • xai
  • cerebras
  • openai-compatible
  • custom

Example provider config:

{
  "providers": {
    "primary": {
      "type": "openai-compatible",
      "baseUrl": "https://your-provider.example/v1",
      "apiKeyEnv": "OPENAI_API_KEY",
      "model": "gpt-4.1-mini",
      "apiStyle": "chat",
      "capabilities": ["chat", "structured", "image_generation"]
    }
  },
  "tasks": {
    "compileProvider": "primary",
    "queryProvider": "primary",
    "lintProvider": "primary",
    "visionProvider": "primary",
    "imageProvider": "primary"
  }
}
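
To make "routes by capability" concrete, the sketch below resolves a task to its configured provider and checks that the provider declares the capability the task needs. The `REQUIRED_CAPABILITY` table and the `resolveProvider` helper are assumptions for illustration; the README does not say which capability each task requires or how SwarmVault validates it.

```typescript
interface ProviderConfig {
  type: string;
  model: string;
  capabilities: string[];
}

interface VaultConfig {
  providers: Record<string, ProviderConfig>;
  tasks: Record<string, string>;
}

// Hypothetical mapping from task key to the capability it needs.
const REQUIRED_CAPABILITY: Record<string, string> = {
  compileProvider: "structured",
  queryProvider: "chat",
  imageProvider: "image_generation",
};

// Look up the provider assigned to a task and verify it declares the needed capability.
function resolveProvider(config: VaultConfig, task: string): ProviderConfig {
  const name = config.tasks[task];
  if (name === undefined) throw new Error(`No provider assigned to task "${task}"`);
  const provider = config.providers[name];
  if (provider === undefined) throw new Error(`Task "${task}" references unknown provider "${name}"`);
  const needed = REQUIRED_CAPABILITY[task];
  if (needed !== undefined && !provider.capabilities.includes(needed)) {
    throw new Error(`Provider "${name}" lacks capability "${needed}" for task "${task}"`);
  }
  return provider;
}

// Same shape as the example config, but without image generation.
const sampleConfig: VaultConfig = {
  providers: {
    primary: { type: "openai-compatible", model: "gpt-4.1-mini", capabilities: ["chat", "structured"] },
  },
  tasks: { compileProvider: "primary", queryProvider: "primary", imageProvider: "primary" },
};

console.log(resolveProvider(sampleConfig, "queryProvider").model);
try {
  resolveProvider(sampleConfig, "imageProvider");
} catch (error) {
  console.log((error as Error).message);
}
```

Checking capabilities at resolution time means a misconfigured task fails with a clear message before any request is sent, regardless of which vendor sits behind the provider entry.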

Web Search For Deep Lint

swarmvault lint --deep --web uses a separate webSearch config block instead of the normal LLM provider registry.

{
  "webSearch": {
    "providers": {
      "evidence": {
        "type": "http-json",
        "endpoint": "https://search.example/api/search",
        "method": "GET",
        "apiKeyEnv": "SEARCH_API_KEY",
        "apiKeyHeader": "Authorization",
        "apiKeyPrefix": "Bearer ",
        "queryParam": "q",
        "limitParam": "limit",
        "resultsPath": "results",
        "titleField": "title",
        "urlField": "url",
        "snippetField": "snippet"
      }
    },
    "tasks": {
      "deepLintProvider": "evidence"
    }
  }
}

If --web is requested without a configured web-search provider, SwarmVault fails clearly instead of silently skipping evidence gathering.
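
The field names in the http-json block suggest how a request is assembled and how results are read back. The sketch below follows that reading: it builds the query URL and auth header, then extracts results from a JSON body. How SwarmVault actually interprets these fields is an assumption here (for example, treating `resultsPath` as a dot-separated path), and `buildRequest` and `extractResults` are hypothetical helpers.

```typescript
interface HttpJsonProvider {
  endpoint: string;
  apiKeyEnv: string;
  apiKeyHeader: string;
  apiKeyPrefix: string;
  queryParam: string;
  limitParam: string;
  resultsPath: string;
  titleField: string;
  urlField: string;
  snippetField: string;
}

interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Build the GET URL and auth header; fail loudly if the API key is missing.
function buildRequest(
  p: HttpJsonProvider,
  query: string,
  limit: number,
  env: Record<string, string | undefined>,
): { url: string; headers: Record<string, string> } {
  const key = env[p.apiKeyEnv];
  if (!key) throw new Error(`Missing environment variable ${p.apiKeyEnv}`);
  const url = new URL(p.endpoint);
  url.searchParams.set(p.queryParam, query);
  url.searchParams.set(p.limitParam, String(limit));
  return { url: url.toString(), headers: { [p.apiKeyHeader]: p.apiKeyPrefix + key } };
}

// Assumption: `resultsPath` is a dot-separated path into the response body.
function extractResults(p: HttpJsonProvider, body: unknown): SearchResult[] {
  let node: unknown = body;
  for (const key of p.resultsPath.split(".")) {
    if (typeof node !== "object" || node === null) return [];
    node = (node as Record<string, unknown>)[key];
  }
  if (!Array.isArray(node)) return [];
  return node.map((item: Record<string, unknown>) => ({
    title: String(item[p.titleField] ?? ""),
    url: String(item[p.urlField] ?? ""),
    snippet: String(item[p.snippetField] ?? ""),
  }));
}

const evidence: HttpJsonProvider = {
  endpoint: "https://search.example/api/search",
  apiKeyEnv: "SEARCH_API_KEY",
  apiKeyHeader: "Authorization",
  apiKeyPrefix: "Bearer ",
  queryParam: "q",
  limitParam: "limit",
  resultsPath: "results",
  titleField: "title",
  urlField: "url",
  snippetField: "snippet",
};

const request = buildRequest(evidence, "auth flow", 3, { SEARCH_API_KEY: "test-key" });
console.log(request.url);
const sampleBody = { results: [{ title: "Auth", url: "https://example.com/auth", snippet: "..." }] };
console.log(extractResults(evidence, sampleBody));
```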

Packages

  • @swarmvaultai/cli: the globally installable CLI
  • @swarmvaultai/engine: the runtime library behind ingest, compile, query, lint, watch, and MCP
  • @swarmvaultai/viewer: the graph viewer package used by swarmvault graph serve

Current Notes

  • The default heuristic provider is meant for local smoke tests and offline defaults, not serious synthesis quality
  • The local search layer uses Node's built-in node:sqlite, which may emit an experimental warning in Node 24
  • The graph viewer is included in the CLI flow; users do not need to install the viewer package separately

Development

pnpm install
pnpm lint
pnpm test
pnpm build

Live smoke checks for the published package:

pnpm live:smoke:heuristic
pnpm live:smoke:ollama
OPENAI_API_KEY=... pnpm live:smoke:openai

The heuristic published-package smoke lane validates saved visual outputs, project-aware code ingestion, add capture fallback, benchmark artifacts, graph report generation, standalone graph exports (html, svg, graphml, cypher), review-staged scheduled query runs, watch automation plus watch status, richer graph workspace APIs, and MCP graph/query surfaces against the real npm install path.

See docs/live-testing.md for the published-package smoke flow, CI workflow, and the manual live checklist.

Worked Examples

Small example vaults live under worked/ and mirror the trust and capture workflows used in docs and smoke tests:

  • worked/code-repo/
  • worked/mixed-corpus/
  • worked/capture/

License

MIT
