obsidian-graph
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 9 GitHub stars
Code Warn
- network request — Outbound network request in src/hub_analyzer.py
Permissions Pass
- Permissions — No dangerous permissions requested
This MCP server acts as a semantic knowledge graph for markdown and Obsidian vaults. It uses AI-powered vector embeddings and a PostgreSQL database to help you discover hidden connections, analyze note hubs, and perform deep searches within your local files.
Security Assessment
The overall risk is Low, but requires minor awareness. The tool accesses your local markdown files (read-only) to index them. There are no dangerous permissions requested, no evidence of hardcoded secrets, and it does not execute arbitrary shell commands. However, it does make an outbound network request in `src/hub_analyzer.py` and relies on external APIs (specifically Voyage AI) to generate vector embeddings. This means your note contents are sent over HTTPS to a third-party service for processing.
Quality Assessment
The project is highly transparent and actively maintained, with its most recent push occurring today. It is properly licensed under the permissive MIT license and uses modern Python (3.11+). The main drawback is low community visibility; with only 9 GitHub stars, it is a very new or niche project that has not yet been widely tested or vetted by a large user base.
Verdict
Use with caution: the code itself is safe and actively maintained, but you should be comfortable with your local note contents being sent externally to the Voyage AI API for embedding generation.
Semantic knowledge graph navigation for Obsidian or markdown vaults using AI-powered vector embeddings and PostgreSQL+pgvector
Obsidian Graph
Semantic knowledge graph engine for markdown vaults. Discovers hidden connections between notes using AI-powered vector embeddings and PostgreSQL+pgvector. Accessible to AI assistants via MCP.
Overview
Obsidian Graph builds a semantic knowledge graph of your markdown vault, discovering relationships between notes that go beyond keywords and explicit links. It embeds your notes as vectors using Voyage Context-3, stores them in PostgreSQL+pgvector, and provides tools for semantic search, multi-hop graph traversal, hub detection, and orphan analysis.
Designed for Obsidian vaults but works with any folder of markdown files. Connects to any AI app or harness compatible with the Model Context Protocol (MCP).
Features
- Semantic Search: Find notes by meaning, not just keywords
- Connection Discovery: Multi-hop BFS graph traversal to map note relationships
- Hub Analysis: Identify highly connected conceptual anchors (MOC candidates)
- Orphan Detection: Find isolated insights that need integration
- Auto-Indexing: Automatic file watching with 30-second debounce
- Higher-Dimensional Embeddings: Voyage Context-3 (1024d) vs typical 384d embeddings
Architecture
┌─ obsidian-graph container ─────────────────┐
│                                            │
│  MCP Client ◄──stdio──► server.py          │
│                         │                  │
│                  ┌──────┴──────┐           │
│                  ▼             ▼           │
│            graph_builder hub_analyzer      │
│             embedder.py  file_watcher      │
│                  │             │           │
│                  │ HTTPS       │ watch     │
│                  ▼             ▼           │
│            Voyage AI API  /vault (ro)      │
│                  │                         │
│                  │ 1024d vectors           │
│                  ▼                         │
│           vector_store.py                  │
│                  │                         │
└──────────────────┼─────────────────────────┘
                   │ SQL
                   ▼
┌─ obsidian-graph-pgvector container ────────┐
│   PostgreSQL 15 + pgvector (HNSW index)    │
└────────────────────────────────────────────┘
- Embeddings: Voyage Context-3 (1024 dimensions, contextualized)
- Vector Store: PostgreSQL 15+ with pgvector HNSW indexing
- Performance: 0.9ms search (555x better than target), <2s graph building
- File Watching: Watchdog with polling mode for cloud sync compatibility
- Transport: Docker stdio for MCP communication
MCP Tools
Overview
All tools use semantic similarity via 1024-dimensional Voyage Context-3 embeddings. Similarity scores range from 0.0 (unrelated) to 1.0 (identical). Default threshold is 0.5 (clear connection).
How it works:
- Notes are embedded as vectors in 1024-dimensional space
- Cosine similarity measures semantic closeness between vectors
- HNSW index enables sub-millisecond vector search
- Results ranked by similarity score (0.0-1.0)
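As a rough illustration of the ranking step described above, the sketch below computes cosine similarity in plain Python and filters by threshold. The names `cosine_similarity` and `rank_by_similarity` are hypothetical and not taken from the project; the server itself delegates the search to pgvector's HNSW index rather than scanning every vector. Raw cosine similarity can be negative for opposed vectors, and how the server maps scores into the 0.0-1.0 range is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, notes, threshold=0.5, limit=10):
    """Brute-force stand-in for the indexed search (hypothetical helper).

    `notes` maps a note path to its embedding vector.
    """
    scored = [(path, cosine_similarity(query, vec)) for path, vec in notes.items()]
    kept = [(path, score) for path, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:limit]
```

With toy 2-dimensional vectors, a query of `[1.0, 0.0]` ranks a note at `[1.0, 0.0]` first, keeps `[1.0, 1.0]` (similarity about 0.71), and drops `[0.0, 1.0]` at the default threshold.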
Tool Reference
| Tool | Purpose | Method | Performance | Use Case |
|---|---|---|---|---|
| `search_notes` | Semantic search across vault | Query embedding → vector search | <1ms | Find notes by concept |
| `get_similar_notes` | Find notes similar to given note | Note embedding → vector search | <300ms | Discover related ideas |
| `get_connection_graph` | Multi-hop BFS graph traversal | Recursive similarity search | <2s | Map knowledge networks |
| `get_hub_notes` | Identify highly connected notes | Materialized connection counts | <100ms | Find conceptual anchors |
| `get_orphaned_notes` | Find isolated notes | Materialized connection counts | <100ms | Unintegrated insights |
Methodology Details
search_notes:
- Generates query embedding using Voyage Context-3
- Performs cosine similarity search against all note embeddings
- Returns top-k most similar notes above threshold
- HNSW index enables O(log n) search complexity
get_similar_notes:
- Fetches source note's embedding from database
- Searches for notes with similar embeddings
- Excludes source note from results
- Useful for exploring conceptual neighborhoods
get_connection_graph:
- Uses Breadth-First Search (BFS) for level-by-level exploration
- Prevents cycles by tracking visited nodes
- Builds multi-hop network (depth 1-5 levels)
- Each level: finds top-k most similar notes from previous level
- Returns: nodes (with level), edges (with similarity), stats
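The traversal steps listed for get_connection_graph can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `build_connection_graph` and the `find_similar(path, limit, threshold)` callback are hypothetical names, with the callback standing in for the database similarity query.

```python
from collections import deque


def build_connection_graph(start, find_similar, depth=3, max_per_level=5, threshold=0.5):
    """Level-by-level BFS over semantic neighbors.

    `find_similar(path, limit, threshold)` is a hypothetical callback that
    returns a list of (neighbor_path, similarity) pairs.
    """
    nodes = {start: 0}  # path -> level; also serves as the visited set
    edges = []          # (source, target, similarity)
    queue = deque([start])
    while queue:
        current = queue.popleft()
        level = nodes[current]
        if level >= depth:
            continue  # do not expand beyond the requested depth
        for neighbor, score in find_similar(current, max_per_level, threshold):
            if neighbor in nodes:
                continue  # already visited: prevents cycles
            nodes[neighbor] = level + 1
            edges.append((current, neighbor, score))
            queue.append(neighbor)
    stats = {"node_count": len(nodes), "edge_count": len(edges)}
    return {"nodes": nodes, "edges": edges, "stats": stats}
```

Because each note is recorded the first time it is reached, its level is the shortest hop count from the starting note.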
get_hub_notes:
- Uses materialized `connection_count` column (O(1) query)
- Connection count = # of notes above threshold similarity
- Background refresh when >50% of counts are stale
- Identifies notes with many semantic connections
- High hub scores → good MOC (Map of Content) candidates
get_orphaned_notes:
- Uses materialized `connection_count` column
- Finds notes with few semantic connections
- Sorted by: connection count (ASC), modified date (DESC)
- Shows recent notes first (likely new insights)
- Helps identify notes needing integration
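The ordering described for get_orphaned_notes (connection count ascending, then modified date descending) can be sketched in plain Python. The record fields and the `orphaned_notes` function are illustrative assumptions; the server performs the equivalent query against the database, and the exact cutoff semantics of `max_connections` are assumed here to be inclusive.

```python
from datetime import datetime


def orphaned_notes(notes, max_connections=2, limit=20):
    """Return weakly connected notes, fewest connections first, newest first within ties."""
    candidates = [n for n in notes if n["connection_count"] <= max_connections]
    # Python's sort is stable: sort by the secondary key first, then the primary key.
    candidates.sort(key=lambda n: n["modified"], reverse=True)
    candidates.sort(key=lambda n: n["connection_count"])
    return candidates[:limit]


# Hypothetical sample records for illustration only.
sample = [
    {"path": "old-idea.md", "connection_count": 0, "modified": datetime(2024, 1, 5)},
    {"path": "new-idea.md", "connection_count": 0, "modified": datetime(2025, 3, 1)},
    {"path": "hub.md", "connection_count": 14, "modified": datetime(2025, 2, 1)},
    {"path": "loose.md", "connection_count": 2, "modified": datetime(2025, 1, 1)},
]
ordered_paths = [n["path"] for n in orphaned_notes(sample)]
```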
Chunking Support
For large notes (>30k tokens):
- Automatically split into sentence-aligned chunks (target: ~2000 characters, 0 overlap)
- Chunking algorithm breaks at sentence boundaries (`.` or `\n\n`) for readability
- Chunk sizes vary (1800-2200 chars) to preserve sentence integrity
- Embedded in batches of 60 chunks (preserves context)
- Voyage Context-3 maintains semantic coherence across chunks
- Each chunk stored separately with `chunk_index`
- Search returns individual chunks (can aggregate by path)
Example: 168k-char note → ~87 variable-sized chunks → 2 batches (60+27) → context preserved
Most Obsidian notes are <10k tokens and embedded whole (single chunk).
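A simplified sketch of sentence-aligned chunking and batching is shown below. It is an assumption-laden illustration, not the project's algorithm: `chunk_text` and `batches` are hypothetical names, and this version caps each chunk at the target size instead of letting sizes float between 1800 and 2200 characters.

```python
def chunk_text(text, target=2000):
    """Split text into consecutive, non-overlapping chunks of at most `target` characters,
    cutting at the last sentence or paragraph boundary inside each window when one exists."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + target, len(text))
        if end < len(text):
            # Prefer the last ". " or blank line inside the window.
            cut = max(text.rfind(". ", start, end), text.rfind("\n\n", start, end))
            if cut > start:
                end = cut + 2  # keep the delimiter with the chunk it closes
        chunks.append(text[start:end])
        start = end
    return chunks


def batches(items, size=60):
    """Group chunks into batches of at most `size` for embedding requests."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Joining the chunks reproduces the original text, which reflects the zero-overlap property, and 87 chunks fall into two batches of 60 and 27 as in the example above.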
Prerequisites
Voyage AI Account Setup
This server requires a Voyage AI API key for generating embeddings:
- Create account: Sign up at https://www.voyageai.com/
- Get API key: Visit https://dashboard.voyageai.com/ → API Keys → Create new key
- Add payment method (Important!):
  - Go to https://dashboard.voyageai.com/billing
  - Add a payment method (credit card)
  - Why: Without payment, rate limit is only 3 RPM (unusable)
  - With payment: 300 RPM rate limit unlocked
- Free tier: Voyage Context-3 includes 200M free tokens (one-time per account):
  - First 200M tokens are FREE
  - Sufficient for indexing ~50,000 notes
  - After free tier: ~$0.12 per 1M tokens
Cost estimate: Indexing 1,000 notes ≈ 4M tokens ≈ $0.48 (or free if within 200M token limit)
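The arithmetic behind this estimate can be reproduced with a few lines of Python. The constants come from the figures quoted above (about 4,000 tokens per note, 200M free tokens, roughly $0.12 per 1M tokens); the `indexing_cost` function is a hypothetical helper, and actual pricing should be checked against your Voyage AI dashboard.

```python
FREE_TOKENS = 200_000_000   # one-time free allowance quoted above
PRICE_PER_MILLION = 0.12    # approximate USD per 1M tokens after the free tier
TOKENS_PER_NOTE = 4_000     # 4M tokens / 1,000 notes, from the estimate above


def indexing_cost(note_count, tokens_already_used=0):
    """Estimated USD cost to index `note_count` notes (hypothetical helper)."""
    tokens = note_count * TOKENS_PER_NOTE
    free_left = max(FREE_TOKENS - tokens_already_used, 0)
    billable = max(tokens - free_left, 0)
    return billable * PRICE_PER_MILLION / 1_000_000
```

For example, `indexing_cost(1000)` is zero while the free allowance lasts, and `indexing_cost(1000, tokens_already_used=FREE_TOKENS)` comes to about $0.48.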
Installation
- Clone the repository:
git clone https://github.com/drewburchfield/obsidian-graph-mcp.git
cd obsidian-graph-mcp
- Configure environment:
cp .env.example .env
# Edit .env with your settings:
# - VOYAGE_API_KEY (from https://dashboard.voyageai.com/)
# - OBSIDIAN_VAULT_PATH (absolute path to your vault)
# - POSTGRES_PASSWORD (generate with: openssl rand -base64 36)
- Start services:
docker-compose up -d
- Initial indexing (first time only):
docker exec -i obsidian-graph python -m src.indexer
Indexes entire vault (30-60 min for large vaults). After this, file watching handles incremental updates.
- Add to MCP client (`~/.mcp.json`):
{
  "mcpServers": {
    "obsidian-graph": {
      "command": "docker",
      "args": ["exec", "-i", "obsidian-graph", "python", "-m", "src.server"],
      "disabled": false
    }
  }
}
Configuration
Required Environment Variables
# Voyage AI
VOYAGE_API_KEY=your_key_here # Get from https://www.voyageai.com/
# PostgreSQL (POSTGRES_HOST is set by docker-compose.yml, no need to set in .env)
POSTGRES_PASSWORD=your_secure_password_here # Generate with: openssl rand -base64 36
# Obsidian Vault
OBSIDIAN_VAULT_PATH=/path/to/your/vault # Absolute path on your system
Optional Tuning
# File watching
OBSIDIAN_WATCH_ENABLED=true
OBSIDIAN_DEBOUNCE_SECONDS=30
# Polling mode (auto-enabled for Docker and cloud-synced vaults)
# OBSIDIAN_WATCH_USE_POLLING= # true | false (unset = auto-detect)
# OBSIDIAN_WATCH_POLLING_INTERVAL=30 # seconds between polls (default: 30)
# Performance
POSTGRES_MIN_CONNECTIONS=5
POSTGRES_MAX_CONNECTIONS=20
EMBEDDING_BATCH_SIZE=128
EMBEDDING_REQUESTS_PER_MINUTE=300
# HNSW index (advanced)
HNSW_M=16
HNSW_EF_CONSTRUCTION=64
Cloud Sync Support (iCloud, Google Drive, Dropbox, OneDrive)
If your Obsidian vault is stored in a cloud-synced folder, the file watcher automatically uses polling mode for reliable change detection. This is because Docker's filesystem events don't propagate reliably through cloud sync virtualization layers.
Auto-detection: Polling mode is automatically enabled when:
- Running inside Docker (always uses polling for reliability)
- Vault path contains cloud sync patterns (`Library/Mobile Documents`, `Library/CloudStorage`, etc.)
How it works:
- Polling mode compares directory snapshots every 30 seconds (configurable)
- Detects file creates, modifications, moves, and deletions
- Slightly higher CPU than native filesystem events, but works reliably everywhere
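The snapshot comparison behind polling can be sketched with the standard library alone. This is a minimal illustration, not the Watchdog implementation the server uses: `take_snapshot` and `diff_snapshots` are hypothetical names, and in this simplified version a moved file shows up as one deletion plus one creation rather than as a single move event.

```python
import os


def take_snapshot(root):
    """Map each markdown file under `root` to its (mtime_ns, size) signature."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            try:
                stat = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            snapshot[path] = (stat.st_mtime_ns, stat.st_size)
    return snapshot


def diff_snapshots(old, new):
    """Return (created, deleted, modified) path lists between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, deleted, modified
```

A polling loop would call `take_snapshot` once per interval and pass the previous and current results to `diff_snapshots`, which is why CPU cost grows with vault size and polling frequency.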
Mobile workflow: Edit notes on mobile (iOS/Android) via Obsidian's iCloud/Google Drive sync. Changes sync to your Mac, and the polling watcher detects them within the polling interval.
Override behavior:
# Force polling on (for edge cases)
OBSIDIAN_WATCH_USE_POLLING=true
# Force native events (may miss changes with cloud sync)
OBSIDIAN_WATCH_USE_POLLING=false
# Faster detection (higher CPU)
OBSIDIAN_WATCH_POLLING_INTERVAL=15
Excluding Folders from Indexing
By default, the indexer excludes common system and tool folders:
- `.obsidian/`, `.trash/`, `.Trash/` (Obsidian system)
- `.git/`, `.github/` (version control)
- `.vscode/`, `.cursor/` (editor config)
- `.claude/`, `.aider/`, `.smart-env/` (AI tools)
Custom Exclusions: To exclude additional folders (like a soft-delete folder), create .obsidian-graph.conf in your vault root:
# Exclude soft delete folder
07_Archive/Trash/
# Exclude drafts
drafts/
See .obsidian-graph.conf.example for more patterns and examples.
Pattern Syntax:
| Pattern | Matches |
|---|---|
| `folder/` | All files in folder/ and subfolders |
| `drafts/*` | All files directly in drafts/ |
| `*.tmp.md` | All files ending in .tmp.md |
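One way to read the three pattern forms is sketched below using Python's `fnmatch`. This is an interpretation under assumptions, not the indexer's actual matcher: `load_patterns` and `is_excluded` are hypothetical names, paths are taken relative to the vault root with forward slashes, and folder patterns are assumed to be anchored at the vault root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def load_patterns(conf_text):
    """Parse exclusion patterns, skipping blank lines and '#' comments."""
    lines = (line.strip() for line in conf_text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_excluded(rel_path, patterns):
    """Return True if a vault-relative path matches any exclusion pattern."""
    path = PurePosixPath(rel_path)
    for pattern in patterns:
        if pattern.endswith("/"):
            # Folder pattern: everything under the folder, at any depth.
            if rel_path.startswith(pattern):
                return True
        elif "/" in pattern:
            # Path glob: the wildcard applies to direct children only.
            parent, _, name_glob = pattern.rpartition("/")
            if str(path.parent) == parent and fnmatchcase(path.name, name_glob):
                return True
        elif fnmatchcase(path.name, pattern):
            # Bare glob: matched against the file name in any folder.
            return True
    return False
```

Under this reading, `drafts/*` excludes `drafts/a.md` but not `drafts/sub/a.md`, while `drafts/` excludes both.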
Security
This server implements multiple security layers to protect your vault:
- Path Traversal Protection: Validates all file paths stay within vault (`src/security_utils.py`)
- Input Validation: All parameters validated before processing (`src/validation.py`)
- Secure Credentials: Randomly generated database passwords (`scripts/generate-db-password.sh`)
- Container Isolation: Read-only vault mount, dropped capabilities, non-root user
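The path traversal check named in the list above is commonly implemented by resolving the requested path and confirming it still sits under the vault root. The sketch below shows that general technique with `pathlib` (Python 3.9+ for `is_relative_to`); `resolve_in_vault` is a hypothetical name, and the actual logic in `src/security_utils.py` may differ.

```python
from pathlib import Path


def resolve_in_vault(vault_root, note_path):
    """Resolve `note_path` against the vault and reject anything that escapes it."""
    root = Path(vault_root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / note_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes vault: {note_path!r}")
    return candidate
```

Absolute inputs such as `/etc/passwd` replace the root when joined, so they fail the same check as `../` sequences.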
Concurrency: See docs/CONCURRENCY.md for thread-safety guarantees and race condition prevention.
Running Security Tests
# Security tests
pytest tests/test_security*.py -v
# Input validation tests
pytest tests/test_validation.py -v
# Race condition tests
pytest tests/test_race_conditions.py -v
# All tests with coverage
pytest tests/ --cov=src --cov-report=html
Usage Examples
Semantic Search
search_notes(query="neural networks and consciousness", limit=10, threshold=0.5)
Returns notes semantically related to the query, even if they don't contain
the exact keywords.
Find Similar Notes
get_similar_notes(note_path="neuroscience/dopamine.md", limit=10, threshold=0.6)
Discovers notes conceptually similar to the dopamine note (might find:
reward-systems.md, motivation.md, decision-making.md)
Build Connection Graph
get_connection_graph(
note_path="philosophy/free-will.md",
depth=3,
max_per_level=5,
threshold=0.65
)
Maps 3-level network showing how free-will connects to neuroscience,
psychology, and ethics notes through semantic similarity.
Identify Hubs
get_hub_notes(min_connections=10, threshold=0.5, limit=20)
Finds notes with >=10 connections - candidates for Maps of Content (MOCs).
Example: "decision-making.md" might connect to psychology, neuroscience,
economics, and philosophy notes.
Find Orphans
get_orphaned_notes(max_connections=2, limit=20)
Identifies isolated notes that need integration into the knowledge graph.
Sorted by modification date to surface recent unconnected insights.
Performance
Validated metrics:
| Metric | Target | Actual | Status |
|---|---|---|---|
| Search latency | <500ms | 0.9ms | ✅ 555x better |
| Graph building (depth=3) | <2s | <2s | ✅ On target |
| Hub/orphan queries | <100ms | <100ms | ✅ Materialized |
| Similarity range | [0.0-1.0] | [0.0-1.0] | ✅ Validated |
| Embedding quality | 1024-dim | 1024-dim | ✅ Voyage Context-3 |
Performance Note: Metrics measured on development vault (~500 notes, M1 MacBook Pro). Actual performance depends on vault size, hardware (CPU/RAM/SSD), and database configuration. HNSW indexing provides O(log n) search, so performance degrades gracefully with vault size.
Troubleshooting
"Reduced rate limits of 3 RPM"
- Cause: No payment method on Voyage account
- Solution: Add payment method at https://dashboard.voyageai.com/
- Note: 200M free tokens still apply
"PostgreSQL connection failed"
# Check postgres container
docker ps | grep obsidian-graph-pgvector
docker logs obsidian-graph-pgvector
# Verify credentials
grep POSTGRES_ .env
"Note not found" errors
- Ensure initial indexing completed: `docker exec -i obsidian-graph python -m src.indexer`
- Check vault path is mounted: `docker exec -i obsidian-graph ls /vault`
File changes not detected
- Verify `OBSIDIAN_WATCH_ENABLED=true`
- Check logs: `docker logs obsidian-graph`
- Look for: `Watching vault: /vault [polling (interval: 30s)]`
- File watcher starts after PostgreSQL connection
- Cloud sync users: Changes take up to polling interval (default 30s) plus cloud sync time
- Reduce detection time: Set `OBSIDIAN_WATCH_POLLING_INTERVAL=15` in `.env`
Development
Running Tests
# Quick validation
docker exec -i obsidian-graph python test_e2e.py
# Unit tests (requires 300 RPM rate limits)
docker exec -i obsidian-graph pytest tests/ -v
Rebuilding
docker-compose build obsidian-graph
docker-compose restart obsidian-graph
Debugging
# View logs
docker logs -f obsidian-graph
# Interactive shell
docker exec -it obsidian-graph /bin/bash
# Check database
docker exec -it obsidian-graph-pgvector psql -U obsidian -d obsidian_graph
Comparison to mcp-obsidian
| Feature | mcp-obsidian | obsidian-graph |
|---|---|---|
| Embeddings | 384-dim (all-MiniLM-L6-v2) | 1024-dim (Voyage Context-3) |
| Vector Store | ChromaDB | PostgreSQL+pgvector |
| Tools | 2 (search, reindex) | 5 (search, similar, graph, hubs, orphans) |
| Search perf | Unknown | 0.9ms validated |
| Graph traversal | ❌ No | ✅ BFS with cycle prevention |
| Hub detection | ❌ No | ✅ Materialized stats |
License
MIT License - Copyright (c) 2025 Drew Burchfield
See LICENSE file for details.
Links
- Voyage AI: https://www.voyageai.com/
- pgvector: https://github.com/pgvector/pgvector
- MCP Protocol: https://modelcontextprotocol.io/