samyama-graph

mcp
SUMMARY

High-performance graph-vector database in Rust. OpenCypher, HNSW vector search, 14 graph algorithms, cost-based planner. 66M nodes / 1B edges on commodity hardware.

README.md

Samyama Graph Database

Version
Rust
Tests
Coverage
Bugs
Vulnerabilities
Quality Gate
Maintainability
LOC
License

Samyama (Sanskrit: the union of focused query, sustained analysis, and unified insight) is a high-performance graph-vector database written in Rust. It combines a property graph engine (~90% OpenCypher), vector search (HNSW), graph algorithms, and natural language querying in a single binary. Zero GC pauses, 1,814 tests, Apache-2.0.

Scale Benchmark

Dataset Nodes Edges RAM Instance Cost
PubMed/MEDLINE 66.2M 1.04B 130 GB $2.50 total
Biomedical Trifecta 7.9M 28.0M 2 GB Mac Mini M4
Cricket (Cricsheet) 37K 1.39M < 1 GB Mac Mini M4

See it in action

Graph Simulation — Cricket KG (36K nodes, 1.4M edges) with live activity particles

Samyama Graph Simulation

Click for full demo (1:56) — Dashboard, Cypher Queries, and Graph Simulation

LDBC Benchmark Results (v0.6.0, Mac Mini M4)

Benchmark Queries Pass Rate Dataset
SNB Interactive 21 reads 21/21 (100%) SF1: 3.18M nodes, 17.26M edges
SNB Business Intelligence 20 analytical 16/16 run (100%) (BI-17+ timeout) SF1 (same dataset)
Graphalytics 6 algorithms x 2 datasets 12/12 (100%) LDBC XS reference graphs
FinBench 12 CR + 6 SR + 3 RW + 19 W 40/40 (100%) Synthetic: 7.7K nodes, 42.2K edges

See docs/ldbc/ for detailed per-query results, latency tables, and analysis.

What's New in v0.6.1

  • HTTP Tenant Management API: Full CRUD for tenants via REST endpoints (POST /api/tenants, GET /api/tenants, GET /api/tenants/{id}, DELETE /api/tenants/{id}).
  • samyama-mcp-serve: Auto-generate MCP (Model Context Protocol) servers from any graph schema. Discovers labels, edge types, and properties, then generates typed tools for AI agents. Install via pip install samyama[mcp] and run samyama-mcp-serve --demo for instant agent tool access.
  • Snapshot format (.sgsnap): Portable gzip JSON-lines snapshot export/import for graph tenants, enabling backup and migration across instances.
  • Cricket dataset loader: Load 21K Cricsheet T20/ODI/Test matches (36K nodes, 1.4M edges) via cargo run --release --example cricket_loader.
  • AACT clinical trials loader: Full AACT dataset loader for clinical trial analysis (575K studies, 7.7M nodes, 27M edges).
  • Index scan fix: Inline MATCH properties {prop: val} now trigger IndexScan when a matching index exists, avoiding full label scans.

Key Features

  • OpenCypher Query Engine: ~90% OpenCypher coverage — MATCH, CREATE, DELETE, SET, MERGE, OPTIONAL MATCH, UNION, WITH, UNWIND, aggregations, and 30+ built-in functions.
  • RESP Protocol: Drop-in compatibility with any Redis client (redis-cli, Jedis, ioredis).
  • Vector Search: Built-in HNSW indexing for millisecond semantic search and Graph RAG.
  • NLQ (Natural Language Queries): Ask questions in plain English — the LLM translates to Cypher automatically.
  • Graph Algorithms: Native PageRank, BFS, Dijkstra, WCC, SCC, CDLP, LCC, MaxFlow, MST, SSSP, Triangle Counting.
  • Optimization Solvers: 15+ metaheuristic algorithms (Jaya, Rao, GWO, PSO, Firefly, Cuckoo, ABC, NSGA-II) for in-database optimization.
  • Multi-Tenancy: Tenant-level isolation with per-tenant quotas via RocksDB column families.
  • High Availability: Raft consensus (via openraft) for cluster replication and automatic failover.
  • Persistence: RocksDB storage with Write-Ahead Log and checkpointing.
  • Cost-Based Query Planner: Graph-native plan enumeration with triple-level statistics (GraphCatalog), predicate pushdown, direction reversal, ExpandInto, and plan cache. EXPLAIN/PROFILE for plan visualization.
  • Late Materialization: Scan operators produce lightweight NodeRef tokens instead of full clones — 4x traversal speedup on multi-hop queries.
  • Two-Phase Bulk Loading: create_node_stub() + create_edge_stub() reduce per-edge memory from 709 to 52 bytes (13.6x), enabling billion-edge graphs on commodity hardware.
  • HTTP Tenant API: REST endpoints for tenant CRUD (create, list, get, delete) alongside the RESP protocol.
  • MCP Server Generation: Auto-generate MCP servers from graph schema for AI agent integration (samyama-mcp-serve).
  • Snapshot Persistence: Portable .sgsnap format with automatic persistence — imported snapshots survive server restart.

Getting Started

Build

git clone https://github.com/samyama-ai/samyama-graph
cd samyama-graph
cargo build --release

Run the Server

./target/release/samyama

This starts the RESP server on port 6379 and the HTTP API on port 8080.

Connect

redis-cli -p 6379

# Create nodes
GRAPH.QUERY mygraph "CREATE (n:Person {name: 'Alice', age: 30})"

# Query
GRAPH.QUERY mygraph "MATCH (n:Person) RETURN n"

# Explain a query plan
GRAPH.QUERY mygraph "EXPLAIN MATCH (n:Person) WHERE n.age > 25 RETURN n"

Examples

Samyama ships with domain-specific demos that showcase the full feature set.

Core Infrastructure

Example Command Description
Persistence cargo run --example persistence_demo RocksDB persistence, WAL, multi-tenancy, recovery
Cluster cargo run --example cluster_demo 3-node Raft cluster with leader election and failover
Full Benchmark cargo run --example full_benchmark Scale test up to 1M+ nodes

Industry Demos (with NLQ + Agentic Enrichment)

Each demo builds a domain-specific knowledge graph, runs Cypher queries, executes graph algorithms, and demonstrates natural language querying via the NLQ pipeline.

Example Command What it demonstrates
Banking / Fraud Detection cargo run --example banking_demo Customer segmentation, fraud patterns, money laundering detection, OFAC screening
Clinical Trials cargo run --example clinical_trials_demo Patient-trial matching (vector search), drug interactions (PageRank), site optimization (NSGA-II)
Supply Chain cargo run --example supply_chain_demo Disruption analysis, cold-chain monitoring, port optimization (Jaya), alternative suppliers (vector search)
Smart Manufacturing cargo run --example smart_manufacturing_demo Digital twin, failure cascade analysis, production scheduling (Cuckoo Search), energy optimization
Social Network cargo run --example social_network_demo Follower graphs, mutual connections, influence analysis (PageRank), community detection (WCC)
Knowledge Graph cargo run --example knowledge_graph_demo Document lineage, expert finding (vector search), topic clustering, knowledge hub identification
Enterprise SOC cargo run --example enterprise_soc_demo Threat intel, MITRE ATT&CK mapping, attack path analysis (Dijkstra), lateral movement simulation
Agentic Enrichment cargo run --example agentic_enrichment_demo Generation-Augmented Knowledge (GAK) — LLM generates Cypher to enrich the graph autonomously

Data Loaders

Example Command Description
LDBC SNB cargo run --example ldbc_loader Load LDBC SNB SF1 dataset (3.18M nodes, 17.26M edges)
FinBench cargo run --example finbench_loader Load/generate LDBC FinBench dataset
Cricket cargo run --release --example cricket_loader Load 21K Cricsheet matches (36K nodes, 1.4M edges)
AACT Clinical Trials cargo run --release --example aact_loader Full AACT dataset (575K studies, 7.7M nodes, 27M edges)

AI Agent Integration

Example Command Description
MCP Server samyama-mcp-serve --demo Auto-generate MCP server from graph schema for AI agents (Python, pip install samyama[mcp])

Cypher Support

~90% OpenCypher coverage. See docs/CYPHER_COMPATIBILITY.md for the full matrix.

Supported Clauses

MATCH, OPTIONAL MATCH, WHERE, RETURN, RETURN DISTINCT, ORDER BY, SKIP, LIMIT, CREATE, DELETE, DETACH DELETE, SET, REMOVE, MERGE (with ON CREATE SET / ON MATCH SET), WITH, UNWIND, UNION / UNION ALL, EXPLAIN, EXISTS subqueries

Supported Functions

Category Functions
String toUpper, toLower, trim, replace, substring, left, right, reverse, toString
Numeric abs, ceil, floor, round, sqrt, sign, toInteger, toFloat
Aggregation count, sum, avg, min, max, collect
List/Collection size, length, head, last, tail, keys, range
Graph id, labels, type, exists, coalesce, startsWith, endsWith, contains

Operators

Arithmetic (+, -, *, /, %), comparison (=, <>, <, >, <=, >=), logical (AND, OR, NOT, XOR), string (STARTS WITH, ENDS WITH, CONTAINS, =~), null (IS NULL, IS NOT NULL), list (IN).

Cross-type coercion: Integer/Float promotion and String/Boolean coercion for LLM-generated queries. Null propagation follows Neo4j three-valued logic.

Architecture

src/
├── graph/           # Property graph model (Node, Edge, PropertyValue, GraphStore)
├── query/           # OpenCypher engine
│   ├── cypher.pest  #   PEG grammar (Pest)
│   ├── parser.rs    #   Parser → AST
│   └── executor/    #   Volcano iterator model (scan, filter, expand, project, aggregate, sort, limit)
├── protocol/        # RESP3 server (Tokio TCP)
├── persistence/     # RocksDB + WAL + multi-tenancy
├── raft/            # Raft consensus (openraft)
├── nlq/             # Natural Language Query pipeline (OpenAI, Gemini, Ollama, Claude Code)
├── vector/          # HNSW vector index
├── snapshot/        # Portable .sgsnap export/import
└── sharding/        # Tenant-level sharding

Key design decisions are documented as Architecture Decision Records.

Companion Crates

Benchmarks

Run with cargo bench. See docs/performance/ for detailed results.

Operation Throughput Notes
Node insertion ~3.4M nodes/sec At 1K batch, single-threaded
Label scan <1 us 100-node label groups
1-hop traversal ~22 us MATCH-WHERE-RETURN pattern
Cypher parse <8 us Multi-hop patterns with aggregation

Documentation

Testing

1814 unit tests, integration tests via Python scripts, and 8 domain-specific example demos.

cargo test                     # Run all tests
cargo bench                    # Run benchmarks
cargo clippy -- -D warnings    # Lint
cargo fmt -- --check           # Format check

License

Apache License 2.0

Reviews (0)

No results found