Structural Memory Protocol (SMP)

Give AI agents a programmer's brain — not text retrieval, but structural understanding.

SMP is a codebase intelligence server that models source code as a live, multi-dimensional knowledge graph. Traditional RAG treats code as flat text, which leads to context overflow, answers built on stale context, and no awareness of architecture. SMP instead builds a structural model that AI agents can navigate, reason over, and safely mutate, even in codebases exceeding 100,000 lines.

Why SMP
Key Features
Architecture Overview
How It Works
Quickstart
- Docker Compose
- Manual Installation
Protocol Reference
Agent Integration
MCP Integration
Technology Stack
Project Structure
Contributing

Why SMP

Standard RAG pipelines fail at code for three core reasons:

Problem	What breaks	SMP's answer
Context overflow	100k-line repos exceed most LLM context windows	Community-routed retrieval targets ~200 nodes, not the full graph
No structural awareness	Functions renamed, moved, or deleted invisibly	Live graph updated on every file change via watcher or git hook
Hallucinated dependencies	Flat-text models guess call chains	Namespaced static + eBPF runtime linker resolves exact edges

SMP replaces guessing with a graph where every node is a real code entity (function, class, file, interface) and every edge is a verified relationship (CALLS, IMPORTS, INHERITS, TESTS). Agents query the structure, not the text.

Key Features

AI-First Architecture — Purpose-built to prevent agents from breaking on large codebases. Every response includes a pre-computed structural summary so agents read metadata first and drill into raw data only when needed.

MCP Native — Fully supports the Model Context Protocol, making SMP a plug-in memory layer for any MCP-compatible AI IDE or agent framework.

Community-Routed Graph RAG — A hybrid pipeline: ChromaDB seeds discovery by vector similarity, then Neo4j performs structural N-hop traversal from those seeds. Retrieval is scoped to the relevant architectural cluster, not the entire codebase.

Hybrid Linking — Combines static AST analysis (Tree-sitter) with kernel-level runtime execution traces (eBPF) to resolve dynamic dependencies — dependency injection, metaprogramming, runtime dispatchers — that static analysis alone can never see.

Two-Level Community Detection — Louvain partitioning at coarse (architecture) and fine (routing) resolutions. Agents can query domain boundaries and coupling weights between modules.

Blast Radius Analysis — Quantify the exact set of nodes affected by a change before a single line is edited. Impact analysis runs on the graph in milliseconds.

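Merkle-Indexed Sync — SHA-256 Merkle tree over all file nodes. Incremental sync compares subtree hashes and descends only into diverging subtrees, so locating a single changed file takes O(log n) comparisons and only those subtrees are re-indexed. Snapshots are cryptographically signed for secure distribution to new agent instances.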

Agent Safety Layer — Sessions with MVCC conflict detection, guard checks, dry-run impact preview, checkpoints, audit log, and per-node locking. Agents cannot accidentally overwrite concurrent work.

Sandbox Runtime — Ephemeral microVM or Docker containers with Copy-on-Write filesystems, hard egress firewall, and eBPF trace capture. Safe execution for test runs, runtime edge resolution, and mutation testing.

No LLM at Query Time — Embeddings are generated once at index time. All retrieval, ranking, and response assembly are graph operations and arithmetic. No generative model is invoked during a query.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                     CODEBASE (Files + Git)                      │
└──────────────────────────┬──────────────────────────────────────┘
                           │ Updates (Watch / Agent Push / commit_sha)
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                   MEMORY SERVER (SMP Core)                      │
│  ┌─────────────┐   ┌──────────────┐   ┌─────────────┐           │
│  │   PARSER    │──▶│ GRAPH BUILDER│──▶│  ENRICHER   │           │
│  │ (Tree-sitter│   │  + LINKER    │   │  (Static    │           │
│  │    AST)     │   │(Static+eBPF) │   │  Metadata)  │           │
│  └─────────────┘   └──────────────┘   └──────┬──────┘           │
│                                              │                  │
│  ┌───────────────────────────────────────────▼──────────────┐   │
│  │                    MEMORY STORE                          │   │
│  │  ┌──────────────────────────────────────────────┐        │   │
│  │  │  GRAPH DB (Neo4j)                            │        │   │
│  │  │  Structure · CALLS_STATIC · CALLS_RUNTIME    │        │   │
│  │  │  PageRank · Sessions · Audit · BM25 Index    │        │   │
│  │  └──────────────────────────────────────────────┘        │   │
│  │  ┌──────────────────────────────────────────────┐        │   │
│  │  │  VECTOR INDEX (ChromaDB)                     │        │   │
│  │  │  code_embedding per node (index-time only)   │        │   │
│  │  └──────────────────────────────────────────────┘        │   │
│  │  ┌──────────────────────────────────────────────┐        │   │
│  │  │  MERKLE INDEX                                │        │   │
│  │  │  SHA-256 per file · Package subtree hashes   │        │   │
│  │  │  Root hash = full codebase state             │        │   │
│  │  └──────────────────────────────────────────────┘        │   │
│  └──────────────────────────────┬───────────────────────────┘   │
└─────────────────────────────────┼───────────────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
┌─────────────────┐   ┌──────────────────────┐   ┌───────────────┐
│  QUERY ENGINE   │   │   SANDBOX RUNTIME    │   │  SWARM LAYER  │
│  Navigator      │   │  Ephemeral microVM / │   │  Peer Review  │
│  Reasoner       │   │  Docker + CoW fork   │   │  PR Handoff   │
│  SeedWalkEngine │   │  eBPF trace capture  │   └───────┬───────┘
│  Telemetry      │   │  Egress-firewalled   │           │
└────────┬────────┘   └──────────┬───────────┘           │
         └──────────────┬────────┴───────────────────────┘
                        │ SMP Protocol (Dispatcher)
                        ▼
        ┌─────────────────────────────────────────────┐
        │              AGENT LAYER                    │
        │   Agent A       Agent B       Agent C       │
        │   (Coder)       (Reviewer)    (Architect)   │
        └─────────────────────────────────────────────┘

How It Works

1. Parser — AST Extraction

Technology: Tree-sitter (multi-language, fast, incremental)

Tree-sitter parses every source file into a typed Abstract Syntax Tree. The parser extracts functions, classes, variables, interfaces, imports, and exports — producing a structured document for the Graph Builder to consume.

Extracted per file:

{
    "file_path": "src/auth/login.ts",
    "language": "typescript",
    "nodes": [
        {
            "id": "func_authenticate_user",
            "type": "function_declaration",
            "name": "authenticateUser",
            "start_line": 15,
            "end_line": 42,
            "signature": "authenticateUser(email: string, password: string): Promise<Token>",
            "docstring": "Validates user credentials and returns JWT...",
            "modifiers": ["async", "export"]
        },
        {
            "id": "class_AuthService",
            "type": "class_declaration",
            "name": "AuthService",
            "methods": ["login", "logout", "refresh"],
            "properties": ["tokenExpiry", "secretKey"]
        }
    ],
    "imports": [
        {"from": "./utils/crypto", "items": ["hashPassword", "compareHash"]},
        {"from": "../db/user",     "items": ["UserModel"]}
    ],
    "exports": ["authenticateUser", "AuthService"]
}
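
To make the shape of that document concrete, here is a minimal sketch that produces a similar per-file record for Python sources. It swaps Tree-sitter for Python's built-in ast module purely to stay dependency-free; the function name extract_file and the id scheme are illustrative assumptions, not SMP's parser.

```python
import ast


def extract_file(file_path: str, source: str) -> dict:
    """Build a parser-style document for one Python file (illustrative only)."""
    tree = ast.parse(source)
    nodes, imports = [], []
    for item in tree.body:  # top-level statements only
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
            nodes.append({
                "id": f"func_{item.name}",
                "type": "function_declaration",
                "name": item.name,
                "start_line": item.lineno,
                "end_line": item.end_lineno,
                "docstring": ast.get_docstring(item),
                "modifiers": ["async"] if isinstance(item, ast.AsyncFunctionDef) else [],
            })
        elif isinstance(item, ast.ClassDef):
            methods = [m.name for m in item.body
                       if isinstance(m, (ast.FunctionDef, ast.AsyncFunctionDef))]
            nodes.append({
                "id": f"class_{item.name}",
                "type": "class_declaration",
                "name": item.name,
                "methods": methods,
            })
        elif isinstance(item, ast.ImportFrom):
            imports.append({
                "from": "." * item.level + (item.module or ""),
                "items": [alias.name for alias in item.names],
            })
    return {"file_path": file_path, "language": "python",
            "nodes": nodes, "imports": imports}


SAMPLE = '''
from .utils.crypto import hash_password, compare_hash

async def authenticate_user(email, password):
    """Validate credentials."""
    return compare_hash(password, "stored")

class AuthService:
    def login(self): ...
    def logout(self): ...
'''

doc = extract_file("src/auth/login.py", SAMPLE)
print([n["id"] for n in doc["nodes"]])
```

The record keeps the same top-level keys as the example above (file_path, language, nodes, imports), so a downstream graph builder would not need to care which parser produced it.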

2. Graph Builder — Structural Analysis

The Graph Builder transforms AST output into a property graph stored in Neo4j. Every code entity becomes a node; every structural dependency becomes a typed, directed edge.

Node types:

Node	Represents
`Repository`	Root node for the entire codebase
`Package`	Directory or module
`File`	Source file
`Class`	Class definition
`Function`	Function or method
`Variable`	Variable or constant
`Interface`	Type definition or interface
`Test`	Test file or test function
`Config`	Configuration file
`Community`	Louvain-detected structural cluster

Relationship types:

Relationship	Meaning
`CONTAINS`	Parent-child (Package → File)
`IMPORTS`	File imports File / Module
`DEFINES`	File defines Class / Function
`CALLS`	Function calls Function (namespaced; recorded as `CALLS_STATIC` from source analysis or `CALLS_RUNTIME` from execution traces)
`INHERITS`	Class inherits Class
`IMPLEMENTS`	Class implements Interface
`DEPENDS_ON`	General dependency
`TESTS`	Test covers Function / Class
`USES`	Function uses Variable / Type
`REFERENCES`	Variable references Variable
`MEMBER_OF`	Node belongs to Community
`BRIDGES`	Community connects to Community

3. Linker — Namespaced Cross-File Resolution

The Linker runs after the Graph Builder and resolves every CALLS edge using each file's imports list as a namespace map. This prevents the classic ambiguity problem where the same function name exists in multiple files.

Problem it solves:

File A calls: save()
File B has:   save()   (src/db/user.ts)
File C has:   save()   (src/cache/session.ts)

Without namespacing, a linker guesses. SMP's Linker traces the import to the exact origin file first:

For each CALLS(caller → "save") edge:
  1. Look up caller's IMPORTS list
  2. Find the import entry that exposes "save"
     → e.g. import { save } from "../db/user"
  3. Resolve "../db/user" to absolute path → src/db/user.ts
  4. Find node with name="save" AND file="src/db/user.ts"
  5. Draw CALLS edge to that exact node

  If step 2 finds no import for "save":
  → Mark edge as CALLS_UNRESOLVED (reason="not in imports")
  → Flag for smp/linker/report

Every CALLS edge carries a resolved flag so agents always know whether a dependency is confirmed or ambiguous. Unresolved edges are reportable via smp/linker/report.
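
The five resolution steps can be sketched in a few lines. This is a simplified illustration, not the Linker's code: resolve_call and symbol_index are hypothetical names, the fixed ".ts" suffix stands in for real module resolution, and the second unresolved reason is an assumption added for the case where the import exists but the symbol does not.

```python
import posixpath


def resolve_call(caller_file: str, callee_name: str,
                 imports: list[dict], symbol_index: dict) -> dict:
    """Resolve one call by name; symbol_index maps (file_path, name) -> node id."""
    for imp in imports:                                   # steps 1-2: find the exposing import
        if callee_name not in imp["items"]:
            continue
        base = posixpath.dirname(caller_file)
        target_file = posixpath.normpath(posixpath.join(base, imp["from"])) + ".ts"  # step 3
        node_id = symbol_index.get((target_file, callee_name))                       # step 4
        if node_id is not None:
            return {"type": "CALLS", "target": node_id, "resolved": True}            # step 5
        return {"type": "CALLS_UNRESOLVED", "target": callee_name,
                "resolved": False, "reason": "symbol not found in target file"}
    return {"type": "CALLS_UNRESOLVED", "target": callee_name,
            "resolved": False, "reason": "not in imports"}


symbol_index = {
    ("src/db/user.ts", "save"): "func_save_user",
    ("src/cache/session.ts", "save"): "func_save_session",
}
imports = [{"from": "../db/user", "items": ["save"]}]

edge = resolve_call("src/api/handler.ts", "save", imports, symbol_index)
missing = resolve_call("src/api/handler.ts", "flush", imports, symbol_index)
print(edge["target"], missing["reason"])
```

Because the lookup key includes the resolved file path, the two save() definitions never compete: the import decides which one the edge points to.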

4. Runtime Linker — eBPF Execution Traces

Static linking resolves what the source code says will be called. The Runtime Linker resolves what actually runs — capturing real call chains from inside a sandbox via eBPF, then injecting CALLS_RUNTIME edges into the graph.

What static linking cannot see:

// Dependency Injection — static linker sees no CALLS edge here
container.bind<IAuthService>("AuthService").to(JwtAuthService);

// Metaprogramming — target function name is a runtime variable
const method = config.get("handler");
this[method](payload);

How runtime linking works:

Agent spawns sandbox (smp/sandbox/spawn)
        │
        ▼
Agent runs test suite inside sandbox (smp/sandbox/execute, inject_ebpf: true)
        │
        ▼
eBPF daemon intercepts every function entry/exit at kernel level
        │
        ▼
SMP Runtime Linker processes trace → resolves targets → injects CALLS_RUNTIME edges
        │
        ▼
Graph DB now has a full hybrid call graph:
  CALLS_STATIC  = "source says this will be called"   (resolved at index time)
  CALLS_RUNTIME = "kernel confirmed this was called"   (resolved at execution time)

The result is a hybrid call graph that handles dependency injection, event buses, metaprogramming, plugin systems, and any other pattern that defeats static analysis.
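
One way to picture the trace-processing step: if the collector emits ordered function entry and exit events for a single thread, caller and callee pairs fall out of a plain call stack. The event format and the name edges_from_trace are assumptions for illustration; the real eBPF trace format is not described here.

```python
def edges_from_trace(events: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Derive (caller, callee) pairs from ordered ("enter" | "exit", function) events."""
    stack: list[str] = []
    edges: set[tuple[str, str]] = set()
    for kind, func in events:
        if kind == "enter":
            if stack:
                edges.add((stack[-1], func))  # the function on top of the stack is the caller
            stack.append(func)
        elif kind == "exit" and stack and stack[-1] == func:
            stack.pop()
    return edges


# A dispatch that static analysis could not follow: the handler name came from config.
trace = [
    ("enter", "handleRequest"),
    ("enter", "JwtAuthService.verify"),
    ("exit", "JwtAuthService.verify"),
    ("enter", "auditLog"),
    ("exit", "auditLog"),
    ("exit", "handleRequest"),
]
runtime_edges = edges_from_trace(trace)
print(sorted(runtime_edges))
```

Each pair would then be written to the graph as a CALLS_RUNTIME edge alongside whatever CALLS_STATIC edges already exist.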

5. Enricher — Static Metadata

The Enricher attaches human-readable metadata to structural nodes using only what already exists in the code: docstrings, inline comments, decorators, and type annotations. The extraction step itself is purely static and involves no LLM.

Separately, at index time a code_embedding is generated once per node from signature + docstring and stored in ChromaDB. This embedding is used exclusively for the seed phase of smp/locate. No generative model is invoked at query time.

Enriched node schema (final):

{
    "id": "func_authenticate_user",
    "semantic": {
        "status": "enriched",
        "docstring": "Validates user credentials and returns a signed JWT.",
        "inline_comments": [
            {"line": 18, "text": "compare against bcrypt hash, not plaintext"}
        ],
        "decorators": ["@requires_db", "@rate_limited"],
        "annotations": {
            "params": {"email": "string", "password": "string"},
            "returns": "Promise<Token>",
            "throws": ["AuthenticationError", "DatabaseError"]
        },
        "tags": ["auth", "jwt", "session"],
        "source_hash": "a3f9c12d",
        "enriched_at": "2025-02-15T10:30:00Z"
    },
    "vector": {
        "code_embedding": [0.021, -0.134, 0.087, "..."],
        "embedding_input": "authenticateUser(email: string, password: string): Promise<Token> — Validates user credentials and returns a signed JWT.",
        "model": "text-embedding-3-small",
        "indexed_at": "2025-02-15T10:30:01Z"
    }
}

6. Community Detection — Architectural Clustering

Purpose: Automatically partition the codebase graph into structural clusters at two levels so agents can reason about domain boundaries and smp/locate can narrow its seed search to ~200 nodes instead of all 100k+.

Two-level hierarchy:

Level 0 — COARSE (global architecture view)
  e.g. "backend_core", "api_gateway", "data_layer"
  → Used by architecture agents to understand module ownership.
  → smp/community/boundaries shows coupling strength between modules.

Level 1 — FINE (search routing)
  e.g. "auth_core", "auth_oauth", "payments_stripe"
  → Subdivisions of coarse communities.
  → Used by smp/locate Phase 0 to scope seed search.
  → Every node carries both community_id_l0 and community_id_l1.

Algorithm: Louvain partitioning via Neo4j GDS at two resolutions (0.5 = coarse, 1.5 = fine), run over CALLS_STATIC, CALLS_RUNTIME, and IMPORTS edges. Labels are derived purely from topology — majority path prefix and top tags across member nodes. No LLM.
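
The labelling rule (majority path prefix plus top tags) needs nothing beyond counting. A minimal sketch, assuming each member node carries a file_path and an optional tags list; label_community and the returned dictionary shape are hypothetical:

```python
from collections import Counter


def label_community(members: list[dict], top_tags: int = 2) -> dict:
    """Derive a label from member paths and tags; no model is involved."""
    prefixes = Counter(m["file_path"].rsplit("/", 1)[0] for m in members)
    tags = Counter(tag for m in members for tag in m.get("tags", []))
    return {
        "path_prefix": prefixes.most_common(1)[0][0],
        "tags": [tag for tag, _ in tags.most_common(top_tags)],
    }


members = [
    {"file_path": "src/auth/login.ts", "tags": ["auth", "jwt"]},
    {"file_path": "src/auth/session.ts", "tags": ["auth", "session"]},
    {"file_path": "src/auth/token.ts", "tags": ["auth", "jwt"]},
    {"file_path": "src/utils/crypto.ts", "tags": ["crypto"]},
]
label = label_community(members)
print(label)
```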

7. SeedWalkEngine — Community-Routed Graph RAG

smp/locate is SMP's primary feature discovery endpoint. It runs a five-phase pipeline (Phase 0 through Phase 4):

Phase 0 — Community Routing
  Query vector compared to community centroid embeddings
  → Identify the 1-2 most relevant fine communities
  → Scope seed search to ~200 nodes in those communities

Phase 1 — Seed (ChromaDB)
  Run vector similarity search within scoped nodes
  → Return top-k seed nodes by cosine similarity

Phase 2 — Walk (Neo4j)
  Single Cypher query — no N+1 problem
  → N-hop traversal over CALLS_STATIC, CALLS_RUNTIME, IMPORTS, DEFINES
  → Captures structural neighbourhood of each seed

Phase 3 — Rank (Composite Score)
  final_score = α·vector_score + β·pagerank_norm + γ·heat_norm
  → PageRank reflects structural importance in the full graph
  → Heat score reflects how frequently the node has been accessed

Phase 4 — Structural Map
  Build adjacency list of edges between result nodes
  → Agents receive a renderable call chain, not just a flat list
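
The Phase 3 formula can be sketched as follows. The weights (0.6, 0.3, 0.1) and the max-based PageRank normalisation are assumptions chosen for the example, since the text gives neither; heat is divided by 100 because heat_score is documented as a 0 to 100 value.

```python
def rank(candidates: list[dict], alpha: float = 0.6, beta: float = 0.3,
         gamma: float = 0.1) -> list[dict]:
    """Score candidates with alpha*vector + beta*pagerank_norm + gamma*heat_norm."""
    if not candidates:
        return []
    max_pagerank = max(c["pagerank"] for c in candidates) or 1.0  # avoid dividing by zero
    ranked = []
    for c in candidates:
        pagerank_norm = c["pagerank"] / max_pagerank
        heat_norm = c["heat_score"] / 100.0
        score = alpha * c["vector_score"] + beta * pagerank_norm + gamma * heat_norm
        ranked.append({**c, "final_score": round(score, 4)})
    return sorted(ranked, key=lambda c: c["final_score"], reverse=True)


candidates = [
    {"id": "A", "vector_score": 0.9, "pagerank": 0.02, "heat_score": 10},
    {"id": "B", "vector_score": 0.7, "pagerank": 0.08, "heat_score": 95},
    {"id": "C", "vector_score": 0.8, "pagerank": 0.04, "heat_score": 50},
]
ranked = rank(candidates)
print([(c["id"], c["final_score"]) for c in ranked])
```

In this example B outranks A despite a lower vector score, because it is both structurally central and frequently accessed; that is the effect the composite score is meant to have.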

8. Agent Safety Layer

SMP provides a full safety harness for agents operating in write mode:

Sessions — Every write operation must open a session declaring its scope and intent. Sessions are persisted in Neo4j with MVCC (multi-version concurrency control) for read sessions and exclusive locks for write sessions.

Guard Checks (smp/guard/check) — Pre-flight check before any write. Returns blocked, warning, or clear based on concurrent session conflicts, hot-node status (heat score > 90), lock status, and test coverage gaps.

Dry Run (smp/dryrun) — The agent proposes a change and receives a full impact preview before anything touches disk: a safe or breaking verdict, the list of affected callers, missing tests, and a structural diff.

Checkpoints (smp/checkpoint) — Snapshot the current graph state for a set of files before writing. Enables rollback if a change produces unexpected results.

Audit Log — Every session, guard check, dry run, checkpoint, and write is recorded in Neo4j with timestamp and agent ID. Queryable via smp/audit/log.

9. Sandbox Runtime

Every sandbox is an ephemeral, isolated execution environment:

Docker or Firecracker microVM — hard process isolation
Copy-on-Write filesystem — changes never persist to the host
Hard egress firewall — no network access by default; only whitelisted internal endpoints allowed
eBPF trace capture — kernel-level call interception for runtime edge resolution
Testcontainers — spin up local Postgres, Redis, or other services per sandbox run

Sandboxes are used for: running test suites to capture runtime edges, integrity verification (AST data-flow checks + mutation testing), and safe execution of agent-proposed code before committing.

Quickstart

Docker Compose (Fastest)

Requirements: Docker, Docker Compose

git clone https://github.com/your-org/smp.git
cd smp
cp .env.example .env        # Edit with your Neo4j password
docker compose up -d
curl http://localhost:8420/health
# → {"status":"ok"}

Manual Installation

Requirements: Python 3.11, Neo4j 5.x, and a reachable ChromaDB instance (see CHROMA_HOST and CHROMA_PORT below)

# 1. Clone and configure
git clone https://github.com/your-org/smp.git
cd smp
cp .env.example .env

# 2. Set up Python environment
python3.11 -m venv .venv
source .venv/bin/activate     # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

# 3. Start the server
smp serve --port 8420

# 4. Ingest your project
smp ingest /path/to/your/project

# 5. Run a query
smp query "Where is the authentication logic handled?"

Environment variables (.env):

NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
CHROMA_HOST=localhost
CHROMA_PORT=8000
SMP_PORT=8420
OPENAI_API_KEY=sk-...   # Used for code_embedding generation at index time only

Protocol Reference

SMP uses JSON-RPC 2.0 over stdio, HTTP, or WebSocket. Every request follows the same envelope; examples below that omit jsonrpc and id do so for brevity:

{
    "jsonrpc": "2.0",
    "method": "smp/<method>",
    "params": { ... },
    "id": 1
}
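
For clients that do not use an SDK, a small helper for building requests and unwrapping responses might look like the sketch below. make_request and read_result are hypothetical names; the transport (stdio, HTTP, or WebSocket) is deliberately left out. The error member with code and message comes from the JSON-RPC 2.0 specification.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC ids only need to be unique per connection


def make_request(method: str, params: dict) -> str:
    """Serialise one JSON-RPC 2.0 request for an smp/* method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": f"smp/{method}",
        "params": params,
        "id": next(_ids),
    })


def read_result(raw: str) -> dict:
    """Return the result member, or raise if the response carries an error object."""
    message = json.loads(raw)
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"{error.get('code')}: {error.get('message')}")
    return message["result"]


request = json.loads(make_request("navigate", {"query": "authenticateUser"}))
result = read_result('{"jsonrpc": "2.0", "result": {"status": "success"}, "id": 1}')
print(request["method"], result["status"])
```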

Memory Management

`smp/update` — Sync a single file change

{
    "jsonrpc": "2.0",
    "method": "smp/update",
    "params": {
        "file_path": "src/auth/login.ts",
        "content": "...",
        "change_type": "modified"   // "modified" | "created" | "deleted"
    },
    "id": 1
}

Response:

{
    "jsonrpc": "2.0",
    "result": {
        "status": "success",
        "nodes_added": 3,
        "nodes_updated": 12,
        "nodes_removed": 1,
        "relationships_updated": 8
    },
    "id": 1
}

`smp/batch_update` — Sync multiple files atomically

{
    "method": "smp/batch_update",
    "params": {
        "changes": [
            {"file_path": "src/auth/login.ts",      "content": "...", "change_type": "modified"},
            {"file_path": "src/auth/middleware.ts",  "content": "...", "change_type": "created"}
        ]
    }
}

`smp/sync` — Merkle-diff sync (O(log n))

Sends client root hash + per-file SHA-256 hashes. Server compares against its Merkle tree and returns exactly which files need to be pushed or pulled.

{
    "method": "smp/sync",
    "params": {
        "client_root_hash": "e3b0c44298fc",
        "file_hashes": {
            "src/auth/login.ts":     "a3f9c12d",
            "src/utils/crypto.ts":   "c3a1f004"
        }
    }
}
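
The comparison step can be illustrated with a directory-shaped hash tree. This is a sketch under stated assumptions, not SMP's Merkle index: build_tree and diff are hypothetical names, and the way a directory hash is derived from its children is one reasonable choice among several.

```python
from __future__ import annotations

import hashlib


def build_tree(file_hashes: dict[str, str]) -> dict:
    """Nest {path: hash} by directory and hash each directory over its children."""
    root: dict = {"children": {}}
    for path, digest in file_hashes.items():
        node = root
        parts = path.split("/")
        for part in parts[:-1]:
            node = node["children"].setdefault(part, {"children": {}})
        node["children"][parts[-1]] = {"hash": digest}
    _seal(root)
    return root


def _seal(node: dict) -> str:
    """Compute directory hashes bottom-up; file nodes already carry their hash."""
    if "children" in node:
        payload = "".join(f"{name}:{_seal(child)};"
                          for name, child in sorted(node["children"].items()))
        node["hash"] = hashlib.sha256(payload.encode()).hexdigest()
    return node["hash"]


def diff(a: dict | None, b: dict | None, prefix: str = "") -> list[str]:
    """List file paths that differ, descending only into subtrees whose hashes differ."""
    if a is not None and b is not None and a["hash"] == b["hash"]:
        return []  # identical subtree: nothing below it is visited
    a_kids = (a or {}).get("children")
    b_kids = (b or {}).get("children")
    if a_kids is None and b_kids is None:
        return [prefix]  # a file that changed, appeared, or disappeared
    changed = []
    for name in sorted(set(a_kids or {}) | set(b_kids or {})):
        path = f"{prefix}/{name}" if prefix else name
        changed += diff((a_kids or {}).get(name), (b_kids or {}).get(name), path)
    return changed


server = build_tree({"src/auth/login.ts": "a3f9c12d",
                     "src/utils/crypto.ts": "c3a1f004",
                     "src/db/user.ts": "11111111"})
client = build_tree({"src/auth/login.ts": "ffffffff",
                     "src/utils/crypto.ts": "c3a1f004",
                     "src/db/user.ts": "11111111"})
print(diff(client, server))
```

When the root hashes match, diff returns immediately, which is why an unchanged codebase costs a single comparison.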

`smp/index/export` — Export signed index snapshot

{
    "method": "smp/index/export",
    "params": {
        "scope": "full",
        "signing_key_id": "key_prod_01"
    }
}

`smp/index/import` — Import and verify a signed snapshot

{
    "method": "smp/index/import",
    "params": {
        "snapshot_id": "snap_4f8a2c",
        "source_url": "smp://snapshots/snap_4f8a2c.tar.zst",
        "expected_root_hash": "f7c2a19b3d84",
        "verify_signature": true
    }
}

Structural Queries

`smp/navigate` — Find an entity and its relationships

{
    "method": "smp/navigate",
    "params": {
        "query": "authenticateUser",
        "include_relationships": true
    }
}

`smp/trace` — Follow a relationship chain

{
    "method": "smp/trace",
    "params": {
        "start": "func_authenticate_user",
        "relationship": "CALLS",
        "depth": 3,
        "direction": "outgoing"
    }
}

`smp/flow` — Trace data flow through the graph

{
    "method": "smp/flow",
    "params": {
        "entry": "func_authenticate_user",
        "direction": "out",
        "depth": 4
    }
}

`smp/diff` — Structural diff between two commit SHAs

{
    "method": "smp/diff",
    "params": {
        "from_sha": "abc1234",
        "to_sha": "def5678",
        "scope": "package:src/auth"
    }
}

`smp/why` — Explain why two nodes are connected

{
    "method": "smp/why",
    "params": {
        "from": "func_authenticate_user",
        "to": "class_UserModel"
    }
}

Context & Impact

`smp/context` — Get the programmer's mental model for a file

Returns a pre-computed structural summary (role, blast radius, risk level, test coverage, heat score) plus raw graph data: imports, importers, defined symbols, structurally similar files, entry points, and data flow.

{
    "method": "smp/context",
    "params": {
        "file_path": "src/auth/login.ts",
        "scope": "edit"   // "edit" | "create" | "debug" | "review"
    }
}

Summary fields in the response:

Field	Description
`role`	Topology-derived: `endpoint`, `service`, `core_utility`, `test`, `config`, `isolated`, `module`
`blast_radius`	Number of files that import this file
`api_layer_callers`	Callers originating from the API layer
`avg_complexity`	Average cyclomatic complexity of defined functions
`max_complexity`	Highest complexity function in the file
`has_tests`	Whether test coverage exists
`is_hot_node`	True if heat score > 90
`heat_score`	Frequency of recent access (0–100)
`risk_level`	`high` / `medium` / `low` — derived from blast_radius and complexity

`smp/impact` — Blast radius of a proposed change

{
    "method": "smp/impact",
    "params": {
        "entity": "func_authenticate_user",
        "change_type": "signature_change"   // "signature_change" | "delete" | "move"
    }
}
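
Conceptually, a blast radius is reverse reachability over the call graph: everything that can reach the changed entity. The sketch below assumes an in-memory list of (caller, callee) pairs and a hypothetical blast_radius function; in SMP the traversal runs inside the graph database.

```python
from collections import deque


def blast_radius(entity: str, calls: list[tuple[str, str]]) -> set[str]:
    """Return every node that reaches `entity` through CALLS edges (transitive callers)."""
    callers: dict[str, list[str]] = {}
    for src, dst in calls:
        callers.setdefault(dst, []).append(src)
    affected: set[str] = set()
    queue = deque([entity])
    while queue:
        for caller in callers.get(queue.popleft(), []):
            if caller != entity and caller not in affected:  # also guards against cycles
                affected.add(caller)
                queue.append(caller)
    return affected


calls = [
    ("route_login", "func_authenticate_user"),
    ("route_refresh", "func_authenticate_user"),
    ("api_gateway", "route_login"),
    ("func_authenticate_user", "hash_password"),  # a callee, so not part of the radius
]
affected = blast_radius("func_authenticate_user", calls)
print(sorted(affected))
```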

`smp/locate` — Community-routed feature discovery

{
    "method": "smp/locate",
    "params": {
        "query": "user registration flow",
        "seed_k": 3,
        "hops": 2,
        "top_k": 10
    }
}

Returns ranked results with final_score, vector_score, pagerank, heat_score, and a structural_map adjacency list of edges between result nodes.

Community Queries

`smp/community/detect` — Run Louvain at two resolutions

{
    "method": "smp/community/detect",
    "params": {
        "algorithm": "louvain",
        "relationship_types": ["CALLS_STATIC", "CALLS_RUNTIME", "IMPORTS"],
        "levels": [
            {"level": 0, "resolution": 0.5, "label": "coarse"},
            {"level": 1, "resolution": 1.5, "label": "fine"}
        ],
        "min_community_size": 5
    }
}

`smp/community/list` — List all communities

{"method": "smp/community/list", "params": {"level": 1}}

`smp/community/get` — Get members and bridge edges of a community

{
    "method": "smp/community/get",
    "params": {
        "community_id": "comm_auth_core",
        "node_types": ["Function", "Class"],
        "include_bridges": true
    }
}

`smp/community/boundaries` — Coupling strength between all community pairs

{
    "method": "smp/community/boundaries",
    "params": {"level": 0, "min_coupling": 0.05}
}

Returns coupling weights and the specific bridge nodes responsible for cross-domain dependencies.
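
How the coupling weight is computed is not specified here. One plausible definition, used purely for illustration, is the share of all edges that cross between a given pair of communities; coupling and community_of are hypothetical names.

```python
from collections import Counter


def coupling(edges: list[tuple[str, str]], community_of: dict[str, str],
             min_coupling: float = 0.05) -> dict[tuple[str, str], float]:
    """Share of all edges that cross between each community pair (assumed definition)."""
    cross: Counter = Counter()
    for src, dst in edges:
        a, b = community_of[src], community_of[dst]
        if a != b:
            cross[tuple(sorted((a, b)))] += 1
    total = len(edges)
    return {pair: count / total for pair, count in cross.items()
            if count / total >= min_coupling}


community_of = {"login": "auth", "session": "auth", "charge": "billing", "invoice": "billing"}
edges = [("login", "session"), ("login", "charge"), ("session", "charge"), ("charge", "invoice")]
weights = coupling(edges, community_of)
print(weights)
```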

Enrichment & Search

`smp/enrich` — Extract static metadata from a node

{"method": "smp/enrich", "params": {"node_id": "func_authenticate_user", "force": false}}

Skips silently if source_hash is unchanged since last enrichment.
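
The skip rule amounts to comparing a stored hash with a fresh one. In the sketch below, truncating SHA-256 to 8 hex characters is an assumption made to resemble the source_hash values shown in the examples; needs_enrichment is a hypothetical helper.

```python
import hashlib


def source_hash(source: str) -> str:
    """Short content hash (assumed scheme: first 8 hex characters of SHA-256)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:8]


def needs_enrichment(node: dict, current_source: str, force: bool = False) -> bool:
    """True when the node was never enriched, its source changed, or force is set."""
    stored = node.get("semantic", {}).get("source_hash")
    return force or stored != source_hash(current_source)


code = "async function authenticateUser(email, password) { /* ... */ }"
node = {"id": "func_authenticate_user", "semantic": {"source_hash": source_hash(code)}}

print(needs_enrichment(node, code))                  # unchanged source
print(needs_enrichment(node, code + "\n// edited"))  # source changed
```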

`smp/enrich/batch` — Enrich an entire scope

{"method": "smp/enrich/batch", "params": {"scope": "package:src/auth", "force": false}}

`smp/enrich/stale` — List nodes whose source changed since last enrichment

{"method": "smp/enrich/stale", "params": {"scope": "full"}}

`smp/enrich/status` — Enrichment coverage report

Returns total_nodes, has_docstring, has_annotations, has_tags, no_metadata, stale, and coverage_pct.

`smp/annotate` — Manually set metadata on a node

Used for no_metadata nodes that have nothing extractable from the AST.

{
    "method": "smp/annotate",
    "params": {
        "node_id": "func_xT9_handler",
        "description": "Processes Stripe webhook payload and updates subscription status.",
        "tags": ["billing", "webhook", "stripe"]
    }
}

`smp/tag` — Bulk-tag nodes by scope

{
    "method": "smp/tag",
    "params": {
        "scope": "package:src/payments",
        "tags": ["billing", "stripe", "pci-sensitive"],
        "action": "add"   // "add" | "remove" | "replace"
    }
}

`smp/search` — BM25 full-text search across enriched metadata

Backed by a Neo4j Full-Text Index (BM25). Scales to 100k+ nodes with no table scans.

{
    "method": "smp/search",
    "params": {
        "query": "stripe webhook",
        "match": "all",
        "filter": {
            "node_types": ["Function", "Class"],
            "tags": ["billing"],
            "scope": "package:src/payments"
        },
        "top_k": 5
    }
}

Agent Safety

`smp/session/open` — Open a write session

{
    "method": "smp/session/open",
    "params": {
        "agent_id": "agent_coder_01",
        "task": "Refactor authentication middleware",
        "scope": ["src/auth/login.ts", "src/auth/middleware.ts"],
        "mode": "write"   // "read" | "write"
    }
}
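
The exclusive-lock behaviour of write sessions can be pictured with an in-memory registry. SessionRegistry, the sess_NNNN id format, and the conflict response shape are all assumptions for illustration; SMP persists sessions in Neo4j.

```python
class SessionRegistry:
    """In-memory stand-in for the session store (illustrative only)."""

    def __init__(self) -> None:
        self._locks: dict[str, str] = {}         # file path -> id of the write session holding it
        self._scopes: dict[str, list[str]] = {}  # session id -> files it locked
        self._counter = 0

    def open(self, agent_id: str, scope: list[str], mode: str = "write") -> dict:
        if mode == "write":
            held = {path: self._locks[path] for path in scope if path in self._locks}
            if held:
                return {"status": "conflict", "held_by": held}
        self._counter += 1
        session_id = f"sess_{self._counter:04d}"
        if mode == "write":
            for path in scope:
                self._locks[path] = session_id
            self._scopes[session_id] = list(scope)
        return {"status": "open", "session_id": session_id, "agent_id": agent_id}

    def close(self, session_id: str) -> None:
        for path in self._scopes.pop(session_id, []):
            self._locks.pop(path, None)


registry = SessionRegistry()
first = registry.open("agent_coder_01", ["src/auth/login.ts", "src/auth/middleware.ts"])
second = registry.open("agent_coder_02", ["src/auth/login.ts"])
reader = registry.open("agent_reviewer_01", ["src/auth/login.ts"], mode="read")
registry.close(first["session_id"])
third = registry.open("agent_coder_02", ["src/auth/login.ts"])
print(second["status"], reader["status"], third["status"])
```

Read sessions never take locks in this sketch, which mirrors the split between MVCC reads and exclusive writes described earlier.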

`smp/guard/check` — Pre-flight safety check

{
    "method": "smp/guard/check",
    "params": {
        "session_id": "sess_abc123",
        "target": "src/auth/login.ts"
    }
}

Returns verdict: clear, warning, or blocked along with reasons and recommended actions.
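
A sketch of how the four signals might combine into a verdict. Which signals block and which only warn is an assumption of this example (locks and concurrent sessions block; hot nodes and missing tests warn), as are the input field names and guard_verdict itself.

```python
def guard_verdict(target: dict) -> dict:
    """Combine lock, session, heat, and coverage signals into clear / warning / blocked."""
    blocking, warnings = [], []
    if target.get("locked_by"):
        blocking.append(f"locked by {target['locked_by']}")
    if target.get("conflicting_sessions"):
        blocking.append("concurrent write session on the same target")
    if target.get("heat_score", 0) > 90:
        warnings.append("hot node (heat score > 90)")
    if not target.get("has_tests", True):
        warnings.append("no test coverage")
    verdict = "blocked" if blocking else "warning" if warnings else "clear"
    return {"verdict": verdict, "reasons": blocking + warnings}


quiet = guard_verdict({"heat_score": 12, "has_tests": True})
hot = guard_verdict({"heat_score": 97, "has_tests": False})
locked = guard_verdict({"locked_by": "sess_abc123", "heat_score": 97})
print(quiet["verdict"], hot["verdict"], locked["verdict"])
```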

`smp/dryrun` — Preview impact of a proposed change

{
    "method": "smp/dryrun",
    "params": {
        "session_id": "sess_abc123",
        "file_path": "src/auth/login.ts",
        "proposed_content": "..."
    }
}

Returns verdict: safe or breaking, with the list of affected nodes, missing tests, and a structural diff.

`smp/checkpoint` — Snapshot graph state before writing

{
    "method": "smp/checkpoint",
    "params": {
        "session_id": "sess_abc123",
        "files": ["src/auth/login.ts"]
    }
}

`smp/session/close` — Close a session

{
    "method": "smp/session/close",
    "params": {"session_id": "sess_abc123", "status": "completed"}
}

`smp/audit/log` — Query the audit log

{
    "method": "smp/audit/log",
    "params": {
        "agent_id": "agent_coder_01",
        "since": "2025-02-15T00:00:00Z",
        "event_types": ["session_open", "dryrun", "write"]
    }
}

Sandbox

`smp/sandbox/spawn` — Create an ephemeral sandbox

{
    "method": "smp/sandbox/spawn",
    "params": {
        "runtime": "docker",   // "docker" | "firecracker"
        "image": "node:20-alpine",
        "workspace": "src/auth",
        "inject_ebpf": true
    }
}

`smp/sandbox/execute` — Run a command inside the sandbox

{
    "method": "smp/sandbox/execute",
    "params": {
        "sandbox_id": "box_99x",
        "command": "npm test -- src/auth",
        "capture_traces": true
    }
}

`smp/sandbox/destroy` — Tear down a sandbox

{"method": "smp/sandbox/destroy", "params": {"sandbox_id": "box_99x"}}

Swarm Handoff

`smp/handoff/review` — Hand off a change to a peer-review agent

{
    "method": "smp/handoff/review",
    "params": {
        "session_id": "sess_abc123",
        "reviewer_agent": "agent_reviewer_01",
        "notes": "Refactored token expiry handling."
    }
}

`smp/handoff/pr` — Generate a structured PR with structural diff

Returns a PR package containing: changed files, structural diff, new runtime edges discovered during sandbox execution, mutation test score, and guard check history.

Agent Integration

Python SDK

import asyncio
from smp.client import SMPClient

async def main():
    async with SMPClient("http://localhost:8420") as client:

        # Feature discovery
        results = await client.locate("user registration flow")

        # Impact analysis
        impact = await client.assess_impact("src/auth/manager.py::authenticate")
        print(f"Change affects {impact['total_affected_nodes']} nodes")

        # Get editing context
        context = await client.get_context("src/auth/login.ts", scope="edit")
        print(f"Risk level: {context['summary']['risk_level']}")
        print(f"Blast radius: {context['summary']['blast_radius']} files")

asyncio.run(main())

TypeScript SDK

import { SMPClient } from "@smp/client";

const client = new SMPClient("http://localhost:8420");

// Locate a feature
const results = await client.locate("payment webhook handler");

// Assess impact before editing
const impact = await client.impact("func_process_payment", "signature_change");
console.log(`Affects ${impact.total_affected_nodes} nodes`);

Full Agent Workflow

This is the recommended pattern for any agent performing a write operation:

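from pathlib import Path


class AbortError(RuntimeError):
    """Raised when a guard check or dry run says the edit must not proceed."""


def write_to_disk(file_path: str, content: str) -> None:
    Path(file_path).write_text(content, encoding="utf-8")


class CodingAgent:
    def __init__(self, smp_client, agent_id: str):
        self.smp = smp_client
        self.agent_id = agent_id  # sent with smp/session/open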

    def edit_file(self, file_path: str, instruction: str, new_code: str):
        # 1. Open a session — declare scope and intent upfront
        session = self.smp.call("smp/session/open", {
            "agent_id": self.agent_id,
            "task": instruction,
            "scope": [file_path],
            "mode": "write"
        })

        # 2. Pre-flight guard check — abort immediately if blocked
        guard = self.smp.call("smp/guard/check", {
            "session_id": session["session_id"],
            "target": file_path
        })
        if guard["verdict"] == "blocked":
            raise AbortError(guard["reasons"])

        # 3. Get full structural context — agents read summary first
        context = self.smp.call("smp/context", {
            "file_path": file_path,
            "scope": "edit"
        })

        # 4. Dry run — preview impact before touching disk
        dryrun = self.smp.call("smp/dryrun", {
            "session_id": session["session_id"],
            "file_path": file_path,
            "proposed_content": new_code,
        })
        if dryrun["verdict"] == "breaking":
            raise AbortError(dryrun["risks"])

        # 5. Checkpoint → write → sync memory
        self.smp.call("smp/checkpoint", {
            "session_id": session["session_id"],
            "files": [file_path]
        })
        write_to_disk(file_path, new_code)
        self.smp.call("smp/update", {
            "file_path": file_path,
            "content": new_code,
            "change_type": "modified"
        })

        # 6. Close session
        self.smp.call("smp/session/close", {
            "session_id": session["session_id"],
            "status": "completed"
        })
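
One gap in the pattern above is that an abort leaves the session open. A possible refinement, sketched here with a stub client, wraps the session in a context manager so it is always closed. The "aborted" status value is an assumption (only "completed" appears in the reference), and smp_session and FakeSMP are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def smp_session(smp, agent_id: str, task: str, scope: list[str]):
    """Open a write session and guarantee smp/session/close runs, even on failure."""
    session = smp.call("smp/session/open", {
        "agent_id": agent_id, "task": task, "scope": scope, "mode": "write",
    })
    status = "aborted"  # assumed status value for the failure path
    try:
        yield session
        status = "completed"
    finally:
        smp.call("smp/session/close",
                 {"session_id": session["session_id"], "status": status})


class FakeSMP:
    """Stub client that records calls and always reports a blocked guard check."""

    def __init__(self):
        self.calls = []

    def call(self, method, params):
        self.calls.append((method, params))
        return {"session_id": "sess_abc123", "verdict": "blocked"}


fake = FakeSMP()
try:
    with smp_session(fake, "agent_coder_01", "Refactor login", ["src/auth/login.ts"]) as session:
        guard = fake.call("smp/guard/check", {"session_id": session["session_id"],
                                              "target": "src/auth/login.ts"})
        if guard["verdict"] == "blocked":
            raise RuntimeError("blocked by guard check")
except RuntimeError:
    pass

print([method for method, _ in fake.calls])
```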

MCP Integration

SMP is a native MCP server. Add it to your agent's MCP configuration to expose all SMP methods as tools:

{
    "mcpServers": {
        "smp": {
            "url": "http://localhost:8420/mcp",
            "transport": "http"
        }
    }
}

Once connected, your MCP-compatible IDE or agent (Cursor, Claude Code, Windsurf, etc.) will have access to all smp/* methods as first-class tools, with full structural memory for every code change.

Technology Stack

Component	Technology	Rationale
AST Parsing	Tree-sitter	Multi-language, incremental, fast — no LLM
Graph DB	Neo4j 5.x	CALLS, IMPORTS, PageRank, BM25 full-text, community detection via GDS
Vector Index	ChromaDB	High-speed seed discovery at query time
Merkle Index	SHA-256 (in-process)	O(log n) incremental sync, secure snapshot distribution
Community Detection	Louvain (Neo4j GDS)	Topology-only, no LLM, reproducible
Runtime Tracing	eBPF (BCC / libbpf)	Kernel-level call capture — zero app instrumentation
Sandbox Runtime	Docker / Firecracker microVMs	Ephemeral, CoW filesystem, hard egress firewall
Container Topology	Testcontainers	Per-sandbox Postgres, Redis, etc.
Mutation Testing	Stryker (JS/TS) / mutmut (Python)	Deterministic, no LLM, anti-gamification
Data Models	msgspec	Zero-copy, schema-validated structs
Protocol	JSON-RPC 2.0	Standard, simple, MCP-compatible
Embeddings	text-embedding-3-small (index time only)	Generated once per node; never at query time
Language	Python 3.11 (prototype) → Rust (production)	Start fast, optimize later
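To illustrate the O(log n) claim in the Merkle Index row, here is a minimal sketch of a SHA-256 Merkle tree over file contents and a comparison that descends only into subtrees whose hashes differ. It is not SMP's merkle.py: the function names and tree layout are assumptions, and the sketch assumes both snapshots contain the same set of paths (added or deleted files would need extra handling).

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_tree(files: dict[str, bytes]) -> dict:
    """Build a binary Merkle tree over files, with leaves ordered by sorted path."""
    if not files:
        raise ValueError("cannot build a Merkle tree over zero files")
    paths = sorted(files)

    def build(lo: int, hi: int) -> dict:
        if hi - lo == 1:
            path = paths[lo]
            # Leaf hash covers both the path and the file content.
            return {"hash": sha256(path.encode() + b"\0" + files[path]), "path": path}
        mid = (lo + hi) // 2
        left, right = build(lo, mid), build(mid, hi)
        # Interior hash is derived from the two child hashes.
        return {
            "hash": sha256((left["hash"] + right["hash"]).encode()),
            "left": left,
            "right": right,
        }

    return build(0, len(paths))


def changed_paths(a: dict, b: dict) -> list[str]:
    """Compare two same-shaped trees, skipping any subtree whose hashes match."""
    if a["hash"] == b["hash"]:
        return []
    if "path" in a:
        return [a["path"]]
    return changed_paths(a["left"], b["left"]) + changed_paths(a["right"], b["right"])


old = {"a.py": b"x = 1\n", "b.py": b"y = 2\n", "c.py": b"z = 3\n"}
new = {**old, "b.py": b"y = 20\n"}
print(changed_paths(build_tree(old), build_tree(new)))  # ['b.py']
```

When a single file changes, only the nodes on the path from the root to that leaf have different hashes, so the comparison visits a number of nodes proportional to the tree height rather than to the number of files.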

Project Structure

structural-memory/
├── server/
│   ├── core/
│   │   ├── parser.py            # AST extraction (Tree-sitter)
│   │   ├── graph_builder.py     # Build structural graph
│   │   ├── linker.py            # Static namespaced CALLS resolution
│   │   ├── linker_runtime.py    # eBPF trace ingestion → CALLS_RUNTIME edges
│   │   ├── enricher.py          # Static metadata extraction
│   │   ├── merkle.py            # Merkle tree builder + hash comparator
│   │   ├── index_distributor.py # Index export / import + signature verification
│   │   ├── community.py         # Louvain detection + MEMBER_OF writes
│   │   ├── telemetry.py         # Hot node tracking + heat scores
│   │   ├── store.py             # Graph DB interface + full-text index + PageRank
│   │   └── chroma_index.py      # ChromaDB collection management
│   ├── engine/
│   │   ├── navigator.py         # Graph traversal (navigate, trace, flow, why)
│   │   ├── reasoner.py          # Proactive context + summary computation
│   │   ├── seed_walk.py         # SeedWalkEngine: Seed & Walk pipeline
│   │   └── guard.py             # Guard checks, dry run, test-gap analysis
│   ├── sandbox/
│   │   ├── spawner.py           # Docker / Firecracker microVM lifecycle
│   │   ├── executor.py          # Command runner + stdout/stderr capture
│   │   ├── ebpf_collector.py    # eBPF daemon interface + trace → graph edges
│   │   ├── network_policy.py    # Egress firewall rules
│   │   └── verifier.py          # AST data-flow check + mutation test runner
│   ├── protocol/
│   │   ├── dispatcher.py        # @rpc_method decorator + method registry
│   │   └── handlers/
│   │       ├── memory.py        # smp/update, batch_update, sync, merkle/*
│   │       ├── index.py         # smp/index/export, import
│   │       ├── community.py     # smp/community/detect, list, get, boundaries
│   │       ├── query.py         # smp/navigate, trace, context, impact, locate, flow, diff, why
│   │       ├── enrichment.py    # smp/enrich, annotate, tag, search
│   │       ├── safety.py        # smp/session/*, guard/check, dryrun, checkpoint, lock, audit
│   │       ├── planning.py      # smp/plan, conflict
│   │       ├── sandbox.py       # smp/sandbox/spawn, execute, destroy
│   │       ├── verify.py        # smp/verify/integrity
│   │       ├── handoff.py       # smp/handoff/review, pr
│   │       └── telemetry.py     # smp/telemetry/*
│   └── main.py                  # Server entry point + full-text index init
├── clients/
│   ├── python_client.py         # Python SDK for agents
│   ├── typescript_client.ts     # TypeScript SDK for agents
│   └── cli.py                   # Manual interaction + debugging
├── watchers/
│   ├── file_watcher.py          # Watch for filesystem changes
│   └── git_hook.py              # Git-based incremental updates
└── tests/
    └── ...

Protocol dispatcher pattern: each method group lives in its own handler module, and each handler registers itself under its method name with the @rpc_method decorator. This avoids a single god-file of if/elif chains.

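# protocol/dispatcher.py
from typing import Any, Callable

# Maps JSON-RPC method names (e.g. "smp/navigate") to handler functions.
_registry: dict[str, Callable[..., Any]] = {}

class MethodNotFound(Exception):
    """No handler is registered for the requested method (assumed definition)."""

def rpc_method(name: str):
    """Register the decorated function as the handler for `name`."""
    def decorator(fn):
        _registry[name] = fn
        return fn
    return decorator

def dispatch(method: str, params: dict, context: "ServerContext"):
    """Look up the handler for `method` and call it with params and context."""
    handler = _registry.get(method)
    if handler is None:
        raise MethodNotFound(method)
    return handler(params, context)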

# protocol/handlers/query.py
from ..dispatcher import rpc_method  # assumes protocol/ and handlers/ are packages

@rpc_method("smp/navigate")
def handle_navigate(params, ctx):
    return ctx.engine.navigator.navigate(
        params["query"], params.get("include_relationships", False)
    )

@rpc_method("smp/locate")
def handle_locate(params, ctx):
    return ctx.engine.seed_walk.locate(
        params["query"],
        params.get("seed_k", 3),
        params.get("hops", 2),
        params.get("top_k", 10),
    )
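The registry pattern above can be exercised end to end with a small JSON-RPC 2.0 wrapper. The sketch below is self-contained and only illustrative: `smp/echo` and `handle_request` are hypothetical names that do not appear in SMP, and the real server's transport and error handling are not shown in this document. The one fixed detail is the error code, since JSON-RPC 2.0 reserves -32601 for "Method not found".

```python
import json

_registry = {}


def rpc_method(name):
    """Register the decorated function under a JSON-RPC method name."""
    def decorator(fn):
        _registry[name] = fn
        return fn
    return decorator


@rpc_method("smp/echo")  # hypothetical method, for illustration only
def handle_echo(params, ctx):
    return {"echo": params["text"]}


def handle_request(raw: str, ctx=None) -> str:
    """Decode a JSON-RPC 2.0 request, dispatch it, and encode the response."""
    request = json.loads(raw)
    handler = _registry.get(request["method"])
    if handler is None:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        body = {"error": {"code": -32601, "message": "Method not found"}}
    else:
        body = {"result": handler(request.get("params", {}), ctx)}
    return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), **body})


request = {"jsonrpc": "2.0", "id": 1, "method": "smp/echo", "params": {"text": "hi"}}
print(handle_request(json.dumps(request)))
```

Adding a new method then means writing one decorated function in the relevant handler module; the dispatch code itself does not change.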

Component Summary

Component	Purpose
Parser	Extract AST from source (Tree-sitter)
Graph Builder	Create structural nodes and relationships
Static Linker	Namespace-aware cross-file CALLS resolution
Runtime Linker	eBPF execution traces → `CALLS_RUNTIME` edges
Enricher	Attach docstrings, annotations, tags, `code_embedding`
Graph DB	Neo4j — structure, PageRank, sessions, telemetry, BM25
Vector Index	ChromaDB — `code_embedding` per node for seed phase
Merkle Index	SHA-256 tree — O(log n) incremental sync + secure distribution
SeedWalkEngine	`smp/locate` pipeline: vector seed → N-hop walk → composite rank
Query Engine	navigate, trace, context, impact, locate, flow, diff, why
SMP Protocol	JSON-RPC 2.0 via Dispatcher — handlers split by domain
Agent Safety	Sessions, guard checks, dry runs, checkpoints, audit log
Telemetry	Hot node tracking, heat scores, automatic safety escalation
Community Detection	Two-level Louvain — Graph RAG routing + architecture queries
Sandbox Runtime	Ephemeral microVM/Docker, CoW filesystem, egress firewall
Integrity Gate	AST data-flow check + deterministic mutation testing
Swarm Handoff	Peer review pass-off + structured PR with structural diff
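The SeedWalkEngine row describes a seed, walk, rank pipeline. The sketch below shows only the walk step on an in-memory adjacency dictionary: a breadth-first traversal bounded to N hops from the seed nodes. In SMP the seeds would come from ChromaDB and the traversal would run in Neo4j, and the composite ranking formula is not given in this section, so the `rank` function here is a placeholder that simply prefers nodes closer to a seed. The graph and all function names are hypothetical.

```python
from collections import deque


def n_hop_walk(graph: dict[str, list[str]], seeds: list[str], hops: int) -> dict[str, int]:
    """Return each reachable node's hop distance from its nearest seed, up to `hops`."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # hop budget exhausted; do not expand further
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def rank(distance: dict[str, int], top_k: int) -> list[str]:
    """Placeholder ranking: fewer hops first, ties broken alphabetically."""
    return sorted(distance, key=lambda node: (distance[node], node))[:top_k]


# Hypothetical CALLS edges between functions.
graph = {
    "login": ["check_password", "create_session"],
    "check_password": ["hash_password"],
    "hash_password": ["bcrypt_hash"],
    "create_session": [],
}
reached = n_hop_walk(graph, seeds=["login"], hops=2)
print(rank(reached, top_k=3))  # ['login', 'check_password', 'create_session']
```

Bounding the walk by hop count is what keeps retrieval scoped to the neighborhood of the seeds instead of the whole graph; `bcrypt_hash` is three hops from `login` and is never visited here.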

Contributing

See CONTRIBUTING.md for setup instructions, coding standards, and how to add new protocol methods or language parsers.

Documentation

Architecture Guide — Deep dive into the Graph RAG pipeline and storage layer.
API Reference — Full JSON-RPC 2.0 method specification with all parameters and response schemas.
User Guide — Tutorials and advanced agent workflows.
Contributing — How to extend SMP with new parsers, methods, and integrations.

SMP — giving AI agents the structural memory to master any codebase.
