GrayMatter

Three lines of code to give your AI agents persistent memory.
Single Go binary. Zero infra. Works with Claude Code or any tool that calls the Anthropic Messages API.

Free, offline, no account required.

go get github.com/angelnicolasc/graymatter

mem := graymatter.New(".graymatter")
mem.Remember("agent", "user prefers bullet points, hates long intros")
ctx, _ := mem.Recall("agent", "how should I format this response?")
// ["user prefers bullet points, hates long intros"]

Why

Every AI agent today is stateless by default. Every run starts from zero.

Mem0, Zep, and Supermemory solve this, but only in Python or TypeScript, and they
require a server. Go has no production-ready, embeddable, zero-dependency
memory layer for agents. GrayMatter fills that gap.

~90% token reduction at 100 sessions versus full-history injection.
No Docker. No Redis. No Python. No API key required for storage.

Install

Binary (recommended):

# macOS (Apple Silicon)
curl -sSL -o graymatter.tar.gz https://github.com/angelnicolasc/graymatter/releases/download/v0.2.1/graymatter_0.2.1_darwin_arm64.tar.gz
tar -xzf graymatter.tar.gz
sudo mv graymatter /usr/local/bin/

# Windows (PowerShell)
iwr https://github.com/angelnicolasc/graymatter/releases/download/v0.2.1/graymatter_0.2.1_windows_amd64.zip -OutFile graymatter.zip
Expand-Archive graymatter.zip -DestinationPath .\graymatter_cli

Go install:

go install github.com/angelnicolasc/graymatter/cmd/graymatter@latest

Library:

go get github.com/angelnicolasc/graymatter

Library usage

Three calls cover the core API: New, Remember, and Recall.

import "github.com/angelnicolasc/graymatter"

// Open (or create) a memory store in the given directory.
mem := graymatter.New(".graymatter")
defer mem.Close()

// Store an observation.
mem.Remember("sales-closer", "Maria didn't reply Wednesday. Third touchpoint due Friday.")

// Retrieve relevant context for a query.
ctx, _ := mem.Recall("sales-closer", "follow up Maria")
// ctx is a []string ready to inject into a system prompt:
// ["Maria didn't reply Wednesday. Third touchpoint due Friday."]

Every method has a context-aware variant that respects deadlines and cancellation signals end-to-end — no wrappers needed:

ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()

if err := mem.RememberCtx(ctx, "agent", "observation"); err != nil { ... }
results, err := mem.RecallCtx(ctx, "agent", "query")

Full agent pattern

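mem := graymatter.New(project.Root + "/.graymatter")
defer mem.Close()

// 1. Recall before calling the LLM.
memCtx, err := mem.Recall(skill.Name, task.Description)
if err != nil { ... }

// The Messages API takes the system prompt as a top-level parameter;
// "system" is not a valid message role. Field names below assume
// anthropic-sdk-go v1 and may differ in other SDK versions.
system := skill.Identity + "\n\n## Memory\n" + strings.Join(memCtx, "\n")

// 2. Call your LLM.
response, err := client.Messages.New(ctx, anthropic.MessageNewParams{
    // Model, MaxTokens, ...
    System: []anthropic.TextBlockParam{{Text: system}},
    Messages: []anthropic.MessageParam{
        anthropic.NewUserMessage(anthropic.NewTextBlock(task.Description)),
    },
})
if err != nil { ... }

// 3. Remember after the run.
mem.Remember(skill.Name, extractKeyFacts(response))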
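
The prompt assembly in step 1 can be pulled into a small helper. This is a sketch only: buildSystemPrompt is a hypothetical name, and it keeps the same "## Memory" layout used in the pattern above while skipping the section when nothing was recalled.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSystemPrompt appends recalled facts under a "## Memory" heading.
// With no facts it returns the identity unchanged, so the prompt never
// carries an empty memory section.
func buildSystemPrompt(identity string, facts []string) string {
	if len(facts) == 0 {
		return identity
	}
	return identity + "\n\n## Memory\n" + strings.Join(facts, "\n")
}

func main() {
	facts := []string{"Maria didn't reply Wednesday. Third touchpoint due Friday."}
	fmt.Println(buildSystemPrompt("You are a sales assistant.", facts))
}
```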

Config

mem, err := graymatter.NewWithConfig(graymatter.Config{
    DataDir:          ".graymatter",
    TopK:             8,
    EmbeddingMode:    graymatter.EmbeddingAuto,  // Ollama → OpenAI → Anthropic → keyword
    OllamaURL:        "http://localhost:11434",
    OllamaModel:      "nomic-embed-text",
    AnthropicAPIKey:  os.Getenv("ANTHROPIC_API_KEY"),
    OpenAIAPIKey:     os.Getenv("OPENAI_API_KEY"),
    DecayHalfLife:    30 * 24 * time.Hour,        // 30 days
    AsyncConsolidate: true,
})

CLI

graymatter init                                    # create .graymatter/ + .mcp.json
graymatter remember "agent" "text to remember"    # store a fact
graymatter remember --shared "text"               # store in shared namespace (all agents)
graymatter recall   "agent" "query"               # print context
graymatter recall   --all "agent" "query"         # merge agent + shared memory
graymatter checkpoint list    "agent"             # show saved checkpoints
graymatter checkpoint resume  "agent"             # print latest checkpoint as JSON
graymatter mcp serve                              # start MCP server (Claude Code / Cursor)
graymatter mcp serve --http :8080                 # HTTP transport
graymatter export --format obsidian --out ~/vault # dump to Obsidian vault
graymatter tui                                    # 4-view terminal UI
graymatter run agent.md [--background]            # run a SKILL.md agent file
graymatter sessions list                          # list managed agent sessions
graymatter plugin install manifest.json           # install a plugin
graymatter server --addr :8080                    # REST API server

Global flags: --dir (data dir), --quiet, --json

Observability

The REST server (graymatter server) exposes a /metrics endpoint powered by Go's standard expvar package — zero extra dependencies.

GET /metrics

{
  "requests_total":     {"remember": 120, "recall": 340, "healthz": 5},
  "request_latency_us": {"remember": 4200, "recall": 1800},
  "facts_total":        {"stored": 120},
  "recall_total":       {"served": 340}
}

For library users, memory.StoreConfig exposes hooks for APM integration:

store, err := memory.Open(memory.StoreConfig{
    DataDir:       ".graymatter",
    DecayHalfLife: 30 * 24 * time.Hour,

    // Called after every Recall with agent ID, query, result count, and latency.
    OnRecall: func(agentID, query string, n int, d time.Duration) {
        metrics.RecordHistogram("graymatter.recall.latency", d.Seconds())
    },

    // Called after every successful Put with agent ID, fact ID, and latency.
    OnPut: func(agentID, factID string, d time.Duration) {
        metrics.Increment("graymatter.facts.stored")
    },

    // Routes internal log events to any standard logger.
    Logger: slog.NewLogLogger(slog.Default().Handler(), slog.LevelDebug),

    // Swap the vector backend entirely — bring your own Qdrant, pgvector, etc.
    VectorBackend: myQdrantAdapter,
})

Claude Code / Cursor (MCP)

graymatter init     # creates .mcp.json automatically

Claude Code detects .mcp.json automatically. Five tools become available:

Tool	What it does
`memory_search`	Recall facts for a query
`memory_add`	Store a new fact
`checkpoint_save`	Snapshot current session
`checkpoint_resume`	Restore last checkpoint
`memory_reflect`	Add / update / forget / link memories (agent self-edit)

Or add manually to your project's .mcp.json:

{
  "mcpServers": {
    "graymatter": {
      "command": "graymatter",
      "args": ["mcp", "serve"]
    }
  }
}

Storage

Layer	Tech	What it holds
KV store	bbolt (pure Go, ACID)	Sessions, checkpoints, facts, metadata, KG
Vector index	chromem-go (pure Go)	Semantic embeddings, hybrid retrieval
Export	Markdown files	Human-readable, git-friendly, Obsidian-compatible

Single file: .graymatter/gray.db
Single folder: .graymatter/vectors/

No migrations. No schema versions. Append-only with decay-based eviction.

Embeddings

GrayMatter degrades gracefully. It works without any embedding model.

Mode	When
Ollama (default)	Machine has Ollama running with `nomic-embed-text`
OpenAI	`OPENAI_API_KEY` set, Ollama not available
Anthropic	`ANTHROPIC_API_KEY` set, Ollama and OpenAI not available
Keyword-only	No embedding available — TF-IDF + recency, zero deps

Auto-detection order in EmbeddingAuto mode: Ollama → OpenAI → Anthropic → keyword.

# Pull the embedding model once (Ollama):
ollama pull nomic-embed-text

# Or set an API key (OpenAI or Anthropic):
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
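
The detection order can be read as a simple fallback chain. The sketch below is an illustration of that order only: pickMode and the Mode constants are hypothetical, and how GrayMatter actually probes Ollama is not shown here, so availability is passed in as a boolean.

```go
package main

import (
	"fmt"
	"os"
)

// Mode names the embedding provider chosen by the fallback chain.
type Mode string

const (
	ModeOllama    Mode = "ollama"
	ModeOpenAI    Mode = "openai"
	ModeAnthropic Mode = "anthropic"
	ModeKeyword   Mode = "keyword"
)

// pickMode walks the documented order: Ollama, OpenAI, Anthropic, keyword.
func pickMode(ollamaUp bool, openAIKey, anthropicKey string) Mode {
	switch {
	case ollamaUp:
		return ModeOllama
	case openAIKey != "":
		return ModeOpenAI
	case anthropicKey != "":
		return ModeAnthropic
	default:
		return ModeKeyword
	}
}

func main() {
	// ollamaUp is hard-coded to false here; a real check would probe the Ollama URL.
	mode := pickMode(false, os.Getenv("OPENAI_API_KEY"), os.Getenv("ANTHROPIC_API_KEY"))
	fmt.Println("embedding mode:", mode)
}
```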

Memory lifecycle

Recall(agent, task)          ← hybrid: vector + keyword + recency → top-8 facts
    ↓
Inject into system prompt    ← your 3 lines of code
    ↓
Agent runs
    ↓
Remember(agent, observation) ← store key facts during/after run
    ↓
Consolidate() [async]        ← summarise + decay + prune (LLM optional)

Consolidation is the only "smart" step. Everything else is deterministic.
Without consolidation, GrayMatter still works — it just doesn't compress over time.

Consolidation auto-enables when ANTHROPIC_API_KEY is set. To use Ollama:

cfg := graymatter.DefaultConfig()
cfg.ConsolidateLLM = "ollama"
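
The hybrid Recall step at the top of the lifecycle merges vector, keyword, and recency rankings. A common way to do that is reciprocal rank fusion (RRF), sketched below. The constant k = 60 and the rrf helper are assumptions for illustration, not GrayMatter's actual parameters or code.

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses ranked lists of fact IDs with reciprocal rank fusion:
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
// It returns at most topK IDs, best first.
func rrf(k float64, topK int, rankings ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range rankings {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break by ID
	})
	if len(ids) > topK {
		ids = ids[:topK]
	}
	return ids
}

func main() {
	vector := []string{"f3", "f1", "f7"}
	keyword := []string{"f1", "f9", "f3"}
	recency := []string{"f9", "f1", "f2"}
	fmt.Println(rrf(60, 3, vector, keyword, recency)) // prints [f1 f9 f3]
}
```

f1 wins because it ranks near the top of all three lists, which is the property that makes RRF a good fit when the individual scores are not comparable.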

Token efficiency

Numbers produced by go run ./benchmarks/token_count — real Recall calls,
keyword embedder, no LLM required:

Sessions	Full injection	GrayMatter	Reduction
1	~80 tokens	~80 tokens	0%
10	~630 tokens	~550 tokens	12%
30	~1,880 tokens	~550 tokens	71%
100	~6,960 tokens	~670 tokens	90%

Each "session" = one paragraph-length agent observation (~60 words).
GrayMatter always injects only the top-8 most relevant observations for the query.
With vector embeddings, recall precision improves and the reduction ratio stays similar, because the top-8 cap still bounds what is injected.
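
The Reduction column is one minus the ratio of injected to full-history tokens. The sketch below recomputes it from the rounded figures in the table, so results can differ from the printed column by about a point. The reduction helper is illustrative and is not the benchmark's code.

```go
package main

import "fmt"

// reduction returns the percentage of tokens saved relative to full injection.
func reduction(full, injected float64) float64 {
	if full <= 0 {
		return 0
	}
	return (1 - injected/full) * 100
}

func main() {
	rows := []struct {
		sessions   int
		full, gray float64
	}{
		{1, 80, 80},
		{10, 630, 550},
		{30, 1880, 550},
		{100, 6960, 670},
	}
	for _, r := range rows {
		fmt.Printf("%3d sessions: %.1f%% reduction\n", r.sessions, reduction(r.full, r.gray))
	}
}
```

With these inputs the program prints 0.0%, 12.7%, 70.7%, and 90.4%.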

Reproduce locally:

go run ./benchmarks/token_count

Build from source

git clone https://github.com/angelnicolasc/graymatter
cd graymatter
CGO_ENABLED=0 go build -ldflags="-s -w -X main.version=dev" -o graymatter ./cmd/graymatter

Output: single static binary, ~10 MB, no runtime dependencies.

Testing

The full test suite requires no LLM and no network — every test uses
t.TempDir() with a keyword embedder or injected stubs. Runs clean on
Linux, macOS, and Windows in CI.

# Core library
go test -count=1 -timeout=120s ./pkg/memory/...

# CLI / server / plugins
cd cmd/graymatter && go test -count=1 -timeout=120s ./internal/...

Package	Tests	What's covered
`pkg/memory`	42 unit tests + 3 fuzz targets	Store lifecycle, hybrid recall, RRF fusion, decay math, semaphore, concurrent writes, vector paths, dimension guard
`internal/harness`	21	Agent file parsing, retry/backoff, session recovery
`internal/kg`	21	Graph CRUD, entity extraction, weight decay, Obsidian export
`internal/server`	11	All REST endpoints, concurrent remember/recall, cancelled-context requests
`internal/plugin`	10	Install, list, remove, E2E echo plugin binary

Fuzz targets (pkg/memory): FuzzTokenize, FuzzUnmarshalFact, FuzzKeywordScore — each with a seeded corpus so they run deterministically in CI and can be extended with go test -fuzz.

Core library coverage: 73.5% (CI gate: ≥ 70%). Measured without mocks — real bbolt + chromem-go instances in a temp directory.

Token-reduction benchmark (also zero deps):

go run ./benchmarks/token_count

What GrayMatter is NOT

Not a framework. Not an agent runner. Not a replacement for your existing tooling.
Not a hosted service. Not a SaaS. Not a cloud product.
Not a knowledge base UI. Not Notion. Not Obsidian.
Not trying to win the enterprise memory market.

It is exactly one thing: the missing stateful layer for Go CLI agents,
packaged as a library you import in two lines.

Roadmap

Library: Remember / Recall / Consolidate
bbolt + chromem-go storage
Ollama + OpenAI + Anthropic + keyword-only embedding
Hybrid retrieval (vector + keyword + recency, RRF fusion)
CLI: init, remember, recall, checkpoint, export, run, sessions, plugin, server
MCP server (Claude Code / Cursor) + memory_reflect self-edit tool
Knowledge graph (entity extraction, node/edge linking, Obsidian export)
Shared memory across agents (--shared, --all flags, __shared__ namespace)
REST API server mode (graymatter server --addr :8080)
Plugin system (JSON line protocol, graymatter plugin install/list/remove)
4-view Bubble Tea TUI (Memory / Sessions / Knowledge Graph / Stats)
Context-propagation API (RememberCtx, RecallCtx, RecallAllCtx, …)
Pluggable VectorStore interface (swap chromem-go for Qdrant, pgvector, etc.)
expvar /metrics endpoint — zero-dep, stdlib-only observability
OnRecall / OnPut / Logger hooks for APM integration
Embedding dimension guard — warns on provider switch instead of silent corruption
go.work workspace — core library imports zero TUI/CLI dependencies
Three-platform CI (Linux, macOS, Windows) + coverage gate (≥ 70%; currently 73.5%)
Fuzz testing: FuzzTokenize, FuzzUnmarshalFact, FuzzKeywordScore
Ollama-backed consolidation LLM (Ollama as summariser, not just embedder)
WebSocket streaming for REST API

GrayMatter — v0.2.1 — April 2026