engram

Security Audit

Overall: Warn

Health: Warn
  • License — NOASSERTION
  • Description — Repository has a description
  • Active repo — Last push today
  • Low visibility — Only 6 GitHub stars
Code: Warn
  • process.env — Environment variable access in plugins/claude-code/hooks/sessionstart.mjs
Permissions: Pass
  • Permissions — No dangerous permissions requested
Purpose
This tool runs as a local daemon and compresses context data (such as system prompts, project instructions, and conversation history) before sending it to LLMs. Its goal is to reduce redundant tokens, saving API costs and freeing up context window space.

Security Assessment
Overall Risk: Low
The tool inherently handles sensitive data: it intercepts and processes your project code, instructions, and LLM conversation history. It runs locally and requests no dangerous system permissions. There is one minor code flag: the Claude Code integration hook accesses environment variables (`process.env`). Because Engram acts as an intermediary for your prompts, you should verify that your compressed context is not being routed to any unexpected external servers.

Quality Assessment
The project is new and lacks widespread community validation, as its low star count shows. It does appear to be actively maintained, with updates pushed as recently as today. The code is open source, but the license is reported as NOASSERTION, so verify the repository's license file before adopting it in a commercial setting.

Verdict
Use with caution: the underlying concept is useful and appears safe, but the lack of community maturity and an unasserted license warrant a manual code review before relying on it in sensitive workflows.
SUMMARY

Local-first context compression for AI coding tools. One binary saves 85-93% of redundant tokens across every LLM call.

README.md

Engram

Local-first context compression for AI coding tools. One binary saves 85-93% of redundant tokens across every LLM call.


What is Engram?

Every time an AI coding tool sends a request to an LLM, it re-sends the same context: who you are, what you're working on, your preferences, your project structure. This redundancy costs real money and eats into context windows.

Engram eliminates it. It runs locally as a lightweight daemon and compresses both your identity (CLAUDE.md, system prompts, project instructions) and your conversation context (message history, tool results, responses) across every LLM call. The result: dramatically smaller prompts, lower costs, and more room in the context window for what actually matters.

How It Works

Engram applies three compression stages:

  1. Identity compression — Verbose CLAUDE.md prose and project instructions are reduced to compact key=value codebook entries. Definitions are sent once on the first turn; subsequent turns reference keys only.
  2. Context compression — Conversation history is serialized using a learned codebook that strips JSON overhead from message objects (role=user content=... instead of full JSON).
  3. Response compression — LLM responses are compressed using provider-specific codebooks tuned to Anthropic and OpenAI output patterns.
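The codebook idea behind stages 1 and 2 can be sketched in miniature. The key names (`k0`, `k1`, …) and the serialization format below are illustrative assumptions, not Engram's actual wire format:

```python
# Toy sketch of codebook-style compression, analogous to the stages above.
# Key names and formats are hypothetical, not Engram's real protocol.

def build_codebook(identity_lines):
    """Stage 1: reduce verbose identity prose to compact key=value entries."""
    return {f"k{i}": line.strip() for i, line in enumerate(identity_lines)}

def serialize_history(messages):
    """Stage 2: strip JSON overhead from message objects."""
    return "\n".join(f"role={m['role']} content={m['content']}" for m in messages)

def first_turn_prompt(codebook, history):
    # Definitions are sent in full once, on the first turn...
    defs = "\n".join(f"{k}={v}" for k, v in codebook.items())
    return defs + "\n" + serialize_history(history)

def later_turn_prompt(codebook, history):
    # ...while subsequent turns reference the keys only.
    refs = " ".join(codebook)
    return refs + "\n" + serialize_history(history)
```

After the first turn, the identity payload shrinks from the full definitions to a short run of key references, which is where the bulk of the per-session savings would come from.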

Key Numbers

Metric                 Value
Identity compression   ~96-98% token reduction
Context compression    ~40-60% token reduction
Overall savings        85-93% per session
Startup overhead       <50ms
Memory footprint       ~30MB resident
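As a sanity check on how the per-stage figures combine into the headline number, here is a back-of-the-envelope calculation. The per-session token counts are hypothetical:

```python
# How per-stage reductions combine into an overall savings figure.
# Token counts below are hypothetical, chosen only for illustration.

def overall_savings(identity_tokens, context_tokens,
                    identity_reduction, context_reduction):
    before = identity_tokens + context_tokens
    after = (identity_tokens * (1 - identity_reduction)
             + context_tokens * (1 - context_reduction))
    return 1 - after / before

# A session dominated by re-sent identity context, e.g. 4,000 identity
# tokens vs 1,000 conversation tokens per call, at ~97% and ~50% reduction:
s = overall_savings(4000, 1000, 0.97, 0.50)
print(f"{s:.0%}")  # → 88%
```

The more a session is dominated by re-sent identity context, the closer the overall figure sits to the top of the quoted 85-93% range.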

Quick Start

# Install via Homebrew (macOS/Linux)
brew install pythondatascrape/tap/engram

# Or download a release binary
curl -fsSL https://github.com/pythondatascrape/engram/releases/latest/download/engram_$(uname -s | tr A-Z a-z)_$(uname -m | sed 's/x86_64/amd64/').tar.gz | tar xz
sudo mv engram /usr/local/bin/

# Or install from source
go install github.com/pythondatascrape/engram/cmd/engram@latest

# Set up Engram for your project
cd your-project
engram install

# See what Engram found
engram analyze

# Start the compression daemon
engram serve

CLI Reference

Command          Description
engram install   Interactive setup — detects your tools, configures integration
engram analyze   Analyze your project and show compression opportunities
engram advisor   Show optimization recommendations based on session data
engram serve     Start the compression daemon
engram status    Show daemon status, active sessions, and savings

Every command supports --help for detailed usage.

Integrations

Engram works as a plugin for AI coding tools:

Claude Code

engram install
# Engram auto-detects Claude Code and registers as an MCP plugin

Once installed, Engram compresses context automatically — no workflow changes needed.

OpenClaw

engram install
# Engram auto-detects OpenClaw and configures the integration

SDKs

For custom integrations, Engram provides thin client SDKs in three languages. All connect to the local daemon over a Unix socket.

Python:

from engram import Engram

async with await Engram.connect() as client:
    result = await client.compress({"identity": "...", "history": [], "query": "..."})

Go:

client, _ := engram.Connect(ctx, "")
defer client.Close()
result, _ := client.Compress(ctx, map[string]any{...})

Node.js:

import { Engram } from "engram";

const client = await Engram.connect();
const result = await client.compress({identity: "...", history: [], query: "..."});

See the Integration Guide for details.
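Since all three SDKs speak to the daemon over a Unix domain socket, a raw client is only a few lines. The newline-delimited JSON framing below is an assumption for illustration, not Engram's documented wire protocol:

```python
# Minimal raw Unix-socket client sketch. The framing (one JSON object per
# line) is a hypothetical example, not Engram's documented protocol.
import json
import socket

def compress_request(sock_path, payload):
    """Send one JSON request to a daemon listening on sock_path and
    return the parsed JSON response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(json.dumps(payload).encode() + b"\n")
        # Read a single newline-terminated JSON response.
        buf = b""
        while not buf.endswith(b"\n"):
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
        return json.loads(buf)
```

A Unix socket keeps the daemon local-only by construction: nothing is exposed on a network port, and filesystem permissions on the socket path control access.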

Demo

See the Travelbound demo project for a working example that shows Engram compressing a real project's context from ~4,200 tokens to ~380 tokens.

Documentation

License

Apache 2.0 — see LICENSE.
