tracebase (mcp)

Security Audit — Fail

Health — Pass
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 20 GitHub stars
Code — Fail
  • eval() — Dynamic code execution via eval() in bench-results/gate-0.7.0-rc.7.json
  • eval() — Dynamic code execution via eval() in bench-results/gate-0.7.0.json
Permissions — Pass
  • Permissions — No dangerous permissions requested


SUMMARY

The memory runtime for AI agent platforms. Atomic writes, in-narrative time, first-class deletion - so agent intelligence compounds at the company level, not the model.

README.md

TraceBase

Memory layer for coding agents. Your agents stop re-solving the same problem.

MIT License
npm
tracebase.ink

 1st run  ─  "Fix the CORS error in Express"   ─►  agent solves from scratch     ─►  trace stored
 2nd run  ─  "Access-Control header missing"   ─►  prior trace surfaces as hint  ─►  faster, cheaper
 3rd run  ─  same class of problem             ─►  resolved in one shot          ─►  tokens saved

Agents are stateless. They forget everything between sessions, so the same bug gets re-derived from scratch, the same file gets re-read, the same loop spins again — every run, on your tokens. TraceBase keeps the resolved work and feeds it back into the next run.


What it catches

Five failure modes agents hit at runtime — one runtime, five arms:

| Arm | What it does |
| --- | --- |
| Recall | Surfaces past solutions when a similar problem returns. Vector + heuristic match against the project-scoped pattern DB. |
| Gist | Recalls what a file means without re-reading the bytes. Survives window compaction on long sessions. |
| Loop | Catches doom-loops mid-run on a six-turn window and suggests a redirect. Never overrides agent judgement. |
| Guard | Spots redundant fetches and repeat searches before they compound on the bill. Tool-call dedup window. |
| Fold | Folds older turns into gist summaries so 100+ turn horizons stay coherent without thrashing. |
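The idea behind Guard's dedup window can be pictured as a short memory of recent tool calls that flags exact repeats. A minimal sketch — the class name, key format, and window size here are hypothetical, not TraceBase's actual internals:

```typescript
// Illustrative tool-call dedup window (hypothetical names, not TraceBase
// internals): remember the last N calls and flag an exact tool+args repeat.
type ToolCall = { tool: string; args: string };

class DedupWindow {
  private seen: string[] = [];
  constructor(private readonly size: number = 6) {}

  // Returns true if the same tool+args pair already ran inside the window.
  isRedundant(call: ToolCall): boolean {
    const key = `${call.tool}:${call.args}`;
    const redundant = this.seen.includes(key);
    this.seen.push(key);
    if (this.seen.length > this.size) this.seen.shift();
    return redundant;
  }
}
```

A real implementation would likely normalize arguments before keying so that trivially reordered parameters still count as a repeat.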

Numbers

SWE-bench Verified (mini-swe-agent v2.2.8 · Claude Sonnet 4.6 · Docker, 20 attempted tasks):

| Metric | Baseline | With TraceBase | Δ |
| --- | --- | --- | --- |
| Accuracy | 62% | 75% | +13 pp |
| Cost / run (avg) | | | −34% |
| Steps / run (avg) | | | −17% |
| Regressions | | | 0 |

Best single task — astropy-14309: 31 steps → 13 steps (−58% steps, −64% cost). Full whitepaper at tracebase.ink/whitepaper.


Install

One command. Auto-detects Claude Code / Cursor / Codex, writes the adapter, registers MCP, and initializes the local pattern DB.

npx tracebase-ai init

Verify the install:

npx tracebase-ai doctor      # integrity check — exit 0 on healthy
npx tracebase-ai status      # one-screen snapshot
npx tracebase-ai savings     # what was actually saved (last 7 days)

status prints something like this:

TraceBase  workspace 6c27c71a…

  project:  /Users/you/repo
  storage:  /Users/you/repo/.tracebase/memory.db  (1.33 MB)
  cloud:    s1z-3q (https://tracebase.ink)

Agents (2 wired up):
  Claude Code  claude mcp registry (local)  ok · CLAUDE.md ok
               hooks  UserPromptSubmit ok · Stop ok · PreCompact ok
  Cursor       ~/.cursor/mcp.json ok · AGENTS.md ok

Blocks (active / candidate / demoted / merged / retired):
  5 / 0 / 0 / 0 / 0

Events (total 291):
  retrieval 24 · injection 10 · agent_used 0 · outcome 5

Integrations

| Surface | Adapter |
| --- | --- |
| Claude Code 2.x | .mcp.json + managed CLAUDE.md block |
| Cursor | ~/.cursor/mcp.json + AGENTS.md |
| Codex CLI | codex MCP registry + AGENTS.md |
| OpenAI SDK | wrapOpenAI(client, layer) middleware |
| Anthropic SDK | wrapAnthropic(client, layer) middleware |
| Generic (LangChain, LangGraph, Agent SDK) | wrapGeneric(...) |
| HTTP service boundary | npx tracebase-ai serve --port 3781 |

SDK example — wrap the client, every call is now optimized:

import OpenAI from "openai";
import { ReasoningLayer, wrapOpenAI } from "tracebase-ai";

const layer = new ReasoningLayer();
const openai = wrapOpenAI(new OpenAI(), layer, {
  minScore: 0.72,      // only high-confidence matches
  skipExactMatch: true // don't inject on exact re-asks
});

// recall → inject (if match) → call → store
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Fix the CORS error in our Express API" }],
});

Streaming, tool calls, prompt caching — all supported. See tracebase.ink/docs.


Where it earns its keep

Four shapes of agent work where memory compounds value run-over-run:

  • Coding agents — Claude Code, Cursor, Codex. Pattern DB compounds across PRs and migrations.
  • Long-horizon runs — 100+ turn sessions stay coherent. Older turns fold into gists — no window thrashing.
  • Document & research — Gist remembers what long PDFs and reports mean. Past extractions surface on revisit.
  • Customer & support ops — Same-shape tickets, same playbook. Past resolutions surface before re-derivation.

Detailed walkthroughs and benchmarks at tracebase.ink.


Built for teams running agents at scale

TraceBase is the memory primitive teams deploy like infrastructure. The runtime is the same self-hosted binary whether one developer runs it locally or a platform team deploys it across many hosts. The features that matter at scale:

  • Atomic writes — concurrent agent runs don't trample each other's patterns.
  • Audit trail — every retrieval, injection, agent_used, and outcome event is logged to a local event log.
  • Deletion + rollback — patterns that disprove themselves get demoted by the lifecycle repair loop; you can also delete and rollback explicitly.
  • On-prem deployment — local SQLite, MIT license. Same binary on-prem and in CI. Code never leaves your perimeter.
  • Holdout-based proof of lift — enable with init --holdout-rate 0.1 to verify with an A/B baseline that memory actually improves outcomes, not just that it fires.

# turn on the impact measurement holdout — 10% of runs go without memory
# so you can compare cohorts cleanly
npx tracebase-ai init --holdout-rate 0.1
npx tracebase-ai impact     # 30-day funnel: injected vs used vs resolved
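The comparison the impact funnel makes reduces to resolution rate in the memory cohort minus resolution rate in the holdout cohort. A minimal sketch — the run shape here is hypothetical; the real event-log schema may differ:

```typescript
// Illustrative holdout A/B comparison (hypothetical run shape, not the real
// event-log schema): lift = memory-cohort resolve rate − holdout resolve rate.
type Run = { holdout: boolean; resolved: boolean };

function resolutionLift(runs: Run[]): number {
  const rate = (cohort: Run[]) =>
    cohort.length === 0 ? 0 : cohort.filter(r => r.resolved).length / cohort.length;
  const memory = runs.filter(r => !r.holdout);
  const holdout = runs.filter(r => r.holdout);
  return rate(memory) - rate(holdout);
}
```

With a 10% holdout the holdout cohort stays small, so lift estimates need many runs before they are meaningful.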

CLI

npx tracebase-ai init                      # initialize / re-detect adapters
npx tracebase-ai status                    # one-screen install snapshot
npx tracebase-ai doctor                    # integrity check (CI-friendly)
npx tracebase-ai events --limit 20         # recent events from the local log
npx tracebase-ai impact                    # 30-day reuse + saved-tokens funnel
npx tracebase-ai savings                   # value-first summary (7d default)
npx tracebase-ai recall "<problem shape>"  # surface matching past solutions
npx tracebase-ai search "<query>"          # full-text search across the store
npx tracebase-ai remove                    # uninstall: drop store + adapters
npx tracebase-ai serve [--mcp] [--port]    # boot MCP or HTTP server manually

Every command supports --json for machine-readable output. Wire doctor --json into CI for rollout gates.


HTTP API

npx tracebase-ai serve --port 3781

curl -X POST localhost:3781/recall   -d '{"problem": "CORS error Express"}'
curl -X POST localhost:3781/store    -d '{"problem": {...}, "solution": {...}}'
curl -X POST localhost:3781/feedback -d '{"traceId": "...", "helpful": true}'
curl localhost:3781/health
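The endpoints above can be called from any HTTP client. A TypeScript sketch that builds the /recall request — only the path and the "problem" field come from the curl examples; the helper and types are illustrative:

```typescript
// Build a request for the /recall endpoint shown above. Only the path and
// the "problem" field come from the README; everything else is illustrative.
type RecallRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

function recallRequest(problem: string, base = "http://localhost:3781"): RecallRequest {
  return {
    url: `${base}/recall`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ problem }),
  };
}

// Usage (Node 18+, with the server running):
//   const r = recallRequest("CORS error Express");
//   const res = await fetch(r.url, { method: r.method, headers: r.headers, body: r.body });
```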

Host Parity

Every cell in this table is backed by an integration test in tests/parity/host-matrix.test.ts. Nothing here is aspirational.

| Host | Recall | FileMem | Fold | PromptCache | Tool | Loop |
| --- | --- | --- | --- | --- | --- | --- |
| Claude Code (hooks) | ✓ | ✓ | ✓ | n/a¹ | ✓ preventive | ✓ preventive |
| wrapAnthropic | ✓ | ✓ | ✓ | ✓ | ◐ post-hoc² | ◐ post-hoc² |
| wrapOpenAI | ✓ | ✓ | ✓ | ✓ | ◐ post-hoc² | ◐ post-hoc² |
| wrapAgent (string→string) | ✓ | ✓ | ✓ | n/a³ | ◐ post-hoc² | ◐ post-hoc² |
| wrapGeneric (LangChain) | ✓ | ✓ | ✓ | n/a³ | ◐ post-hoc² | ◐ post-hoc² |
| wrapGeneric (LangGraph) | ✓ | ✓ | ✓ | n/a³ | ◐ post-hoc² | ◐ post-hoc² |
| wrapGeneric (Agent SDK) | ✓ | ✓ | ✓ | n/a³ | ◐ post-hoc² | ◐ post-hoc² |

Legend: ✓ end-to-end via the host's real path · ◐ capability exists but observed AFTER the call · preventive = decided BEFORE the tool runs (block / warn / allow) · post-hoc = loop redirect surfaces on the NEXT turn.

Footnotes: ¹ Claude Code's prompt cache is provider-side. ² Bare wrappers don't intercept tool dispatch — wire runtime.observeToolBatch(...) to enable Tool/Loop on the next call. ³ Generic wrappers don't see provider request shapes. Cache savings, when supported, may reduce billed/processed prefix tokens; TraceBase never estimates cache savings and reports only what the provider returns (cache_read_input_tokens / prompt_tokens_details.cached_tokens).
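Reading cache savings from what the provider reports back comes down to checking the two fields the footnote names. A hedged sketch — the usage shapes below are abbreviated to just those fields; consult each provider's API reference for the full objects:

```typescript
// Read cached-token counts only from provider-reported usage. Anthropic
// reports usage.cache_read_input_tokens; OpenAI reports
// usage.prompt_tokens_details.cached_tokens. Shapes abbreviated here.
type AnthropicUsage = { cache_read_input_tokens?: number };
type OpenAIUsage = { prompt_tokens_details?: { cached_tokens?: number } };

function reportedCachedTokens(usage: AnthropicUsage | OpenAIUsage): number {
  if ("cache_read_input_tokens" in usage) {
    return usage.cache_read_input_tokens ?? 0;
  }
  return (usage as OpenAIUsage).prompt_tokens_details?.cached_tokens ?? 0;
}
```

Missing fields fall through to 0 — no cache report means no savings are claimed.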


How it works (in one paragraph)

The pattern DB is project-scoped SQLite. Retrieval is a two-stage rank: fingerprint + FTS5/BM25 narrow the candidate set; structural similarity, Jaccard, and (optional) cosine embeddings re-rank. Above the threshold, the resolved trace gets injected into the prompt as context — never as a directive. After the run, an outcome event closes the loop: was the injected pattern actually used? Did the run resolve? Patterns that stop earning their keep get demoted automatically (Wilson interval lower bound on the helpfulness rate). Signal weights aren't hardcoded — they update via Thompson sampling on the outcome stream.
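The demotion rule above — Wilson interval lower bound on the helpfulness rate — is a standard formula and can be sketched directly. The demotion threshold below is hypothetical, not TraceBase's actual value:

```typescript
// Wilson score interval, lower bound, at z = 1.96 (~95% confidence).
// A pattern with `helpful` positive outcomes out of `total` gets demoted
// when the bound drops below a floor (the floor value here is hypothetical).
function wilsonLowerBound(helpful: number, total: number, z = 1.96): number {
  if (total === 0) return 0;
  const p = helpful / total;
  const z2 = z * z;
  const denom = 1 + z2 / total;
  const center = p + z2 / (2 * total);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * total)) / total);
  return (center - margin) / denom;
}

const shouldDemote = (helpful: number, total: number, floor = 0.3): boolean =>
  total > 0 && wilsonLowerBound(helpful, total) < floor;
```

The lower bound penalizes small samples: 1/1 helpful scores far lower than 90/100, so a lucky early hit can't keep a weak pattern alive.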

Full architecture and the SWE-bench whitepaper at tracebase.ink/whitepaper.


Status & roadmap

  • Self-hosted is live, MIT, in production today. v0.8.0 on npm.
  • Hosted dashboard at tracebase.ink — read-only view over the same local event log. Optional, never required.
  • Hobby ($15/mo), Startup ($159/mo), Enterprise paid tiers — draft packaging, not on checkout yet. Talk to us for early access at tracebase.ink/#pricing.

Links

Part of the Daytona Startup Grid.
