cerefox
Personal knowledge base with hybrid search and read/write access for AI agents
Cerefox
User-owned shared memory for AI agents. A persistent, curated knowledge layer that multiple AI tools can read and write, backed by Postgres + pgvector.
What is Cerefox?
Cerefox is a user-owned knowledge memory layer: a persistent, curated knowledge base that sits between you and the AI tools you use.
The primary use case is shared memory across AI agents: knowledge written by one tool (Claude, ChatGPT, Cursor, or a custom agent) becomes immediately available to all others. This prevents context fragmentation, so the same information doesn't have to be re-explained in every session.
Cerefox is asynchronous shared memory, not a message bus. It solves the persistent context problem: knowledge written in one context is findable in any other. A user curates project documents and an AI agent discovers them through search without being told they exist. An agent writes a decision during a coding session and a different agent, on a different machine, running a different model, finds it days later. A user switches from one AI tool to another and the accumulated knowledge carries over without manual transfer. The boundaries that Cerefox dissolves are between agents, between sessions, between human and machine, and across time.
For the full project vision, principles, and roadmap direction, see docs/research/vision.md.
- Agent-first, not human-first: AI agents are first-class citizens that both read and write; humans curate and validate
- Own your data: everything lives in a Postgres database you control (Supabase free tier or self-hosted)
- Cross-agent coordination: agents on separate machines and runtimes coordinate through persistent shared context (see docs/guides/agent-coordination.md)
- Not a note-taking app: Cerefox is knowledge infrastructure, not a replacement for Obsidian, Notion, or Bear; those tools handle authoring, Cerefox handles indexing and agent access
- Hybrid search: full-text + semantic search finds relevant knowledge even with fuzzy or conceptual queries
- Any agent, anywhere: remote MCP via Supabase Edge Functions; ChatGPT via Custom GPT + GPT Actions
- Keep it cheap: Supabase free tier + low-cost cloud embeddings; see docs/guides/operational-cost.md
Features
| Feature | Details |
|---|---|
| Hybrid search | Combines full-text (BM25) + semantic (vector) search with a configurable alpha weight |
| Metadata-filtered search | JSONB containment filter (@>) on document metadata; server-side, GIN-indexed; composable with project filter and all search modes; available across all access paths (MCP, CLI, web UI, GPT Actions) |
| Metadata search | Standalone metadata-only search (no text query needed); find documents by key-value criteria, project, and date range; optional content inclusion with byte budget; dedicated MCP tool, CLI command, and web UI page |
| Project discovery | cerefox_list_projects MCP tool for agents to discover available projects; all search results include human-readable project_names alongside UUIDs |
| Heading-aware chunking | Greedy section accumulation — H1/H2/H3 sections accumulate until MAX_CHUNK_CHARS; heading breadcrumb preserved per chunk |
| Cloud embeddings | OpenAI text-embedding-3-small (768-dim) via API — or swap to Fireworks AI |
| Remote MCP endpoint | cerefox-mcp Supabase Edge Function — MCP Streamable HTTP; connect Claude Desktop, Claude Code, or Cursor with just a URL and anon key; no Python install needed |
| Local MCP server | cerefox mcp stdio server -- local alternative with zero Edge Function usage, lower latency, and offline support; requires Python + uv + local clone |
| Web UI | React + TypeScript SPA (Mantine UI) at /app/; FastAPI JSON API backend; Markdown viewer, search with 4 modes, document editing, project management |
| Multi-format ingest | .md, .txt, .pdf (pypdf), .docx (python-docx) |
| Batch ingest | cerefox ingest-dir recurses directories |
| Deduplication | SHA-256 content hash; re-ingesting the same file is a no-op |
| Backup and restore | JSON snapshots, optional git commit |
| Small-to-big retrieval | cerefox_context_expand RPC returns chunk neighbours for richer context |
| Audit log | Immutable, append-only log of all write operations (create, update, delete, status change). Author attribution with author_type ('user' or 'agent'). Browsable via web UI, queryable via MCP tool and Edge Function |
| Review status | Schema-level review_status on documents (approved / pending_review). Auto-transitions based on author_type. Filterable on search |
| Version governance | Version archival (protect specific versions from cleanup), configurable retention (CEREFOX_VERSION_CLEANUP_ENABLED), version diff viewer |
| Usage tracking | Opt-in logging of all operations (reads and writes) across all access paths. Tracks operation type, access path (remote-mcp, local-mcp, edge-function, webapp, cli), requestor identity, query text, and result count. Controlled via cerefox config-set usage_tracking_enabled true/false -- no redeploy needed |
| Analytics dashboard | /app/analytics -- 7 interactive charts: calls per day, access path breakdown, top documents, top readers, operations donut, reader word cloud, and reader-to-document access pattern visualization (HEB). Date range + project + path filters. CSV export. |
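The "configurable alpha weight" in the hybrid search row can be pictured as a linear blend of the two relevance scores. The sketch below is illustrative only: the function names, the min-max normalization, and the convention that `alpha = 1.0` means purely semantic are assumptions, not Cerefox's actual SQL.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant set maps to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def blend_scores(fts: dict[str, float], semantic: dict[str, float], alpha: float = 0.7):
    """Rank chunk ids by alpha * semantic + (1 - alpha) * full-text.

    A chunk missing from one result set contributes 0.0 for that signal.
    """
    fts_n, sem_n = normalize(fts), normalize(semantic)
    ids = set(fts_n) | set(sem_n)
    combined = {
        chunk_id: alpha * sem_n.get(chunk_id, 0.0) + (1 - alpha) * fts_n.get(chunk_id, 0.0)
        for chunk_id in ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three chunks
ranked = blend_scores({"a": 2.0, "b": 1.0}, {"b": 0.9, "c": 0.5}, alpha=0.7)
```

Under this convention, lowering alpha favours exact keyword matches and raising it favours conceptual similarity.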
Getting Started
Full walkthrough: docs/guides/quickstart.md (zero to first ingested document and connected agent in 15 minutes).
Upgrading? If you are upgrading from a previous version, see the Upgrading Guide (docs/guides/upgrading.md) for migration steps.
1. Clone and install
git clone https://github.com/yourname/cerefox.git
cd cerefox
uv sync
2. Set up Supabase (free)
- Sign up at supabase.com — a GitHub login works fine.
- Create a new project. Give it a name (e.g. cerefox) and set a database password (store it somewhere safe; you'll need it once).
- On the project creation screen leave the defaults:
  - Enable Data API ✅: required (the Python client uses this)
  - Enable automatic RLS: leave unchecked (single-user app, not needed)
3. Configure .env
cp .env.example .env
Open .env and fill in these values:
| Variable | Where to find it |
|---|---|
| CEREFOX_SUPABASE_URL | Supabase → Settings → API → Project URL |
| CEREFOX_SUPABASE_KEY | Supabase → Settings → API → Secret keys → default |
| CEREFOX_DATABASE_URL | Supabase → Settings → Database → Connection string → Session pooler (port 5432) |
| OPENAI_API_KEY | platform.openai.com/api-keys |
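Before moving on, it can help to confirm that all four variables are actually filled in. The helper below is a minimal sketch, not part of Cerefox: `parse_env` and `missing_vars` are hypothetical names, and the parser only handles simple `KEY=value` lines.

```python
REQUIRED = (
    "CEREFOX_SUPABASE_URL",
    "CEREFOX_SUPABASE_KEY",
    "CEREFOX_DATABASE_URL",
    "OPENAI_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(text: str) -> list[str]:
    """Return the required variable names that are absent or empty."""
    values = parse_env(text)
    return [name for name in REQUIRED if not values.get(name)]
```

Typical use would be `missing_vars(open(".env").read())`, which returns an empty list once everything is set.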
CEREFOX_DATABASE_URL notes:
- Use the Session pooler string (port 5432), not the Direct connection or Transaction pooler.
- The username must include your project ref: postgres.your-project-ref, not just postgres.
- Direct connection is IPv6 only on the free tier. If you get "nodename nor servname provided", you are on IPv4; use the Session pooler.
- See .env.example for both URL formats with full explanations.
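The first two notes can be checked mechanically. This is a hedged sketch using only the standard library; `check_database_url` is a hypothetical helper, the host name in the usage line is a placeholder, and the checks mirror the notes above rather than covering every failure mode.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> list[str]:
    """Return human-readable problems found in a Postgres connection URL."""
    parts = urlsplit(url)
    problems = []
    # The notes above call for the Session pooler string on port 5432.
    if parts.port != 5432:
        problems.append(f"port is {parts.port}, expected 5432 (Session pooler)")
    # The pooler username carries the project ref: postgres.<project-ref>
    username = parts.username or ""
    if not username.startswith("postgres.") or username == "postgres.":
        problems.append("username should look like postgres.<project-ref>")
    return problems


# Placeholder URL for illustration only
print(check_database_url("postgresql://postgres.abcd1234:secret@pooler.example.com:5432/postgres"))
```

Passwords containing characters such as `@` or `/` need URL-encoding before a URL parser can split them correctly.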
4. Deploy the schema
uv run python scripts/db_deploy.py
5. Deploy the Edge Functions
Edge Functions handle server-side embedding so AI agents never need a local model. Requires the Supabase CLI.
npx supabase functions deploy cerefox-search
npx supabase functions deploy cerefox-ingest
npx supabase functions deploy cerefox-mcp
Set your OpenAI key as a Supabase secret (used by the functions at runtime):
npx supabase secrets set OPENAI_API_KEY=sk-...your-key...
6. Ingest a document and open the web UI
uv run cerefox ingest my-notes.md --title "My notes"
uv run cerefox web # → http://localhost:8000
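During ingest the document is split into heading-aware chunks (see the Features table: sections accumulate until MAX_CHUNK_CHARS, and each chunk keeps a heading breadcrumb). The sketch below illustrates that idea under stated assumptions: the function names, the `" > "` breadcrumb separator, the value of `MAX_CHUNK_CHARS`, and the choice to label a chunk with its first section's breadcrumb are all hypothetical, not Cerefox's actual chunker.

```python
import re

MAX_CHUNK_CHARS = 200  # illustrative value only; the real limit is a Cerefox setting

HEADING = re.compile(r"^(#{1,3})\s+(.*)$")  # H1, H2, H3 only


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split on H1-H3 headings, returning (breadcrumb, section_text) pairs."""
    trail: list[tuple[int, str]] = []  # open headings as (level, title)
    sections, lines, path = [], [], ""
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if lines:
                sections.append((path, "\n".join(lines)))
            level, title = len(match.group(1)), match.group(2).strip()
            # Drop headings at the same or deeper level, then push this one.
            trail = [(lvl, t) for lvl, t in trail if lvl < level] + [(level, title)]
            path = " > ".join(t for _, t in trail)
            lines = [line]
        else:
            lines.append(line)
    if lines:
        sections.append((path, "\n".join(lines)))
    return sections


def greedy_chunks(markdown: str, max_chars: int = MAX_CHUNK_CHARS) -> list[dict]:
    """Accumulate consecutive sections until adding one would exceed max_chars.

    A single section longer than max_chars is kept whole in this sketch.
    """
    chunks, buf_path, buf_text = [], "", ""
    for path, text in split_sections(markdown):
        if buf_text and len(buf_text) + 1 + len(text) > max_chars:
            chunks.append({"heading_path": buf_path, "content": buf_text})
            buf_text = ""
        if buf_text:
            buf_text += "\n" + text
        else:
            buf_path, buf_text = path, text
    if buf_text:
        chunks.append({"heading_path": buf_path, "content": buf_text})
    return chunks
```

Keeping the breadcrumb with each chunk means a search hit deep inside a document still shows where it came from.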
Optional: ingest the Cerefox docs themselves so AI agents can look up project details:
# Create a "cerefox" project first, then sync README + all docs/ into it.
uv run cerefox create-project cerefox
uv run python scripts/sync_docs.py
Re-run sync_docs.py any time after updating documentation to keep the knowledge base current.
Try with sample data: the test-data/ directory contains six diverse markdown documents
you can ingest to experiment with search before adding your own content:
uv run cerefox ingest-dir test-data/ --recursive
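Re-running either ingest command on unchanged files is safe because ingestion deduplicates by SHA-256 content hash (see the Features table). A minimal sketch of that behaviour follows; the `InMemoryStore` class is hypothetical, and hashing the UTF-8 encoded text is an assumption, since the exact bytes Cerefox hashes are not stated here.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hex SHA-256 digest of the document text (assumed UTF-8)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class InMemoryStore:
    """Toy stand-in for the documents table, keyed by content hash."""

    def __init__(self) -> None:
        self._by_hash: dict[str, str] = {}

    def ingest(self, title: str, text: str) -> bool:
        """Store the document; return False when identical content already exists."""
        digest = content_hash(text)
        if digest in self._by_hash:
            return False  # same content: no-op
        self._by_hash[digest] = title
        return True


store = InMemoryStore()
first = store.ingest("My notes", "hello")   # stored
second = store.ingest("My notes", "hello")  # skipped, identical content
```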
Architecture
cerefox_documents      cerefox_chunks
─────────────────      ───────────────────────────────
id, title, source      id, document_id, chunk_index
content_hash           heading_path, heading_level
project_id             content, char_count
metadata (JSONB)       embedding_primary (VECTOR 768)
chunk_count            fts (TSVECTOR, generated)
Search RPCs (MCP tools): cerefox_hybrid_search, cerefox_fts_search, cerefox_semantic_search, cerefox_search_docs, cerefox_reconstruct_doc, cerefox_context_expand, cerefox_save_note
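Because every chunk carries a `chunk_index`, small-to-big retrieval (what cerefox_context_expand provides) amounts to fetching the neighbours of a hit. The Python below is only a conceptual sketch: the real RPC runs in Postgres and its signature is not shown in this README, so `expand_context` and its `window` parameter are assumptions.

```python
def expand_context(chunks: list[dict], hit_index: int, window: int = 1) -> list[dict]:
    """Return the hit chunk plus up to `window` neighbours on each side.

    `chunks` are rows of one document, each with a `chunk_index` key.
    """
    ordered = sorted(chunks, key=lambda chunk: chunk["chunk_index"])
    low, high = hit_index - window, hit_index + window
    return [chunk for chunk in ordered if low <= chunk["chunk_index"] <= high]


# Hypothetical document with five chunks; the search hit was chunk 2
doc_chunks = [{"chunk_index": i, "content": f"part {i}"} for i in range(5)]
context = expand_context(doc_chunks, hit_index=2, window=1)
```

Searching over small chunks keeps matches precise, while expanding afterwards gives the agent enough surrounding text to interpret the hit.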
Connecting AI agents
Option 1 — Remote MCP (recommended) — just a URL, an anon key, and npx:
The cerefox-mcp Supabase Edge Function speaks MCP Streamable HTTP. No Python, no local
repo clone — works from any machine with Node.js installed.
# Claude Code (native HTTP transport)
claude mcp add --transport http cerefox \
https://<project-ref>.supabase.co/functions/v1/cerefox-mcp \
--header "Authorization: Bearer <anon-key>"
For Claude Desktop, use supergateway as
a stdio-to-HTTP bridge in claude_desktop_config.json:
{
  "mcpServers": {
    "cerefox": {
      "command": "npx",
      "args": [
        "-y", "supergateway",
        "--streamableHttp", "https://<project-ref>.supabase.co/functions/v1/cerefox-mcp",
        "--header", "Authorization: Bearer <anon-key>"
      ]
    }
  }
}
For Cursor, use url + headers.Authorization in mcp.json.
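One way to produce that Cursor entry is to generate it from the project ref and anon key. This is a sketch under assumptions: the `url` and `headers.Authorization` keys come from the sentence above, while the `mcpServers` wrapper is borrowed from the Claude Desktop example, so confirm the exact shape in docs/guides/connect-agents.md.

```python
import json


def cursor_mcp_config(project_ref: str, anon_key: str) -> dict:
    """Build an mcp.json-style entry pointing at the remote cerefox-mcp endpoint."""
    return {
        "mcpServers": {  # assumed wrapper key, mirroring the Claude Desktop config
            "cerefox": {
                "url": f"https://{project_ref}.supabase.co/functions/v1/cerefox-mcp",
                "headers": {"Authorization": f"Bearer {anon_key}"},
            }
        }
    }


# Placeholders are left as-is; substitute real values before saving to mcp.json
print(json.dumps(cursor_mcp_config("<project-ref>", "<anon-key>"), indent=2))
```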
Option 2 — ChatGPT (web + desktop) via Custom GPT + GPT Actions (requires ChatGPT Plus):
Create a Custom GPT and add an Action pointing at the Supabase Edge Functions — no local
install, no MCP config, works from both ChatGPT web and desktop. Uses the Supabase anon key
as Bearer auth.
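The same Bearer-auth pattern can be exercised from a script before wiring up a GPT Action. The sketch below only builds the request and does not send it. The endpoint path follows the cerefox-mcp URL pattern shown earlier, and the function name cerefox-search comes from the deploy step, but the JSON payload (`{"query": ...}`) is a hypothetical shape; check the Edge Function source or docs/guides/access-paths.md for the real schema.

```python
import json
import urllib.request


def build_search_request(project_ref: str, anon_key: str, query: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the cerefox-search Edge Function."""
    url = f"https://{project_ref}.supabase.co/functions/v1/cerefox-search"
    body = json.dumps({"query": query}).encode("utf-8")  # hypothetical payload shape
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {anon_key}",  # Supabase anon key as Bearer token
            "Content-Type": "application/json",
        },
    )


request = build_search_request("<project-ref>", "<anon-key>", "deployment decisions")
```

Passing the prepared request to `urllib.request.urlopen` would perform the call once real values are filled in.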
Option 3 — Local stdio MCP (legacy fallback) — requires Python + uv + local repo clone:
{
  "mcpServers": {
    "cerefox": {
      "command": "uv",
      "args": ["--directory", "/path/to/cerefox", "run", "cerefox", "mcp"]
    }
  }
}
Full setup for all options: docs/guides/connect-agents.md
Documentation
| Guide | Description |
|---|---|
| docs/guides/quickstart.md | Zero to first document in 15 minutes |
| docs/guides/setup-supabase.md | Supabase project setup |
| docs/guides/configuration.md | All configuration options |
| docs/guides/connect-agents.md | MCP agent integration |
| docs/guides/agent-coordination.md | Multi-agent coordination patterns and best practices |
| docs/guides/response-limits.md | Response size limits: per-path behaviour and tuning |
| docs/guides/access-paths.md | All access layers, credentials, and integration paths |
| docs/guides/setup-local.md | Local Docker setup |
| docs/guides/ops-scripts.md | Backup, restore, migrate, sync docs |
| docs/guides/setup-cloud-run.md | Google Cloud Run deployment |
| docs/guides/operational-cost.md | Cost breakdown for all deployment options |
| docs/guides/upgrading.md | Standard upgrade checklist, version-specific notes |
| CONTRIBUTING.md | How to contribute to Cerefox |
License
Apache 2.0 — see LICENSE.