toolrecall
Health: Warn
- License: MIT
- Description: repository has a description
- Active repo: last push 0 days ago
- Low visibility: only 5 GitHub stars
Code: Fail
- rm -rf: recursive force deletion command in scripts/uninstall.py
Permissions: Pass
- Permissions: no dangerous permissions requested
High-performance local SQLite FTS5-based tool output cache and MCP multiplexer for AI agents.
ToolRecall — Deterministic Tool Cache for LLM Agents
ToolRecall sits between your agent and the OS (or your API provider). On repeat calls it serves cached results from local SQLite instead of re-executing system commands or re-sending requests to the LLM. Caching is deterministic (output stays byte-identical until the file's mtime changes or the TTL expires), which keeps repeated prompt prefixes stable so they can qualify for provider prefix-caching discounts (up to 90% at Anthropic/OpenAI).
1 syscall instead of 4: a file read normally needs stat → open → read → close. ToolRecall needs only the stat (an mtime check); on a cache hit the bytes are served from the cache, so the file itself is never opened or read.
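The stat-then-serve idea can be sketched in a few lines. This is a minimal in-process sketch, assuming a plain dict keyed by resolved path; MtimeCache and read are hypothetical names, not ToolRecall's API, and comparing the file size alongside the mtime is an extra guard added here for illustration. The real daemon also persists entries to SQLite.

```python
import os


class MtimeCache:
    """Serve file bytes from memory while mtime (and size) are unchanged. Hypothetical sketch."""

    def __init__(self) -> None:
        # resolved path -> (mtime_ns, size, content)
        self._entries: dict[str, tuple[int, int, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        path = os.path.realpath(path)
        st = os.stat(path)  # the only file syscall needed on a hit
        entry = self._entries.get(path)
        if entry is not None and entry[0] == st.st_mtime_ns and entry[1] == st.st_size:
            self.hits += 1
            return entry[2]
        # Miss or stale entry: do the full open/read/close, then remember the validator.
        with open(path, "rb") as fh:
            data = fh.read()
        self._entries[path] = (st.st_mtime_ns, st.st_size, data)
        self.misses += 1
        return data
```

Keying on the resolved path means two spellings of the same file (for example via a symlink) share one entry.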
Zero pip dependencies. Python 3.11+ stdlib only. 76 KB install. One daemon.
pip install toolrecall # Installs nothing but ToolRecall itself
toolrecall init # Interactive security setup (default-deny paths)
toolrecall daemon & # Start cache daemon
Two ways to use (both on by default — no extra command needed):
| Path | What it does | How to connect | Default |
|---|---|---|---|
| Forward proxy | Intercepts HTTP requests to API providers (OpenAI, Anthropic, etc.) and caches full responses by body hash. Zero tokens consumed on a cache hit. | export OPENAI_BASE_URL=http://localhost:8569/v1, or set any SDK's base URL | ✅ On (:8569) |
| MCP bridge | Caches tool output (file reads, terminal commands); the agent connects as an MCP client. Server names auto-resolve from the registry. | Add to ~/.claude/.mcp.json or run toolrecall mcp | ✅ On (stdio) |
Requirements: Python 3.11+ (sqlite3, tomllib, json, http.server, urllib from stdlib).
What It Does
ToolRecall intercepts tool calls at the daemon level and returns cached results when inputs haven't changed:
| Mechanism | What gets cached | Invalidation | Token saving |
|---|---|---|---|
| File cache | First disk read per file | mtime changes → fresh read | Smaller context → provider prefix-cache discounts |
| Terminal cache | Static commands (hostname, whoami, pwd, uname, uptime, df, free, crontab) | TTL-based (default 300s) | Same output never re-sent to LLM |
| MCP cache | External MCP server responses (GitHub, time, fetch…) | TTL-based (default 60s, per-server override) | Repeated tool results served from local cache |
| Script/Code cache | cached_run, cached_exec output | ttl=0 disables caching | Same as file cache |
| Forward proxy | Full API responses (chat completions to OpenAI, Anthropic, DeepSeek…) | Body hash — same request → same response | Zero tokens consumed — cache hit never reaches the provider |
| Context Tracker | Tracks dirty/clean files via checkpoints | In-memory (resets on daemon restart) | 93.8% O(n²) reduction — drop clean files from context |
Dynamic commands (git, ls, curl) and state-changing operations always execute live.
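The terminal-cache row can be read as three rules: only commands on a static allowlist are cached, each entry expires after a TTL (300 s by default per the table), and everything else runs live. The sketch below is hedged: the allowlist mirrors the table, while TerminalCache, matching on the exact argv, and the injectable clock and runner are illustrative assumptions.

```python
import shlex
import subprocess
import time
from typing import Callable

# Static commands from the table; git, ls, curl, etc. are never cached.
STATIC_COMMANDS = {"hostname", "whoami", "pwd", "uname", "uptime", "df", "free", "crontab"}


def run_live(argv: list[str]) -> str:
    """Execute the command and return its stdout as text."""
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout


class TerminalCache:
    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic,
                 runner: Callable[[list[str]], str] = run_live) -> None:
        self.ttl = ttl
        self.clock = clock
        self.runner = runner
        # exact argv -> (time stored, output)
        self._entries: dict[tuple[str, ...], tuple[float, str]] = {}

    def run(self, command: str) -> tuple[str, bool]:
        """Return (output, served_from_cache)."""
        argv = shlex.split(command)
        if not argv:
            return "", False
        if argv[0] not in STATIC_COMMANDS or self.ttl <= 0:
            return self.runner(argv), False  # dynamic or uncacheable: always live
        key = tuple(argv)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1], True
        output = self.runner(argv)
        self._entries[key] = (now, output)
        return output, False
```

Injecting the clock and runner keeps the TTL logic testable without waiting five minutes or spawning processes.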
Measured effect
In a 13-hour session (Hermes + Gemini 3.1 Pro, 386 messages, 13 project files):
- 89% hit rate (91% file cache): 827 tool calls served from SQLite instead of OS
- 73% fewer file-read tokens at 3× re-read (~204K → ~55K unique)
- ~81% fewer at 10× re-read (~630K → ~55K unique)
- ~20 min less wait time — each cache hit avoids ~1.5s subprocess fork
- Provider prefix-caching becomes reliable: byte-identical prompt prefixes keep each call eligible for Anthropic/OpenAI's cached-input discount (up to 90%)
Source: Benchmark
Architecture
[ Claude Code ] [ Cursor IDE ] [ Hermes Agent ] [ Any LLM Client ]
\ | | /
\ | | /
\ | | /
+──────────────────────────────────────────────────────────+
│ Standard stdio MCP OR HTTP (OPENAI_BASE_URL proxy) │
+──────────────────────────────────────────────────────────+
│ Unix Domain Socket (Linux/Mac)
│ TCP localhost:8568 (Windows)
+────────────────▼──────────────────────────────────+
│ ToolRecall Daemon │
│ ┌─────────────────────────────┐ │
│ │ In-Memory LRU (Cache) │ │
│ └──────────────┬──────────────┘ │
│ ┌──────────────▼──────────────┐ │
│ │ SQLite WAL (Persistent) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ MCP Server Multiplexer │ │
│ └──────────────┬──────────────┘ │
+─────────────────┼──────────────────+
│ Lazy-Loaded stdio Subprocesses
+─────────────────▼──────────────────+
│ [ Downstream MCP: GitHub / Time ] │
+────────────────────────────────────+
The daemon holds everything: the hybrid in-memory LRU + SQLite WAL cache, the MCP Multiplexer (manages subprocesses for external MCP servers), the Forward Proxy (caches full API responses via body hash), and the Security Gate (path allowlist, sensitive file blocklist, cognitive scan).
All agents share one daemon via either:
- MCP Bridge (toolrecall mcp): the agent connects as an MCP client and uses cached_read, cached_terminal, etc.
- Forward proxy (auto-started on :8569): the agent's API calls go to localhost:8569 instead of api.anthropic.com. The proxy hashes the request body, checks the cache, and on a hit returns the cached response without ever contacting the provider.
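The proxy's lookup can be sketched as a key derived from the request bytes. Here the key is SHA-256 over the method, path, and raw body; folding in the method and path is an assumption (the text only says the body is hashed), and ResponseCache and its forward callable are hypothetical names. Because the key is byte-based, two requests that differ only in JSON key order or whitespace miss each other.

```python
import hashlib
from typing import Callable


def cache_key(method: str, path: str, body: bytes) -> str:
    h = hashlib.sha256()
    # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
    for part in (method.encode(), path.encode(), body):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


class ResponseCache:
    def __init__(self, forward: Callable[[str, str, bytes], bytes]) -> None:
        self.forward = forward  # sends the request upstream (hypothetical)
        self.store: dict[str, bytes] = {}
        self.upstream_calls = 0

    def handle(self, method: str, path: str, body: bytes) -> bytes:
        key = cache_key(method, path, body)
        if key in self.store:
            return self.store[key]  # hit: the provider is never contacted
        self.upstream_calls += 1
        response = self.forward(method, path, body)
        self.store[key] = response
        return response
```

A production proxy would also need a policy for streamed responses; that is outside this sketch.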
See Architecture for the full design.
MCP Multiplexer
When running multiple agents on the same machine (e.g., 5 Claude Code sessions + 3 Cursor instances), each one normally spawns its own subprocess for every MCP server (GitHub, Postgres, time…). That's 8 copies of the same server, and 8× the RAM for the same tool.
The daemon's multiplexer shares one subprocess per server across all agents:
- Lazy loading: servers boot on first call, not at daemon start (~0.01s vs ~1.7s per server)
- Idle timeout: inactive subprocesses killed after 15 min (configurable)
- Failure isolation: one server crash doesn't affect others (auto-reconnect, max 3 attempts)
- Secrets: API tokens are loaded from ~/.toolrecall/.env and never exposed to the LLM
- Auto-resolution: server names auto-resolve from the built-in registry, so common servers need no command/args
All agents connect to one MCP server in their config: toolrecall mcp.
Quick Config Example
# ~/.toolrecall/config.toml
[mcp_multiplex]
servers = ["time", "github", "fetch"]
# ↑ auto-resolved: time=builtin, github=builtin, fetch=builtin
No [mcp_multiplex.servers_config] section needed for known servers. Custom servers still use the explicit config.
Built-in Servers (zero deps)
| Server | What it does |
|---|---|
| time | Current time in any timezone; stdlib only |
| github | GitHub API (create repo, push files, list commits); urllib only |
| sequential-thinking | Reasoning validation, contradiction detection; no network |
| fetch | Fetch URLs; stdlib only (urllib.request), 500KB limit, configurable via TOOLRECALL_FETCH_MAX_BYTES |
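The fetch size cap can be enforced by reading at most limit + 1 bytes and reporting whether anything was cut off. A sketch with urllib.request, assuming the environment variable holds a plain integer and that "500KB" means 500,000 bytes; fetch_capped is a hypothetical name.

```python
import os
import urllib.request

DEFAULT_MAX_BYTES = 500_000  # assumption: "500KB" read as 500,000 bytes


def fetch_capped(url: str, timeout: float = 10.0) -> tuple[bytes, bool]:
    """Return (body, truncated) without buffering more than max_bytes + 1 bytes."""
    max_bytes = int(os.environ.get("TOOLRECALL_FETCH_MAX_BYTES", DEFAULT_MAX_BYTES))
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read(max_bytes + 1)  # one extra byte reveals an oversized body
    return data[:max_bytes], len(data) > max_bytes
```

Returning the truncation flag lets the tool tell the model the page was clipped rather than silently handing it a partial document.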
External Servers (needs uvx)
| Server | Package |
|---|---|
| filesystem | mcp-server-filesystem (safe file access) |
| git | mcp-server-git (Git operations) |
| memory | mcp-server-memory (knowledge graph) |
| brave-search | @anthropic/mcp-server-brave-search (web search) |
| playwright | @playwright/mcp (browser automation) |
| slack | mcp-server-slack (Slack workspace) |
See MCP Multiplexer for full configuration details.
When to use: You run 3+ agents simultaneously on the same machine and they share the same MCP tools.
When to skip: Single agent setup — each agent manages its own MCP servers fine.
Security
ToolRecall doesn't prevent prompt injection — it cages the consequences:
- Default-deny path allowlist: without config, NO paths are readable. toolrecall init prompts for paths interactively.
- Sensitive file blocklist: .env, .ssh/, .pem, .aws/, etc. are blocked even inside allowed paths.
- allow_terminal=false (default): every cached_terminal call is dropped.
- os.path.realpath(): resolves ../ segments and symlinks, so traversal such as ../../../etc/shadow is caught before the file is opened.
- Cognitive Pre-Flight: deterministic regex scan of MCP tool arguments for override instructions, jailbreak tags, and exfiltration URLs. Zero LLM calls, ~0.001ms on the hot path.
- AST injection check: parses tool arguments as Python AST and blocks exec(), eval(), and __import__() calls.
- Daemon IPC via UDS: on Linux/macOS the daemon's IPC runs over a Unix domain socket rather than a TCP port, so it cannot be reached through SSRF. (Windows falls back to TCP localhost:8568.)
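The path checks compose in a fixed order: resolve with os.path.realpath, require the result to sit under an allowed root, then reject sensitive names. A sketch; the blocklist entries follow the examples above, while is_readable and the exact matching rules (directory names, file names, suffixes) are assumptions.

```python
import os
from pathlib import PurePath

BLOCKED_NAMES = {".env"}
BLOCKED_DIRS = {".ssh", ".aws"}
BLOCKED_SUFFIXES = {".pem"}


def _under(path: str, root: str) -> bool:
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. different drives on Windows
        return False


def is_readable(path: str, allowed_roots: list[str]) -> bool:
    real = os.path.realpath(path)  # collapses ../ and follows symlinks first
    roots = [os.path.realpath(r) for r in allowed_roots]
    if not any(_under(real, r) for r in roots):
        return False  # default-deny: an empty allowlist denies everything
    parts = PurePath(real).parts
    if any(p in BLOCKED_DIRS for p in parts):
        return False
    name = parts[-1]
    return name not in BLOCKED_NAMES and PurePath(name).suffix not in BLOCKED_SUFFIXES
```

Resolving before comparing is what makes the allowlist meaningful: a string prefix check on the raw path would accept projects/../../etc/shadow.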
See Security Architecture for the full trust boundary.
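The AST injection check can be approximated with the stdlib ast module: parse the argument as Python source and walk the tree for calls to the blocked builtins. A sketch; treating text that fails to parse as harmless prose is an assumption, and matching only bare-name calls means tricks such as getattr(builtins, "eval") would slip past this version.

```python
import ast

BLOCKED_CALLS = {"exec", "eval", "__import__"}


def has_dangerous_call(argument: str) -> bool:
    try:
        tree = ast.parse(argument)
    except (SyntaxError, ValueError):  # not Python (e.g. prose) or contains null bytes
        return False
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in BLOCKED_CALLS):
            return True
    return False
```

Because ast.walk visits nested nodes, a blocked call buried inside an attribute chain such as __import__('os').system(...) is still found.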
Quick Reference — CLI
toolrecall init Create default config.toml and .env [required once]
toolrecall daemon Start cache daemon (also starts MCP + forward proxy) [required]
toolrecall mcp Start MCP Bridge (or: mcp list to see registry) [connect any MCP agent]
toolrecall serve Forward proxy (cache API responses) [auto-started with daemon; use for custom port]
toolrecall debug Start debug/demo server (test cached_read/term via curl)
toolrecall status Cache status and stats [optional]
toolrecall stats Detailed cache statistics (JSON) [optional]
toolrecall invalidate Clear all caches [optional]
toolrecall reset-stats Reset statistics counters [optional]
toolrecall nginx Generate nginx config [optional]
toolrecall index Build/update FTS5 knowledge database [optional]
toolrecall index-memory Index agent memory stores (MEMORY.md, USER.md) [optional]
toolrecall index-dir Index a directory (e.g. Obsidian) [optional]
toolrecall config-set Set a config value [optional]
toolrecall shim Install/uninstall Python-level cache shim (.pth file) [optional]
Agent Integration
Forward proxy (API-level caching)
Cache API responses before they leave your machine. The forward proxy starts automatically with the daemon — no extra command needed. Works with any OpenAI-compatible provider (OpenAI, Anthropic, DeepSeek, OpenRouter, etc.).
toolrecall daemon & # also starts forward proxy on :8569
export OPENAI_BASE_URL=http://localhost:8569/v1 # Any OpenAI-compatible SDK
# or override the base URL in your provider config / client init
| Provider SDK | How to connect | Token savings |
|---|---|---|
| Any OpenAI-compatible client | export OPENAI_BASE_URL=http://localhost:8569/v1 | Zero tokens consumed; a cache hit never reaches the provider |
| Custom port | toolrecall serve --port 9090 if you need a different port | Same |
MCP Bridge (tool-level caching)
ToolRecall registers MCP tools like cached_read, cached_terminal, cached_write, cached_patch. Connect any MCP agent by adding one server:
{
"mcpServers": {
"toolrecall": {
"command": "toolrecall",
"args": ["mcp"]
}
}
}
This single snippet works for Claude Desktop, Claude Code, Cursor, Cline, Windsurf, Continue, and any MCP-compatible agent; only the location of the config file differs between agents.
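Under the hood, an MCP client talks to toolrecall mcp over stdio using JSON-RPC 2.0 messages, one per line, after completing the initialize handshake. A sketch of the tools/call request a client would write for cached_read; the argument name path is an assumption, so check the tool's input schema (returned by tools/list) for the real one. tools_call_line is a hypothetical helper.

```python
import json


def tools_call_line(request_id: int, tool: str, arguments: dict) -> bytes:
    """Encode one MCP tools/call request as a newline-delimited JSON-RPC message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


line = tools_call_line(1, "cached_read", {"path": "/home/user/projects/README.md"})
```

The client writes line to the server's stdin and reads one response line with the same id from its stdout.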
| Agent | How to connect | Token savings |
|---|---|---|
| Any MCP agent | Add the toolrecall server to your MCP config (see above) | ✅ Universal |
| Hermes | Set [hermes] transparent_cache = "transparent" in ~/.toolrecall/config.toml | ✅ Zero config |
| Shim (Python-level) | toolrecall shim --install adds a .pth file that patches Python's open()/subprocess.run() at interpreter startup | ✅ Works with any Python-based agent, no agent config needed |
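The shim relies on Python's site mechanism: when site processes a .pth file, every line that begins with import is executed. The sketch demonstrates that mechanism in a temporary directory via site.addsitedir, which handles .pth files the same way site-packages is handled at startup. The module name toolrecall_shim_demo and its pass-through patch are hypothetical stand-ins for what toolrecall shim --install writes. It also shows the limit: only Python interpreters read .pth files.

```python
import os
import site
import tempfile

SHIM_SOURCE = '''
import subprocess

_original_run = subprocess.run

def _patched_run(*args, **kwargs):
    # A real shim would consult the cache here before running the command.
    return _original_run(*args, **kwargs)

_patched_run.__wrapped__ = _original_run
subprocess.run = _patched_run
'''

with tempfile.TemporaryDirectory() as shim_dir:
    with open(os.path.join(shim_dir, "toolrecall_shim_demo.py"), "w") as fh:
        fh.write(SHIM_SOURCE)
    # Lines starting with "import" in a .pth file are executed by site.
    with open(os.path.join(shim_dir, "toolrecall_demo.pth"), "w") as fh:
        fh.write("import toolrecall_shim_demo\n")
    site.addsitedir(shim_dir)  # same .pth processing as site-packages at startup
```

Keeping the original function on __wrapped__ makes the patch reversible, which is what an uninstall step needs.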
Configuration
TOML (stdlib tomllib) or YAML (optional, requires pyyaml).
# ~/.toolrecall/config.toml (minimal config — toolrecall init creates a full one)
[mcp]
allowed_paths = ["/home/user/projects"] # Add your project dirs — default-deny!
allow_terminal = false
default_ttl = 60
[mcp_multiplex]
enabled = true
# Server names auto-resolve: time/github/seqthink/fetch = builtin (no deps),
# filesystem/git/memory = external (needs uvx), or override via [mcp_multiplex.servers_config]
servers = ["time", "github", "fetch"]
[nginx]
# nginx is OPTIONAL — only needed if you want HTTPS/SSL in front of the proxy.
# site_name = "toolrecall"
# domain = "example.com"
# ssl = false
TOOLRECALL_* environment variables override TOML.
Uninstall
pip uninstall toolrecall
python3 scripts/uninstall.py --force
Removes: daemon, systemd service, config, cache DB, logs.
Platform Support
| Platform | Transport | Status |
|---|---|---|
| Linux | Unix Domain Sockets | ✅ Tested in CI |
| macOS | Unix Domain Sockets | ✅ Should work (POSIX). Not in CI. |
| Windows | TCP localhost:8568 fallback | ⚠️ Core + transport tested. CLI works. |
Documentation
- Architecture — daemon design, layers, IPC
- Architecture Diagram — system and sequence diagrams, token costs, Context Tracker
- CLI Reference — all subcommands explained
- Configuration Reference — config.toml, config.py, all env vars
- Context Tracker — checkpoint-based dirty-file tracking, O(n²) breakdown
- How It Works — quick technical overview
- MCP Multiplexer — single-daemon MCP management, server registry
- Testing Guide — test philosophy, organization, per-file coverage
- Benchmark — measured performance, token savings
- Knowledge DB — FTS5 indexing guide
- Docker Deployment — containerized stack
- Security Architecture — WAF details, trust boundary
- Troubleshooting — common fixes
- Appendix — comparison tables, OSI model, ROI, vision, audit
- Hermes Transparent Cache — auto-patching for Hermes Agent
[^notall]: Not all agents tested yet — please report bugs.