ostk-cache
Context-aware caching proxy for Anthropic and OpenAI — maximizes prompt-cache hit rates by anchoring long-lived context into stable boundaries. Drop-in for Claude Code, Codex, Cursor, and custom harnesses.
Drop-in L1.5 caching proxy for Anthropic /v1/messages, with an OpenAI GPT adapter for /v1/responses. The Anthropic adapter anchors long-lived context (system prompts, tool definitions, kernel orientation) into stable byte-boundaries that hit Anthropic's prompt cache; the GPT adapter applies OpenAI prompt-cache routing knobs while preserving the provider-specific wire shape.
Works with any surface that lets you set a provider base URL — Claude Code, Codex, Cursor, custom MCP servers, internal harnesses. The proxy is transparent at the protocol layer (chunked HTTP, SSE streaming, multipart file uploads all forward verbatim where appropriate); request-body rewrites are adapter-specific.
Install
Pre-built binaries ship with every release: three binaries per platform, no build step required.
| Platform | Proxy | Hooks installer | Stats reporter |
|---|---|---|---|
| Linux x86_64 | ostk-cache-linux-amd64 | ostk-cache-hooks-linux-amd64 | ostk-cache-stats-linux-amd64 |
| macOS x86_64 | ostk-cache-macos-amd64 | ostk-cache-hooks-macos-amd64 | ostk-cache-stats-macos-amd64 |
| macOS arm64 | ostk-cache-macos-arm64 | ostk-cache-hooks-macos-arm64 | ostk-cache-stats-macos-arm64 |
| Windows x86_64 | ostk-cache-windows-amd64.exe | ostk-cache-hooks-windows-amd64.exe | ostk-cache-stats-windows-amd64.exe |
Grab from the Releases page, chmod +x, drop on PATH. Quick install on Linux/macOS:
PLATFORM=linux-amd64 # or macos-amd64 / macos-arm64
BASE=https://github.com/os-tack/ostk-cache/releases/latest/download
curl -L "$BASE/ostk-cache-$PLATFORM" -o /usr/local/bin/ostk-cache && chmod +x /usr/local/bin/ostk-cache
curl -L "$BASE/ostk-cache-hooks-$PLATFORM" -o /usr/local/bin/ostk-cache-hooks && chmod +x /usr/local/bin/ostk-cache-hooks
curl -L "$BASE/ostk-cache-stats-$PLATFORM" -o /usr/local/bin/ostk-cache-stats && chmod +x /usr/local/bin/ostk-cache-stats
Building from source (contributors only):
git clone https://github.com/os-tack/ostk-cache && cd ostk-cache
cargo build --release --bins
# Binaries land in target/release/{ostk-cache,hooks,stats}
ostk-cache depends on three private membrane crates from os-tack/haystack (resolved via git-deps with HTTPS auth). For local development with a sibling haystack checkout, see the [patch] recipe at the bottom of Cargo.toml.
Quick start
# 1. Start the proxy
ANTHROPIC_API_KEY=sk-ant-... ostk-cache
# ostk-cache 0.3.1 listening on 127.0.0.1:8080
# provider=anthropic mode=mutate soft-cap=30MB tail=off
# rewrite=on capture=off kernel-timeout=500ms
# 2. Point your agent surface at it
export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
# 3. Use the agent normally — claude, codex, cursor, custom MCP host, etc.
claude # or codex / cursor / your harness
Common operator flags (see ostk-cache --help for the full list):
ostk-cache --mode rebuild-kernel --soft-cap-mb 28 --tail-transcript
ostk-cache --provider gpt --upstream https://api.openai.com
ostk-cache --print-config # show resolved config + source attribution
Every turn appends an AmpRow to .ostk/memory/ledger.jsonl in the proxy's cwd, tagged with the active mode for later A/B partitioning. The proxy supports graceful shutdown (SIGINT/SIGTERM) — in-flight requests are drained and the server waits for active SSE streams to finish before exiting.
Modes
The Anthropic adapter has four mutation strategies, selected by --mode (or the legacy OSTK_CACHE_REBUILD / OSTK_CACHE_PASSTHROUGH environment variables, or the mode = "…" key in .ostk/cache.toml). All four write their accounting to the ledger; only the Anthropic request-body rewrite differs. The GPT adapter uses mode="gpt" in the ledger and applies OpenAI prompt-cache parameters to /v1/responses requests.
| --mode | Ledger tag | What it does to messages[] |
|---|---|---|
| passthrough | passthrough | Byte-identical forward. Control baseline. |
| mutate (default) | mutate | Collapse system to one 1h cache block; HUD prepend; strip user cache_control. |
| rebuild | rebuild_local | Discard prior turns; replace with synthesized kernel projection (envelope + tool summary + intent thread + recent assistant turn digests). In-flight chain preserved. |
| rebuild-kernel | rebuild_kernel | Same as rebuild but the live envelope is fetched from a running ostk kernel daemon over .ostk/ostk.sock. Falls back to rebuild_local if the kernel isn't reachable. |
Optional layer-3 add-on (combinable with any rebuild mode): --tail-transcript
(or OSTK_CACHE_TAIL_TRANSCRIPT=1) ingests cross-session activity from
the local Claude Code transcript directory and appends it to the
synthetic context.
The Makefile wires every combination as a make run-* target. See make help.
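The relationship between --mode values, ledger tags, and the legacy environment variables (listed under "Legacy environment variables") can be sketched as follows. This is an illustration only, not the proxy's code: the function name is hypothetical, and the rule that OSTK_CACHE_PASSTHROUGH wins when both legacy variables are set is an assumption the README does not state.

```python
# --mode value -> ledger tag, copied from the Modes table.
LEDGER_TAG = {
    "passthrough": "passthrough",
    "mutate": "mutate",
    "rebuild": "rebuild_local",
    "rebuild-kernel": "rebuild_kernel",
}

TRUTHY = {"1", "true", "yes"}


def mode_from_legacy_env(env):
    """Map the legacy env vars to a --mode value (hypothetical helper)."""
    # Assumption: passthrough takes priority if both variables are set.
    if env.get("OSTK_CACHE_PASSTHROUGH", "").lower() in TRUTHY:
        return "passthrough"
    rebuild = env.get("OSTK_CACHE_REBUILD", "").lower()
    if rebuild == "kernel":
        return "rebuild-kernel"  # federated rebuild via the kernel daemon
    if rebuild == "1":
        return "rebuild"  # standalone rebuild
    return "mutate"  # documented default


print(LEDGER_TAG[mode_from_legacy_env({"OSTK_CACHE_REBUILD": "kernel"})])
```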
Configuration
The proxy resolves every setting from four sources, highest precedence
first:
- CLI flags (--port, --mode, --soft-cap-mb, ...)
- Environment variables (legacy escape hatch, same names as before)
- Workspace TOML at <cwd>/.ostk/cache.toml (or --config PATH)
- Built-in defaults
Run ostk-cache --print-config to dump the resolved table with the
source column showing where each value came from — invaluable when
debugging "why is my flag being ignored?":
port = 8089 (toml)
provider = anthropic (default)
mode = rebuild-kernel (cli)
soft_cap = 28MB (toml)
tail.transcript = true (env)
rewrite.enabled = true (default)
kernel.timeout_ms = 250 (toml)
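The layered lookup with source attribution can be sketched in a few lines. This is a minimal model, not the proxy's implementation: the resolve function is hypothetical, and it assumes each layer has already been normalized to the same key names (the real environment variables use different names, such as PROXY_PORT).

```python
def resolve(key, cli, env, toml, defaults):
    """Return (value, source) from the first layer that defines `key`."""
    # Layers are checked in the documented order, highest precedence first.
    layers = (("cli", cli), ("env", env), ("toml", toml), ("default", defaults))
    for source, layer in layers:
        if key in layer:
            return layer[key], source
    raise KeyError(key)


defaults = {"port": 8080, "provider": "anthropic", "mode": "mutate"}
toml = {"port": 8089}
env = {}
cli = {"mode": "rebuild-kernel"}

for key in ("port", "provider", "mode"):
    value, source = resolve(key, cli, env, toml, defaults)
    print(f"{key} = {value} ({source})")
```

With these sample layers the loop reports port from the TOML file, provider from the defaults, and mode from the CLI.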
A workspace .ostk/cache.toml looks like:
provider = "anthropic"
mode = "rebuild-kernel"
port = 8089
soft_cap_mb = 28
[tail]
transcript = true
limit = 75
[rewrite]
enabled = true
[kernel]
timeout_ms = 250
[capture]
http = false
# dir = ".ostk/http-capture"
Observability
Each turn emits a single compact line to stdout:
[turn s=a323be mode=rebuild_kernel req=12.4MB→0.18MB resp=14.2KB tok_in=5421 cache_r=98% drop=701/1.7MB→14KB elapsed=3.4s]
When a section gets bloated (any single section >5MiB or the post-rewrite
request >80% of the soft cap), an indented second line shows the
per-section breakdown:
└─ sys=2.1MB tools=8.4MB synthetic=14KB in_flight=1.8MB (dominant: tools)
Run with --verbose to keep the legacy multi-line per-pass output in
addition to the one-liner.
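For ad-hoc log analysis, the per-turn one-liner can be split into key/value pairs. The sketch below assumes the format matches the sample shown above (space-separated key=value fields with no spaces inside values); parse_turn_line is a hypothetical helper, not part of ostk-cache.

```python
import re

TURN_RE = re.compile(r"^\[turn (?P<fields>.+)\]$")


def parse_turn_line(line):
    """Split a per-turn one-liner into a {key: raw_value} dict, or return None."""
    match = TURN_RE.match(line.strip())
    if match is None:
        return None
    # Every field in the documented sample has the form key=value.
    return dict(field.split("=", 1) for field in match.group("fields").split())


sample = ("[turn s=a323be mode=rebuild_kernel req=12.4MB→0.18MB resp=14.2KB "
          "tok_in=5421 cache_r=98% drop=701/1.7MB→14KB elapsed=3.4s]")
turn = parse_turn_line(sample)
before, after = turn["req"].split("→")  # request size before and after rewrite
print(turn["mode"], turn["cache_r"], before, after)
```

Values are kept as raw strings; unit conversion is left to the caller because the exact unit conventions are not specified here.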
Ledger columns added: req_bytes_in, req_bytes_out, resp_bytes, system_bytes, tools_bytes, synthetic_bytes, in_flight_bytes (all Option<u64> — old rows without them deserialize as null). ostk-cache-stats aggregates them into bytes_in_total, bytes_out_total, resp_bytes_total, and bytes_reduction_ratio, providing a section-level view of where the bytes are spent across the entire session.
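A rough sketch of how the byte columns could be rolled up from ledger rows follows. The column names come from the paragraph above, but the formula for bytes_reduction_ratio (one minus outbound over inbound request bytes) is an assumption; ostk-cache-stats may define it differently, and bytes_summary is a hypothetical helper.

```python
import json


def bytes_summary(lines):
    """Sum the request/response byte columns over ledger rows (JSON lines)."""
    totals = {"bytes_in_total": 0, "bytes_out_total": 0, "resp_bytes_total": 0}
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        # Old rows may lack the byte columns or carry null; count those as 0.
        totals["bytes_in_total"] += row.get("req_bytes_in") or 0
        totals["bytes_out_total"] += row.get("req_bytes_out") or 0
        totals["resp_bytes_total"] += row.get("resp_bytes") or 0
    bytes_in = totals["bytes_in_total"]
    # Assumed definition: fraction of inbound request bytes removed by the proxy.
    totals["bytes_reduction_ratio"] = (
        1 - totals["bytes_out_total"] / bytes_in if bytes_in else None
    )
    return totals


rows = [
    '{"mode": "mutate", "req_bytes_in": 1000, "req_bytes_out": 250, "resp_bytes": 40}',
    '{"mode": "mutate"}',  # an old row without the byte columns
]
summary = bytes_summary(rows)
print(summary)
```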
For wire-body investigations, run with --capture-http to write one directory per proxied request under .ostk/http-capture by default. Each capture contains request-in.body, request-out.body, response.body, and metadata.json with status, byte counts, SHA-256 hashes, route path, and redacted auth headers. This is off by default; the normal proxy path only records byte/token metadata.
GPT adapter
Run ostk-cache --provider gpt and point OpenAI-compatible clients at /v1/responses through the proxy. The GPT adapter preserves the caller's request shape, adds prompt_cache_key when absent, and adds prompt_cache_retention = "24h" for gpt-5.5 requests when absent. Usage accounting reads OpenAI usage.input_tokens_details.cached_tokens into the shared ledger so cache hit rate can be compared with Anthropic runs.
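The "add when absent" behavior described above can be modeled as below. This is a sketch under assumptions: how the proxy derives the cache key is not documented here, so it is passed in as a parameter, and "gpt-5.5 requests" is interpreted as a model name starting with "gpt-5.5". The function name is hypothetical.

```python
import copy


def apply_gpt_cache_knobs(body, cache_key):
    """Return a copy of a /v1/responses body with prompt-cache fields filled in if absent."""
    out = copy.deepcopy(body)  # leave the caller's request untouched
    # setdefault only writes when the caller did not supply the field.
    out.setdefault("prompt_cache_key", cache_key)
    # Assumption: a "gpt-5.5 request" is one whose model name starts with "gpt-5.5".
    if str(out.get("model", "")).startswith("gpt-5.5"):
        out.setdefault("prompt_cache_retention", "24h")
    return out


request = {"model": "gpt-5.5", "input": "hello"}
print(apply_gpt_cache_knobs(request, "workspace-key"))
```

Caller-supplied values always win, which matches the stated goal of preserving the caller's request shape.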
Soft cap
Anthropic enforces a 32MB hard limit on /v1/messages. To rescue
requests before they hit that wall, the proxy enforces a configurable
soft cap (--soft-cap-mb, default 30, 0 disables) with a
progressive reduction pipeline. Tiers fire in order until under cap:
- Tier A: tool-result ejection. tool_result content bodies larger than 100KB are replaced with a [ejected: …] stub, preserving the tool_use_id pairing. Largest-first ordering. The model can re-run the call if it needs the data.
- Tier B: in-flight pair pruning. Oldest tool_use/tool_result message pairs in the active cycle are removed as a unit so neither side is orphaned. The most recent pair is always retained.
- Tier C: tool-defs trimming. Tool definitions not referenced by any in-flight tool_use are dropped. Conservative: never drops a tool the assistant is actively using.
- Tier D: structured 413. All tiers exhausted; the proxy returns HTTP 413 with a reduction payload showing what was tried.
Every reduction event lands in the per-turn one-liner (reduce=A→B ej=N(bytes) prune=N tools=N) and a dedicated accounting row in the
ledger (mode="reduce"). This persists the specific tier counters and
the irreducible flag, allowing long-term analysis of how often the
soft cap is triggered and which tiers are most effective at recovering
space.
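Tier A can be illustrated with a short sketch. It is not the proxy's code: it measures only the serialized messages list rather than the whole request, treats 100KB as 102400 bytes, and the stub wording after "[ejected:" is hypothetical. The message shape follows Anthropic's tool_result content blocks.

```python
import json

EJECT_THRESHOLD = 100 * 1024  # "larger than 100KB"; KB assumed to be 1024 bytes


def body_size(value):
    """Serialized size in bytes of a JSON-compatible value."""
    return len(json.dumps(value).encode("utf-8"))


def eject_tool_results(messages, cap_bytes):
    """Stub oversized tool_result bodies, largest first, until under the cap.

    Mutates `messages` in place and returns the number of ejected bodies.
    """
    candidates = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_result":
                size = body_size(block.get("content"))
                if size > EJECT_THRESHOLD:
                    candidates.append((size, block))
    candidates.sort(key=lambda pair: pair[0], reverse=True)  # largest first

    ejected = 0
    for size, block in candidates:
        if body_size(messages) <= cap_bytes:
            break
        # tool_use_id is left untouched so the tool_use/tool_result pairing survives.
        block["content"] = f"[ejected: {size} bytes]"  # stub wording is hypothetical
        ejected += 1
    return ejected
```

Stopping as soon as the request fits keeps smaller tool results intact, which is the point of ordering by size.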
Kernel orientation
When rebuild_* modes are active, the proxy appends a discipline block to the system prompt instructing the model to:
- Treat the projection as authoritative working state, not the full transcript
- Reach for the right primitive (re-run / recall:<addr> / handles) when historical artifacts are needed
- Trust that [ok] tool results in the projection are shapes-only and [error] results carry full bodies
- End every turn with a <turn-digest>{...}</turn-digest> fence so intent survives the next projection
The orientation text is byte-stable across turns and cached at the 1h tier — the model pays for it once per cache window and gets a coherent operating discipline for free.
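A consumer of assistant output might pull the digest fence out as shown below. The sketch assumes the fence body is a JSON object, which the {...} placeholder suggests but does not confirm; the "intent" field in the sample is invented for illustration, and extract_turn_digest is a hypothetical helper.

```python
import json
import re

DIGEST_RE = re.compile(r"<turn-digest>(.*?)</turn-digest>", re.DOTALL)


def extract_turn_digest(text):
    """Return the last <turn-digest> payload as a dict, or None if missing or invalid."""
    matches = DIGEST_RE.findall(text)
    if not matches:
        return None
    try:
        digest = json.loads(matches[-1])  # the turn is expected to end with the fence
    except json.JSONDecodeError:
        return None
    return digest if isinstance(digest, dict) else None


reply = 'Tests pass now.\n<turn-digest>{"intent": "fix tests"}</turn-digest>'
print(extract_turn_digest(reply))
```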
Hooks (Claude Code)
ostk-cache-hooks installs Claude Code lifecycle hooks (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop) that POST to the proxy's /hook/event endpoint. The proxy ledgers each event into .l1.5/hooks.jsonl and snapshots manifest.json on session stop.
ostk-cache-hooks install # idempotent; appends, never overwrites; backs up settings.json
ostk-cache-hooks status
ostk-cache-hooks uninstall # --purge to also remove dispatch script
Other agent surfaces with similar hook conventions (any tool that exposes session-lifecycle hooks and lets you shell out) can post to /hook/event directly — the endpoint is generic HTTP. See docs/HOOKS.md for the wire format and a manual settings.json snippet.
Stats and A/B analysis
ostk-cache-stats reads .ostk/memory/ledger.jsonl and emits per-session JSON or CSV.
ostk-cache-stats --window 24h --format json
ostk-cache-stats --mode rebuild_local # filter by mode
ostk-cache-stats --workspace <16-char-hash> # filter by workspace
Fields per session: amp_mean, amp_p50, cache_hit_rate, turns, state_bytes_mean, mode. For the recommended A/B comparison protocol (collect a window in each mode, partition by mode field, run side-by-side aggregation), see docs/PASSTHROUGH.md.
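One way to run the side-by-side aggregation is sketched below. It assumes the JSON output is a list of session objects carrying the field names listed above and that cache_hit_rate is a fraction between 0 and 1. The turn-weighted mean is one reasonable choice, not necessarily the protocol in docs/PASSTHROUGH.md, and compare_modes is a hypothetical helper.

```python
from collections import defaultdict


def compare_modes(sessions):
    """Turn-weighted mean cache_hit_rate per mode from per-session records."""
    turns = defaultdict(int)
    weighted_hits = defaultdict(float)
    for session in sessions:
        mode = session["mode"]
        turns[mode] += session["turns"]
        weighted_hits[mode] += session["cache_hit_rate"] * session["turns"]
    return {
        mode: {"turns": turns[mode], "cache_hit_rate": weighted_hits[mode] / turns[mode]}
        for mode in turns
        if turns[mode] > 0  # skip modes with no recorded turns
    }


sessions = [
    {"mode": "passthrough", "turns": 10, "cache_hit_rate": 0.5},
    {"mode": "mutate", "turns": 10, "cache_hit_rate": 0.75},
    {"mode": "mutate", "turns": 30, "cache_hit_rate": 0.25},
]
result = compare_modes(sessions)
print(result)
```

Weighting by turns keeps a short session from counting as much as a long one when comparing modes.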
Legacy environment variables
For pre-existing scripts and Makefile targets, the env-var surface is
kept intact (CLI flags and .ostk/cache.toml override these silently
when set):
| Variable | Default | Purpose |
|---|---|---|
| ANTHROPIC_API_KEY | (required) | Forwarded as x-api-key upstream. |
| ANTHROPIC_BASE_URL | https://api.anthropic.com | Anthropic upstream override (matches --upstream). |
| OPENAI_BASE_URL | https://api.openai.com | GPT upstream override when --provider gpt. |
| OSTK_PROVIDER | anthropic | anthropic or gpt adapter selection. |
| PROXY_PORT | 8080 | TCP port the proxy binds (matches --port). |
| OSTK_CACHE_PASSTHROUGH | unset | 1/true/yes → byte-identical forward. |
| OSTK_CACHE_REBUILD | unset | 1 → standalone rebuild; kernel → federated. |
| OSTK_CACHE_TAIL_TRANSCRIPT | unset | 1 → ingest local Claude Code transcript tail. |
| OSTK_CACHE_TAIL_LIMIT | 50 | Per-request transcript event cap. |
| OSTK_CACHE_KERNEL_TIMEOUT_MS | 500 | Per-IPC timeout when fetching a kernel projection. |
| OSTK_CACHE_CLAUDE_PROJECTS_DIR | ~/.claude/projects | Override transcript-tail source directory. |
| OSTK_KERNEL_SOCKET | unset | Pin explicit kernel socket path (skip cwd-walk). |
| OSTK_REWRITE_ENABLED | 1 | 0/false → disable file-handle rewrite pass. |
| OSTK_CAPTURE_HTTP | unset | 1/true/yes → capture request/response bodies. |
| OSTK_CAPTURE_HTTP_DIR | <cwd>/.ostk/http-capture | Override capture output directory. |
| OSTK_DIR | <cwd>/.ostk if exists | Workspace .ostk/ for file-handle cache. |
Workspace identity
The proxy partitions cache logic per workspace to prevent cross-repo pollution. Workspace identity is resolved in priority order:
- Explicit: sha256 of <cwd>/.l1.5/workspace-id if present.
- Git origin: sha256 of git -C <cwd> config --get remote.origin.url (normalized).
- Path: sha256 of realpath(cwd).
The first 16 hex chars become the workspace_id used in hooks.jsonl rows.
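The priority order and truncation can be sketched as follows. The inputs are assumed to be gathered by the caller: explicit_id stands for the contents of the workspace-id file, origin_url for an already-normalized remote URL (the normalization rules are not described here), and real_cwd for the resolved path. The function name is hypothetical.

```python
import hashlib


def workspace_id(explicit_id=None, origin_url=None, real_cwd=""):
    """First 16 hex chars of sha256 over the highest-priority identity source."""
    # Priority: explicit workspace-id, then normalized git origin, then real path.
    source = explicit_id or origin_url or real_cwd
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]


print(workspace_id(origin_url="github.com/os-tack/ostk-cache", real_cwd="/srv/repo"))
```

Because the git origin outranks the path, two checkouts of the same repository in different directories would share a workspace_id under this model unless an explicit workspace-id file is present.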
Layout
.ostk/memory/
  ledger.jsonl         append-only AmpRow log (cache hits, token usage, mode tag)
.ostk/http-capture/
  <request-id>/        opt-in full-body capture (`--capture-http`)
    request-in.body
    request-out.body
    response.body
    metadata.json
.l1.5/
  workspace-id         optional explicit workspace identifier
  hooks.jsonl          session lifecycle events (rotated hourly to .gz)
  manifest.json        snapshot written on Stop hook
Architecture
Hyper + Axum HTTP listener. tokio::net::TcpListener for incoming connections, reqwest for upstream forwarding. Streaming responses are mapped block-by-block via async-stream so SSE flush boundaries survive. The page-table substrate is the Page / PageState types from the ostk-page membrane crate; the in-memory backend is the default but the PageTable trait is open for alternate implementations.
The kernel_client module speaks the haystack daemon's IPC protocol over .ostk/ostk.sock (Unix domain socket). On Windows, federation is unavailable (the kernel projection path is cfg(unix)-stubbed); the proxy runs in standalone modes only.
Documentation
- docs/HOOKS.md — Claude Code lifecycle hook integration, manual settings.json snippet, troubleshooting.
- docs/PASSTHROUGH.md — A/B comparison protocol for evaluating mutation impact.
License
Dual-licensed under either:
at your option. Contributions are accepted under the same terms (Apache-2.0 §5).