Context OS

Context OS is a curated set of token optimizations for Claude Code, packaged as an idempotent, reversible installer. It writes CLAUDE.md, .claudeignore, settings.json, slash commands, an output style, a statusLine, a Haiku subagent, and three zero-dependency Python hooks (dedup guard, loop guard, session profiler) into your project. It is not a wrapper or a proxy, and its only runtime dependency is python3 for the hooks (an optional Rust binary adds two more hook-based techniques).

curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash

What it installs

Nineteen techniques, grouped by delivery mechanism. The Evidence column states where each claim comes from.

#	Technique	Mechanism	Evidence
1	Response shaping	`CLAUDE.md` directives (drop preamble, recap, tool announcements)	Third-party benchmark (caveman); ablation pending
2	Output style `terse`	`.claude/output-styles/terse.md` invoked via `/output-style terse`	Documented behavior
3	Noise filtering	`.claudeignore` with 60+ patterns (`node_modules`, `dist`, `.next`, `target`)	Measured per-repo via `--measure`; end-to-end in METHODOLOGY.md
4	Secret exclusion	`.claudeignore` blocks `.env`, `*.pem`, `credentials.json`, SSH/AWS	Documented behavior
5	Repo map + stack hints	`CLAUDE.md` block generated from stack detection	Ablation pending
6	Thinking budget cap	`MAX_THINKING_TOKENS=8000` in `settings.json`	Documented env var
7	Early compaction	`CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80` (default 95)	Documented env var
8	Prompt caching 1h TTL	`ENABLE_PROMPT_CACHING_1H=1`	Documented env var
9	Non-essential traffic off	`CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1`	Documented env var
10	Context cap	`CLAUDE_CODE_MAX_CONTEXT_TOKENS=150000`	Documented env var
11	Permission auto-grant	`settings.json` allowlist for Read/Glob/Grep/git/test runners	Documented behavior
12	statusLine	`.claude/statusline.sh` (model · branch · context-os marker)	Documented behavior
13	Slash commands	`/compact`, `/context`, `/ship`, `/cheap` in `.claude/commands/`	Documented behavior
14	Haiku subagent	`.claude/agents/explorer.md` delegates exploration to Haiku	Model pricing ratio (Sonnet:Haiku); ablation pending
15	Dedup guard	PreToolUse hook: blocks duplicate `Read`/`Glob`/`Grep` within 10 min	Smoke-tested in CI; session-profile reports list how many duplicates were caught
16	Loop guard	PreToolUse hook: warns at 5 edits, blocks at 8 edits on same file per session	Smoke-tested in CI; addresses a pattern called out in Claude Code best-practices
17	Session profiler	Stop hook: writes per-session token breakdown to `.context-os/session-reports/`, surfacing duplicate tool calls, edit loops, and oversized results	Deterministic transcript parser; nothing phones home
18	Output compression (Rust)	PostToolUse hook wraps test/build output through typed reducers	Measured on 50-test cargo fixture (see METHODOLOGY.md §4)
19	Session memory (Rust)	PreCompact + Stop hooks write restart packet	Measured on fail-edit-pass cycle (see METHODOLOGY.md §5)

Techniques 1–17 install via setup.sh (shell + Python stdlib only). Techniques 18–19 require the optional Rust binary.
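As a rough illustration of technique 15, the sketch below shows one way a dedup guard could decide whether a tool call repeats an earlier one inside the 10-minute window. The function names, the fingerprint format, and the in-memory state dict are assumptions made for this example; they are not taken from the shipped dedup_guard.py.

```python
import hashlib
import json
import time

WINDOW_SECONDS = 10 * 60  # 10-minute window from the techniques table
GUARDED_TOOLS = {"Read", "Glob", "Grep"}


def call_key(tool_name, tool_input):
    """Stable fingerprint for a tool call (hypothetical key format)."""
    payload = json.dumps({"tool": tool_name, "input": tool_input}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_duplicate(state, tool_name, tool_input, now=None):
    """Return True if the same call was allowed less than WINDOW_SECONDS ago.

    `state` maps fingerprint -> timestamp of the last allowed call and is
    updated in place only when the call is not a duplicate.
    """
    if tool_name not in GUARDED_TOOLS:
        return False
    now = time.time() if now is None else now
    key = call_key(tool_name, tool_input)
    last_seen = state.get(key)
    if last_seen is not None and (now - last_seen) < WINDOW_SECONDS:
        return True
    state[key] = now
    return False
```

In a real PreToolUse hook the call would arrive from Claude Code and a duplicate would be reported back to it; that wiring is left out here so the decision logic stays easy to read.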

What it doesn't do

No LLM routing, model swapping, or prompt rewriting.
No proxy in front of Claude Code. Claude Code talks to Anthropic directly.
No telemetry leaves your machine. No phone-home. No analytics. Read setup.sh to verify.
No attempt to outguess Anthropic's defaults where defaults are reasonable.

Install

Per-project (recommended):

curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash

Global (writes response shaping and env vars to ~/.claude/; applies to every project):

curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash -s -- --global

With the Rust binary (adds techniques 18–19, output compression + session memory):

cargo install --git https://github.com/sravan27/context-os --path apps/cli
context-os init

Stack auto-detection covers: Node/TypeScript, Next.js, Python, Rust, Go, Flutter/Dart. Stack-specific hints are appended to CLAUDE.md; generic hints otherwise.
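One plausible way to detect a stack is to look for well-known marker files at the repo root. The sketch below assumes that approach; the marker file names and the `detect_stacks` helper are illustrative guesses, and the actual rules are whatever setup.sh implements.

```python
# Hypothetical marker files per stack; the real detection rules live in setup.sh.
STACK_MARKERS = [
    ("Next.js", {"next.config.js", "next.config.mjs", "next.config.ts"}),
    ("Node/TypeScript", {"package.json", "tsconfig.json"}),
    ("Python", {"pyproject.toml", "requirements.txt", "setup.py"}),
    ("Rust", {"Cargo.toml"}),
    ("Go", {"go.mod"}),
    ("Flutter/Dart", {"pubspec.yaml"}),
]


def detect_stacks(filenames):
    """Return the stacks whose marker files appear among top-level file names."""
    present = set(filenames)
    return [name for name, markers in STACK_MARKERS if present & markers]


# Example: a Next.js project matches both Next.js and Node/TypeScript.
print(detect_stacks(["package.json", "next.config.js", "README.md"]))
```

An empty result would correspond to the generic-hints case described above.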

Uninstall

curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash -s -- --uninstall

Removes only the Context OS block from CLAUDE.md and the files Context OS wrote. Pre-existing content is preserved. The operation is idempotent.
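To show what removing only the managed block can look like, here is a minimal sketch. The `START` and `END` marker strings are hypothetical placeholders, not the delimiters setup.sh actually writes, and `remove_managed_block` is a name invented for this example.

```python
# Hypothetical delimiters; the real markers are whatever setup.sh writes.
START = "<!-- context-os:start -->"
END = "<!-- context-os:end -->"


def remove_managed_block(text):
    """Strip the managed block; return the text unchanged if none is present."""
    start = text.find(START)
    if start == -1:
        return text
    end = text.find(END, start)
    if end == -1:
        return text  # unbalanced markers: leave the file alone
    end += len(END)
    # Drop one trailing newline so removal does not leave a blank gap.
    if text[end:end + 1] == "\n":
        end += 1
    return text[:start] + text[end:]
```

Because the function returns its input untouched when no block is found, running it a second time is a no-op, which is the property idempotent uninstall relies on.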

Measure

Dry-run estimator (no writes, no install):

curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash -s -- --measure

Scans the repo, counts source vs. noise files, and estimates per-session token savings from static config (noise filtering, thinking cap, response shaping, output compression). Output is an estimate from file counts, not a measurement of a live session.
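The source-versus-noise count can be pictured with a short sketch. It uses only the four directory patterns named in the techniques table (the real .claudeignore has 60+), and the helper names are assumptions for illustration. It stops at counting, since the token-per-file estimate setup.sh applies is not stated here.

```python
from pathlib import PurePosixPath

# Four of the patterns named in the techniques table; the full list is longer.
NOISE_DIRS = {"node_modules", "dist", ".next", "target"}


def is_noise(path):
    """True if any directory component of `path` is a known noise directory."""
    return any(part in NOISE_DIRS for part in PurePosixPath(path).parts[:-1])


def count_files(paths):
    """Return (source_count, noise_count) for an iterable of relative paths."""
    paths = list(paths)
    noise = sum(1 for p in paths if is_noise(p))
    return len(paths) - noise, noise
```

Only directory components are checked, so a source file that happens to be named target.rs is still counted as source.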

Status check:

curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash -s -- --status

Benchmarks

Methodology: docs/METHODOLOGY.md. Raw reports: python/evals/reports/.

End-to-end measurement on a 2-file fixture (/tmp/cos-bench-test: one README and one .js file) via scripts/benchmark.sh, running the identical prompt through claude --print before and after install:

Metric	Before	After	Delta
Input tokens	5	4	−1
Cached reads	74,064	48,182	−25,882
Output tokens	466	294	−172
Total tokens	79,790	54,036	−25,754 (−32.3%)
Cost (USD, Sonnet 4.6)	$0.049	$0.040	−18.8%
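The Total tokens row can be checked with a few lines of arithmetic. The `percent_change` helper is introduced here only for illustration; the inputs are the two totals from the table.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


total_before, total_after = 79_790, 54_036

print(total_after - total_before)                           # -25754
print(round(percent_change(total_before, total_after), 1))  # -32.3
```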

This is a trivial fixture. It is the floor, not the claim. A 2-file repo has almost nothing to filter; the 32% reduction comes mostly from response shaping and the thinking cap. On real repos with node_modules, dist, lockfiles, and longer sessions, the noise filtering and prompt caching contributions grow substantially. Reproduce against any repo:

git clone https://github.com/sravan27/context-os && cd context-os
scripts/benchmark.sh /path/to/your/repo --model sonnet

Requires claude on PATH. Results are written to /tmp/cos-last-benchmark.json.

Component-level measurements (see METHODOLOGY.md for each):

Output compression: 71% token reduction on a 50-test cargo fixture; 48 passing tests collapsed to one line, 2 failures preserved verbatim.
Protected-string recall: 100% on the reducer test corpus (paths, errors, versions).
Concurrent writes: 5/5 PostToolUse writes captured under a lockfile.
Compaction survival: decisions and rationale survive a fail-edit-pass cycle across a compaction boundary.
Command pattern coverage: 42 test/build invocations matched, including cd /x && RUST_BACKTRACE=1 cargo test.
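The output-compression behavior (passing tests collapsed to one line, failures kept verbatim) can be sketched as follows. This is a Python illustration of the idea, not the Rust reducer-engine; the summary wording and the `reduce_test_output` name are assumptions.

```python
import re

# Matches per-test result lines in `cargo test` output, e.g. "test a::b ... ok".
RESULT_LINE = re.compile(r"^test (\S+) \.\.\. (ok|FAILED|ignored)$")


def reduce_test_output(output):
    """Collapse passing tests into one summary line; keep everything else."""
    kept, passed = [], 0
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match and match.group(2) == "ok":
            passed += 1
        else:
            kept.append(line)  # failures, panics, and summaries stay verbatim
    if passed:
        kept.insert(0, f"{passed} tests passed (collapsed)")
    return "\n".join(kept)
```

Any line the pattern does not recognize is passed through unchanged, which errs on the side of keeping paths and error messages intact.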

Architecture

setup.sh is a single shell script that installs the 14 config-only techniques plus the 3 Python hooks. It detects the stack, generates CLAUDE.md with a delimited Context OS block, writes .claudeignore, merges .claude/settings.json, drops the slash commands, output style, statusLine, and explorer subagent into .claude/, and installs .claude/hooks/{dedup_guard,loop_guard,session_profile}.py with a merged entry in .claude/settings.local.json.
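A merge that preserves existing settings might look like the sketch below. The env var names and values come from the techniques table; placing them under an `env` key and letting user-set values win are assumptions for this example, as is the `merge_settings` name.

```python
import json

# Env vars from the techniques table; the merge strategy is an assumption.
CONTEXT_OS_ENV = {
    "MAX_THINKING_TOKENS": "8000",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80",
    "ENABLE_PROMPT_CACHING_1H": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "CLAUDE_CODE_MAX_CONTEXT_TOKENS": "150000",
}


def merge_settings(existing):
    """Return a copy of `existing` with the Context OS env vars added.

    Keys the user already set are kept, so re-running the merge changes nothing.
    """
    merged = json.loads(json.dumps(existing))  # cheap deep copy of JSON data
    env = merged.setdefault("env", {})
    for key, value in CONTEXT_OS_ENV.items():
        env.setdefault(key, value)
    return merged
```

Using `setdefault` rather than overwriting is what makes a second run a no-op and keeps pre-existing user choices intact.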

The Python hooks are zero-dependency (stdlib only), fail-open on any error (never break a user session), and store per-session state under ~/.context-os/state/. Each hook is auditable — cat it and read 100 lines.
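The fail-open behavior can be illustrated with a small decorator wrapped around a loop-guard check. The thresholds come from the techniques table; the decorator, the return values, and the reading that the 5th edit warns and the 8th blocks are assumptions for this sketch, not the shipped loop_guard.py.

```python
import functools

WARN_AT, BLOCK_AT = 5, 8  # thresholds from the loop guard row in the table


def fail_open(func):
    """Decorator: if the guard itself crashes, allow the tool call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            return "allow"  # never break the user's session
    return wrapper


@fail_open
def loop_guard(edit_counts, file_path):
    """Count an edit to `file_path` and return 'allow', 'warn', or 'block'."""
    count = edit_counts.get(file_path, 0) + 1
    edit_counts[file_path] = count
    if count >= BLOCK_AT:
        return "block"
    if count >= WARN_AT:
        return "warn"
    return "allow"
```

Passing corrupt state (for example `None` instead of a dict) makes the guard raise internally, and the wrapper turns that into "allow" rather than an error.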

The optional Rust binary (apps/cli) adds two more hook-based techniques, wired through three entries in hooks.json:

PostToolUse (hooks.json:12) for test/build output compression via reducer-engine.
PreCompact (hooks.json:38) and Stop (hooks.json:51) for session memory handoff.
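Before compressing output, the PostToolUse hook has to recognize that a command was a test or build run, including forms like cd /x && RUST_BACKTRACE=1 cargo test. The sketch below shows one way such matching could work; the runner list and regexes are hypothetical and much smaller than the 42 invocations the project reports covering.

```python
import re

# Hypothetical runner list for illustration only.
RUNNERS = r"(?:cargo (?:test|build)|npm (?:test|run build)|pytest|go test)"
# Zero or more leading VAR=value assignments, e.g. RUST_BACKTRACE=1.
ENV_ASSIGN = r"(?:[A-Za-z_][A-Za-z0-9_]*=\S+\s+)*"
COMMAND = re.compile(rf"^{ENV_ASSIGN}{RUNNERS}(?:\s|$)")


def is_test_or_build(command_line):
    """True if any `&&`-separated segment is a known test/build invocation."""
    segments = (seg.strip() for seg in command_line.split("&&"))
    return any(COMMAND.match(seg) for seg in segments)


print(is_test_or_build("cd /x && RUST_BACKTRACE=1 cargo test"))  # True
print(is_test_or_build("cat Cargo.toml"))                        # False
```

Splitting on `&&` first keeps the pattern for a single command simple, at the cost of ignoring other shell operators such as `;` or pipes.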

Rust crates: reducer-engine (typed output compression), session-memory (handoff writer), token-estimator, config, telemetry (local-only; writes to .context-os/, never leaves machine).

Manual techniques, documented in CLAUDE.md but not automated: /clear between tasks, /btw for side questions, /compact [instructions], plan mode (Shift+Tab), specific prompting, @filename references, writer/reviewer split, explicit explorer-subagent delegation.

Contributing

To add a technique, open a PR with:

The config change (or hook code).
A test in python/evals/ or tests/ that exercises it.
An entry in the Evidence column pointing to a measurement or documented behavior. "Ablation pending" is acceptable for new entries; "this seems like it should help" is not.

Tests:

cargo test
python3 python/evals/runners/safe_mode_runner.py
python3 python/evals/runners/compaction_survival_runner.py
scripts/benchmark.sh /tmp/cos-bench-test --model sonnet

Limitations

Does not bypass usage limits.
Response shaping effectiveness varies by task: 40–65% on explanation-heavy tasks, 13–21% on structured code generation (third-party measurement; our ablation is pending).
Hook-based techniques depend on Claude Code exposing PreToolUse, PostToolUse, PreCompact, SessionStart, Stop.
The Context OS block in CLAUDE.md costs ~12–15% input overhead per turn. Amortized across a session, it pays for itself in 1–2 turns on non-trivial repos.

Acknowledgments

JuliusBrussee/caveman — response shaping benchmark.
Anthropic Claude Code documentation — env vars, hooks, output styles, subagents.

License

MIT. See LICENSE.

For the Claude Code team

If you work on Claude Code at Anthropic: see docs/FOR-CLAUDE-CODE-TEAM.md for three findings and recommendations.