ctxharness

AI documentation drift detection for teams using Claude Code, Cursor, Copilot, and any agent-driven workflow.

Your CLAUDE.md says the auth config lives at src/config/auth.ts. That file moved to src/modules/auth/config.ts six months ago. Your agent tries to import from a path that no longer exists, silently, on every session.

Or: CLAUDE.md says npm run typecheck. The script was renamed to npm run type-check during a cleanup sprint. The agent runs a command that doesn't exist.

ctxharness catches this before it reaches your agents.

npx ctxharness scan CLAUDE.md   # zero-config — detect drift instantly
# or, with full config:
npx ctxharness init              # scaffold .ctxharness.yml
npx ctxharness run               # check all assertions

Why put facts in CLAUDE.md at all?

Agents work from the context window. Pointing an agent to package.json works, but it means reading that file on every session, and many facts can't be read from a single source file: architectural patterns, team conventions, file locations, which ORM you're using and why. ctxharness is for facts you've already decided to state explicitly. It keeps those statements accurate.

What it checks

L1 — Fact drift: file existence, npm scripts, versions, counts, regex captures — any extractable fact from your codebase vs what your AI docs claim.

L2 — Instruction quality: vague language that degrades agent reliability ("be careful", "use your judgment"), positive/negative instruction ratio, multi-file coherence, token budget.

L3 — Context assembly: hook validation, skill loading, rule glob validity, coverage ratio.

No migration needed. Works on your existing CLAUDE.md, AGENTS.md, and .cursorrules files as-is.

Install

npm/pnpm (Node.js projects):

npm install -g ctxharness
# or
pnpm add -D ctxharness

Standalone binary for Python, Go, Rust and other non-Node projects: planned for v1.0.

Quick start

ctxharness init          # creates .ctxharness.yml
ctxharness doctor        # full health check with L1/L2/L3 breakdown

Example output:

AI Context Test — 5 assertions

fact                    expected       mentions  status
────────────────────────────────────────────────────────────────────────
auth-config-path        true                  1  ✗ 1 mismatch
typecheck-script        true                  1  ✗ 1 mismatch
node-version            22.14.0               1  ✓ 1/1 pass
no-vague-language       check                 2  ✓ 2/2 pass
instruction-balance     check                 2  ✓ 2/2 pass
────────────────────────────────────────────────────────────────────────

Mismatches
────────────────────────────────────────────────────────────────────────
auth-config-path        CLAUDE.md:18   true        false
typecheck-script        CLAUDE.md:34   true        false
────────────────────────────────────────────────────────────────────────
✗ 2 mismatch(es) — update the file(s) listed above

Configuration

.ctxharness.yml — minimal starter (one assertion per layer):

version: 1

files:
  include:
    - 'CLAUDE.md'
    - 'AGENTS.md'
    - '.cursorrules'
  exclude:
    - 'node_modules/**'

assertions:
  # L1 — fact drift: auth config path still exists
  - id: auth-config-path
    extractor: fileExists
    extractorArgs:
      path: src/modules/auth/config.ts
    scanner: literalInMd

  # L1 — fact drift: typecheck script matches package.json
  - id: typecheck-script
    extractor: packageScript
    extractorArgs:
      script: type-check
    scanner: literalInMd

  # L2 — instruction quality: no vague language
  - id: no-vague-language
    extractor: constant
    extractorArgs:
      value: check
    scanner: vaguenessPattern

  # L3 — context assembly: hooks are valid
  - id: hook-validity
    extractor: constant
    extractorArgs:
      value: check
    scanner: hookValidity

Advanced config — allowlist, scopeFiles, multi-version assertions

assertions:
  # allowlist: skip known-intentional mismatches in specific files
  - id: next-version
    extractor: packageJson
    extractorArgs:
      package: next
    scanner: inlineRegex
    scannerArgs:
      pattern: 'Next\.js\s+v?(\d+(?:\.\d+(?:\.\d+)?)?)'
    allowlist:
      - CHANGELOG.md   # version history file — intentional old values

  # scopeFiles: restrict an assertion to a subset of files
  - id: instruction-balance
    extractor: constant
    extractorArgs:
      value: check
    scanner: negativeConstraintDensity
    scannerArgs:
      minRatio: 2.0
    scopeFiles:
      include:
        - 'CLAUDE.md'
        - 'AGENTS.md'
      exclude:
        - '.cursorrules'   # constraint-only file by design

Extractors

Read ground truth from your codebase. Common ones: fileExists, packageScript, packageJson, nvmrc, gitStaleness, prismaModelList, goMod, cargoToml.

Full extractor list (20)

| Name | What it reads | Args |
| --- | --- | --- |
| packageJson | dependencies/devDependencies version | package: string |
| packageManager | packageManager field (strips corepack hash) | |
| nvmrc | .nvmrc file | |
| fileExists | Whether a path exists ("true"/"false") | path: string |
| regexScan | Capture group from any file | path, pattern, group? |
| countMatches | Count of pattern matches in a file | path, pattern |
| constant | Fixed value (placeholder for quality scanners) | value: string |
| prismaModel | Count of model X {} blocks in a Prisma schema | path: string |
| prismaModelList | JSON array of model names from a Prisma schema | path: string |
| prismaEnum | Count of values in a named Prisma enum | path: string, enum: string |
| trpcRouter | Count of router entries in a tRPC root file | path: string |
| trpcRouterList | JSON array of router names from a tRPC root file | path: string |
| gitStaleness | Commits since a file was last changed (0 = up-to-date) | path: string |
| packageEngines | Node/runtime version from package.json engines field (strips >= operators) | field?: string (default "node") |
| tsconfigPaths | Count of path aliases in tsconfig.json compilerOptions.paths (JSONC-aware) | path?: string (default "tsconfig.json") |
| pyprojectToml | Version from pyproject.toml — Poetry and PEP 621 formats | package?: string, field?: string |
| requirementsTxt | Package version from requirements.txt | package: string, path?: string |
| cargoToml | Version from Cargo.toml — own version or dependency; supports Cargo workspaces ([workspace.package], [workspace.dependencies]) | package?: string, field?: string |
| goMod | Module version from go.mod | module: string |
| packageScript | Returns "true"/"false" if a named npm script exists in package.json | script: string, file?: string (default "package.json") |

Version normalization: v22 matches 22.14.0 — partial mentions are valid.

Scanners

Find and validate content in your AI doc files. Common ones: inlineRegex, literalInMd, vaguenessPattern, hookValidity, coverageRatio, freshnessScore.

Full scanner list (15)

Drift scanners (compare against extractor value)

| Name | What it scans | Args |
| --- | --- | --- |
| inlineRegex | All lines matching a regex | pattern, flags? |
| codeBlockRegex | Lines inside fenced code blocks only | pattern, lang?, flags? |
| yamlField | YAML front matter or inline YAML | field (dot-path) |
| jsonField | Inline JSON blocks | field (dot-path) |
| literalInMd | Literal string presence | literal |
| pathReference | File path reference | path |

Quality scanners (no extractor value needed, use constant)

| Name | What it detects | Args |
| --- | --- | --- |
| vaguenessPattern | Vague instructions ("be careful", "as needed", "use your judgment"…) | patterns?: string[] |
| negativeConstraintDensity | Positive/negative instruction ratio below threshold | minRatio?: number (default 1.0) |
| contextBudget | File token footprint — fails if estimated tokens exceed threshold | maxTokens?: number (default 3000), followImports?: boolean (follows @file.md chains up to depth 3) |
| ruleGlobValidity | Claude Code rules file — checks for YAML frontmatter and optional paths: field | requirePaths?: boolean (default false) |
| hookValidity | Standalone. Resolves .claude/settings.json from project root and validates each hook entry | |
| backtickEntityPresence | Checks that `entity` appears as inline code in the doc | entity: string |
| skillValidity | Standalone. Globs .claude/skills/**/*.md from project root — validates YAML frontmatter has name: and description: | requireDescription?: boolean (default true) |
| freshnessScore | Standalone. Interprets commit count from gitStaleness — returns pass/warn/fail based on thresholds | warnAfter?: number (default 30), failAfter?: number (default 100) |
| coverageRatio | Checks what fraction of a JSON array (from prismaModelList/trpcRouterList) appears in the doc | minRatio?: number (default 0.8), valueAllowlist?: string[] |

Standalone scanners (hookValidity, skillValidity, freshnessScore) bypass files.include and resolve their own targets from the project root. You do not add their paths to files.include — they run once regardless of how many files are in scope.

vaguenessPattern accepts custom patterns via scannerArgs.patterns (array of regex strings).
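
For instance, a team could extend the defaults with its own weasel words; the pattern strings below are made up for illustration:

```yaml
- id: no-team-weasel-words
  extractor: constant
  extractorArgs:
    value: check
  scanner: vaguenessPattern
  scannerArgs:
    patterns:
      - 'when it makes sense'    # hypothetical team-specific phrase
      - 'where appropriate'
```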

contextBudget estimates tokens as chars ÷ 4. Designed to run over .claude/rules/**/*.md or CLAUDE.md to catch bloated always-on context files. With followImports: true, it resolves @file.md references recursively up to depth 3 and includes their token footprint in the total.
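
A sketch of a budget assertion, reusing the constant-extractor shape from the starter config (the 4000-token ceiling is an arbitrary example value):

```yaml
- id: claude-md-budget
  extractor: constant
  extractorArgs:
    value: check
  scanner: contextBudget
  scannerArgs:
    maxTokens: 4000        # example ceiling; the default is 3000
    followImports: true    # include @file.md imports in the total
  scopeFiles:
    include:
      - 'CLAUDE.md'
```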

ruleGlobValidity is designed to run over .claude/rules/**/*.md. By default it fails if a rules file has no YAML frontmatter (meaning it loads at every session with no scoping). Set requirePaths: true to also fail if the frontmatter lacks a paths: field.
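
A sketch, scoped to the rules directory via scopeFiles:

```yaml
- id: rules-have-path-scoping
  extractor: constant
  extractorArgs:
    value: check
  scanner: ruleGlobValidity
  scannerArgs:
    requirePaths: true     # also fail when frontmatter lacks paths:
  scopeFiles:
    include:
      - '.claude/rules/**/*.md'
```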

freshnessScore works with gitStaleness extractor. gitStaleness returns the commit count since the file was last changed; freshnessScore compares it to your thresholds:

- id: claude-md-freshness
  extractor: gitStaleness
  extractorArgs:
    path: CLAUDE.md
  scanner: freshnessScore
  scannerArgs:
    warnAfter: 20   # ⚠ warn if >20 commits since last edit
    failAfter: 50   # ✗ fail if >50 commits

coverageRatio checks that a fraction of your actual entities (models, routers…) are mentioned in the docs — useful when you can't document everything but want to enforce a minimum:

- id: prisma-model-coverage
  extractor: prismaModelList
  extractorArgs:
    path: src/server/db/prisma/schema.prisma
  scanner: coverageRatio
  scannerArgs:
    minRatio: 0.5          # at least 50% of models mentioned
    valueAllowlist:
      - MigrationVersion   # internal model, not required in CLAUDE.md

CLI

ctxharness run       # run all assertions, exit 1 on drift
ctxharness check     # alias for run --format text
ctxharness scan      # scan a markdown file for verifiable claims without a config file
ctxharness score     # run assertions and report a 0-100 health score with grade (S/A/B/C/D/F)
ctxharness trend     # show cross-run drift score history — sparkline, direction, per-run table
ctxharness populate  # scan declared files and suggest new assertions for uncovered claims
ctxharness snapshot  # save a quality snapshot to .ctxharness/snapshots/
ctxharness diff      # compare against latest snapshot — exit 1 on score regression
ctxharness fix       # auto-fix version drift — dry-run by default, --apply writes files
ctxharness doctor    # comprehensive health check with L1/L2/L3 breakdown and remediation advice
ctxharness init      # scaffold .ctxharness.yml

ctxharness init --hooks also installs Husky post-merge / post-checkout hook scripts alongside the config.

Options:

-c, --config <path>    Config file path (default: .ctxharness.yml)
-f, --format <fmt>     Output format: text | json | gha (default: text)
-r, --root <dir>       Project root (default: cwd)
-w, --watch            Re-run on file changes (run command only)

ctxharness fix finds every assertion where the actual version differs from expected on a specific line and shows what it would change. Pass --apply to write the files:

$ ctxharness fix
CLAUDE.md:13  prisma-version  7.5 → 7.7.0
CLAUDE.md:42  next-version    15.2.0 → 15.3.1

Run ctxharness fix --apply to write changes.

Zero-config scan

Before setting up a full .ctxharness.yml, you can scan any AI instruction file for verifiable claims:

npx ctxharness scan CLAUDE.md

This detects file paths, npm scripts, and version numbers mentioned in the file and checks each against your codebase. It checks paths and scripts first, because those are the claims most likely to silently break agent behavior when they drift.

Scanning CLAUDE.md...

  src/config/auth     fileExists     ✗ path not found (moved to src/modules/auth/config.ts?)
  npm run typecheck   packageScript  ✗ script not found in package.json
  Node.js 22.14.0     version        ✓ matches .nvmrc

2 issues found. Run with --suggest-config to generate .ctxharness.yml.

Since v0.4.2, scan follows @file.md includes (Claude/Gemini/Cursor convention) up to depth 3. Claims in included files are detected and verified — a drift in @agents.md referenced from CLAUDE.md is no longer invisible.

npx ctxharness scan CLAUDE.md --suggest-config   # generate a starter .ctxharness.yml
npx ctxharness scan CLAUDE.md --exit-zero        # warn without blocking (hooks / CI)

The detector filters out common false positives: Claude Code slash commands (/plan, /ship), URL route patterns (/api/chunk, /about/), and template placeholders ({slug}, [owner]).

scan vs run: scan is for discovery — zero-config, always informational. run is for enforcement — requires .ctxharness.yml, exits 1 on drift.

Husky hook — the post-merge template automatically picks the right mode:

# .husky/post-merge (generated by ctxharness init --hooks)
if [ -f ".ctxharness.yml" ]; then
  ctxharness check          # blocking — full enforcement
else
  ctxharness scan --exit-zero  # informational — zero-config discovery
fi

Snapshot workflow

Track quality over time and block regressions in CI:

# Save baseline after initial setup
ctxharness snapshot

# In CI: compare against the committed baseline
ctxharness diff     # exit 1 if score dropped

Snapshots are saved to .ctxharness/snapshots/ with timestamp. Commit the latest snapshot file to use diff in CI.
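
A minimal GitHub Actions job for this, sketched with assumed setup steps (the checkout/setup-node actions and Node version are placeholders, not part of ctxharness):

```yaml
jobs:
  ai-doc-quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npx ctxharness diff   # exit 1 on score regression vs the committed snapshot
```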

Trend history

Every run, check, score, and doctor execution auto-records a drift score to ~/.ctxharness/history.jsonl. The trend command shows your score trajectory over time:

ctxharness trend

Trend — myproject (8 runs)

  Sparkline   ▃▅▆▇▇▇██
  Direction   ↑ improving  (+14 pts over 8 runs)
  Avg Score   91/100

  Date                    Score      G      Pass   Fail   Time
  ──────────────────────────────────────────────────────────────
  May 06, 14:23:01       100/100    S      5      0      42ms
  May 06, 11:45:22        98/100    S      5      0      38ms
  May 05, 09:12:08        87/100    B      4      1      55ms
  ...

Direction is computed by comparing the average of the first third of runs against the last third: improving (delta > 3 pts), worsening (delta < -3 pts), or flat.

ctxharness trend --all          # all projects
ctxharness trend --limit 50     # last 50 runs (default: 20)
ctxharness trend --project api  # specific project name

In CI, use --no-trend to skip recording — useful when you only want trend data from your main branch, not every PR run:

ctxharness run --no-trend

populate

Scan your already-declared files for verifiable claims (semver, paths, scripts) and suggest assertions for any claims not yet in your config.

# Dry-run (default): preview what would be added
ctxharness populate

# Write changes to .ctxharness.yml
ctxharness populate --apply

Typical workflow: run ctxharness init once to bootstrap the config, then run ctxharness populate --apply any time you update your AI docs and want to capture new claims.

warn status

Assertions can return three states: pass, warn, or fail. warn is counted as 0.5 in the score — useful for staleness checks where you want early signal without blocking CI:

⚠ 1 warn   — claude-md-freshness: 35 commits since last edit

ctxharness doctor categorizes all issues by layer, shows a per-layer score, and suggests next actions:

ctxharness doctor

L1  Doc Drift           ██████████  100/100
L2  Instruction Quality ████████░░   80/100
L3  Context Assembly    ██████░░░░   60/100

Score: 80/100  Grade: B

Issues:
  L2  no-vague-language  AGENTS.md:14            vague pattern found: "be careful"
  L3  hook-validity      .claude/settings.json   hook entry has empty matcher

Plugin API

Register custom extractors and scanners programmatically:

import { definePlugin, loadPlugin } from '@florianbruniaux/ctxharness-core'

const myPlugin = definePlugin({
  extractors: [
    {
      name: 'myExtractor',
      fn: (root, args) => {
        // read ground truth from codebase, return a string
        return '1.2.3'
      },
    },
  ],
  scanners: [
    {
      name: 'myScanner',
      fn: (filePath, expectedValue, args) => {
        // check the file, return ScanResult[]
        return [{ status: 'pass', line: 0, actual: expectedValue }]
      },
    },
  ],
})

loadPlugin(myPlugin)

Use name: myExtractor and scanner: myScanner in your .ctxharness.yml like any built-in.
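
Assuming the plugin file runs before assertions are evaluated, an assertion wiring the pair together might look like:

```yaml
- id: my-custom-check
  extractor: myExtractor
  scanner: myScanner
```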

Stack presets

Ready-to-use config templates for common stacks:

| Preset | Path | Covers |
| --- | --- | --- |
| T3 (Next.js + Prisma + tRPC) | templates/presets/t3.yml | node, next, typescript, prisma, trpc versions + model/router counts |
| Next.js App Router | templates/presets/next-app-router.yml | node, next, typescript, react versions + quality assertions |
| Python | templates/presets/python.yml | python version, pyproject.toml deps + quality assertions |
| Go | templates/presets/go.yml | go toolchain version, go.mod deps + quality assertions |
| Rust | templates/presets/rust.yml | crate version, Cargo.toml deps + quality assertions (workspace-aware) |

Copy a preset as your .ctxharness.yml starting point:

cp node_modules/ctxharness/templates/presets/t3.yml .ctxharness.yml

CI integration

GitHub Actions:

- name: Check AI doc drift
  uses: FlorianBruniaux/[email protected]
  with:
    config: .ctxharness.yml
    format: gha

Or copy templates/ci/github-actions.yml for a full workflow. GitLab CI and CircleCI templates are at templates/ci/gitlab-ci.yml and templates/ci/circleci.yml.

Husky (post-merge, post-checkout): copy from templates/husky/.

.claude/settings.json vs settings.local.json

Claude Code follows a two-file convention for project settings:

  • settings.json — committed to the repo. Contains config that should work for anyone who clones the project: hook definitions, permission rules, shared assertions. Use relative paths for hook commands (.claude/hooks/my-hook.sh, not /Users/yourname/...).
  • settings.local.json — gitignored. Contains machine-specific or personal overrides: your own keybindings, local path overrides, personal MCP servers.

hookValidity validates settings.json. If a hook command contains an absolute path (e.g. /Users/yourname/...), it returns status: warn — those paths break for every other contributor.
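
For illustration, a committed hook entry with a relative command path might look like this (the PreToolUse event and Bash matcher follow the Claude Code hooks schema; the script name is a placeholder):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/my-hook.sh" }
        ]
      }
    ]
  }
}
```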

The same layering applies to CLAUDE.md files: project-level goes in the repo root (committed), personal goes in ~/.claude/CLAUDE.md (your machine only). Claude Code memory docs.

Ecosystem Positioning

                         FACTUAL ACCURACY
                                ▲
                                │
                                │              ★ ctxharness
                                │         "Are the claims still true?"
                                │          paths · scripts · versions
                                │
  ──────────────────────────────┼──────────────────────────────────► RUNTIME
  [vigiles]                     │                                    VERIFICATION
  TS spec → CLAUDE.md           │
                                │
  [AgentLint]                   │     [cclint / cursor-doctor]
  structural linter             │     syntax & format checks
                                │
  [Ruler / rulesync]            │
  rule distribution             │
                         STATIC / STRUCTURAL

ctxharness does not compete with any of these tools. It validates what they write, compile, or distribute.

Taxonomy

ctxharness covers three layers of context engineering testing:

| Layer | What |
| --- | --- |
| L1 — Doc Drift | Facts in AI docs vs code reality — file existence, npm scripts, versions, counts, regex captures |
| L2 — Instruction Quality | Vague language, positive/negative ratio, token budget, multi-file coherence |
| L3 — Context Assembly | Hook validation, skill loading, rule glob validity, coverage ratio |

L4 (agent behavior eval) is out of scope — use Promptfoo or Braintrust for that.

ctxharness vs Promptfoo: Promptfoo evals what your agent says (output quality). ctxharness evals what your agent reads (input freshness). They're complementary, not competing.

Further Reading

The problem ctxharness addresses is well-documented. These are the sources worth reading.

Context engineering — why accuracy matters

  • Context Engineering — Simon Willison, June 2025. Why "context engineering" is a better term than "prompt engineering" and what it means in practice.
  • The Rise of Context Engineering — LangChain, June 2025. "Most of the time when an agent is not performing reliably, the underlying cause is that the appropriate context has not been communicated to the model."
  • Context Engineering for Large Codebases — Packmind, April 2026. Documents "context drift" — stale instruction files referencing deprecated frameworks cause agents to silently generate code using wrong patterns.

What stale context does to LLMs

  • Contextual Drag: How Errors in the Context Affect LLM Reasoning — arXiv, Feb 2026. Wrong context causes 10-20% performance drops across 11 models and 8 reasoning tasks. Self-refinement makes it worse, not better.
  • Knowledge Conflicts for LLMs: A Survey — EMNLP 2024. Temporal knowledge conflicts (outdated context vs. model knowledge) are a primary source of factually wrong outputs. LLMs may generate code using deprecated function signatures from older library versions.
  • Lost in the Middle — Stanford / ACL 2024. Relevant information placed in the middle of long contexts is systematically under-weighted by LLMs. Instruction files that accumulate stale content push critical facts into this dead zone.
  • Your Agent's Context Is a Junk Drawer — Augment Code, Feb 2026. Documents "context collapse" — agents forget earlier constraints when context grows stale and unmanaged.

The CLAUDE.md / AGENTS.md problem specifically

  • Writing a Good CLAUDE.md — HumanLayer, Nov 2025. "Don't include code snippets — they will become out-of-date quickly." Direct practitioner warning on content drift.
  • New Research Reassesses the Value of AGENTS.md Files — InfoQ, March 2026. ETH Zurich study: LLM-generated context files reduce task success by 3% on average and increase inference costs by 20%+. Authors recommend limiting instructions to non-inferable details — exactly the facts ctxharness verifies.
  • When AGENTS.md Backfires — Feb 2026. Only 14.5% of agent context files include security instructions. LLM-generated files reduced task success in 5 of 8 evaluation settings.

License

MIT
