Writ

Name: Writ-Public
Author: infinri

A knowledge retrieval engine that gives AI coding agents access to your organization's coding standards, architectural decisions, and enforcement rules -- delivering only the relevant ones, in milliseconds.

The Problem

At 50 rules, you can paste your coding standards into the prompt. At 500, you can't. At 5,000, you need a system that knows which rules matter for the task at hand.

Context-stuffing does not scale:

Context-stuffing: 15,812 tokens (46 rules)
Writ retrieval:   1,397 tokens (5 rules, 0.2ms)
Ratio:            11x reduction

At 1,000 rules the reduction is 76x. At 10,000 rules, context-stuffing would require 1.17 million tokens; Writ returns ~1,600, a 727x reduction.

How It Works

A local service sits between your AI agent and your knowledge base. The agent sends a query. Writ searches a graph database, scores candidates across five retrieval stages, and returns ranked results within a context budget. No cloud calls. No API keys. Fully offline.
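
Returning results "within a context budget" can be pictured as greedy packing: walk the ranked candidates and keep each one that still fits. The following is a minimal sketch under stated assumptions, not Writ's published selection policy. The `Candidate` record, its fields, and `pack_within_budget` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical ranked rule; field names are illustrative, not Writ's schema."""
    rule_id: str
    score: float
    tokens: int


def pack_within_budget(ranked: list[Candidate], budget_tokens: int) -> list[Candidate]:
    """Greedily keep the highest-scoring candidates that fit in the token budget."""
    selected: list[Candidate] = []
    used = 0
    for cand in sorted(ranked, key=lambda c: c.score, reverse=True):
        # Skip any rule that would overflow the budget, but keep scanning:
        # a smaller, lower-ranked rule may still fit.
        if used + cand.tokens <= budget_tokens:
            selected.append(cand)
            used += cand.tokens
    return selected
```

With a 600-token budget and candidates of 300, 900, and 200 tokens (in descending score order), this sketch keeps the first and third and skips the second.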

Five-stage hybrid retrieval

1. Domain filter -- narrows to the relevant subgraph
2. BM25 keyword search -- Tantivy sparse retrieval on triggers, statements, and tags
3. ANN vector search -- hnswlib semantic search with ONNX Runtime inference
4. Graph traversal -- pre-computed adjacency cache following DEPENDS_ON, CONFLICTS_WITH, and SUPPLEMENTS edges
5. Two-pass RRF ranking -- first pass scores keyword/vector/severity/confidence; second pass adds graph-neighbor proximity from top results, applies authority preference and context budget

All indexes are pre-warmed at startup. No I/O in the query path.
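
The two-pass ranking stage can be sketched with plain reciprocal rank fusion (RRF), where each candidate scores the sum of 1 / (k + rank) over the lists it appears in. This is a simplified illustration, not Writ's implementation. It fuses only the keyword and vector rankings (severity, confidence, authority, and budget are omitted), it boosts only neighbors that were already retrieved, and the constant `RRF_K`, the `neighbor_bonus` weight, and all function names are assumptions.

```python
from collections import defaultdict

RRF_K = 60  # conventional RRF constant; Writ's actual value is not documented


def rrf(rankings: list[list[str]], k: int = RRF_K) -> dict[str, float]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over each ranked list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, rule_id in enumerate(ranking, start=1):
            scores[rule_id] += 1.0 / (k + rank)
    return dict(scores)


def two_pass_rank(
    keyword_hits: list[str],
    vector_hits: list[str],
    adjacency: dict[str, set[str]],
    top_n: int = 3,
    neighbor_bonus: float = 0.005,  # assumed, illustrative weight
) -> list[str]:
    """Pass 1 fuses the two rankings; pass 2 boosts graph neighbors of the top results."""
    scores = rrf([keyword_hits, vector_hits])
    # Sort by descending score, breaking ties by rule id for determinism.
    first_pass = sorted(scores, key=lambda r: (-scores[r], r))
    for anchor in first_pass[:top_n]:
        for neighbor in adjacency.get(anchor, set()):
            if neighbor in scores:
                scores[neighbor] += neighbor_bonus
    return sorted(scores, key=lambda r: (-scores[r], r))
```

In this sketch a rule that ranks low on its own can move up when it is adjacent to a top first-pass result, which is the effect the second pass is described as providing.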

What the agent sees

--- WRIT RULES (3 rules, standard mode) ---

[PY-ASYNC-001] (?, ?, ?) score=0.892
WHEN: Calling a sync I/O function inside an async def function.
RULE: Async call chains must use async I/O end-to-end. A sync call
      blocks the event loop, defeating the purpose of async.
VIOLATION: Using requests.get() in an async handler
CORRECT: Using httpx.AsyncClient within async context

--- END WRIT RULES ---

Rules are injected before the agent writes code. The agent follows them not because it was asked to check, but because they are already in context when it starts working.
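
To make the sample rule concrete, the sketch below contrasts the two patterns using only the standard library. Here `time.sleep` stands in for a blocking call such as `requests.get()`, and `asyncio.sleep` stands in for an awaited call such as one made through `httpx.AsyncClient`. The handler names and the timing helper are hypothetical.

```python
import asyncio
import time


async def blocking_handler(delay: float) -> None:
    # Violation: a sync call holds the event loop, so other tasks cannot run.
    time.sleep(delay)


async def async_handler(delay: float) -> None:
    # Correct: awaiting yields control, so other tasks proceed concurrently.
    await asyncio.sleep(delay)


async def timed(handler, delay: float = 0.2, n: int = 3) -> float:
    """Run n handlers concurrently and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare() -> tuple[float, float]:
    """Return (blocking_elapsed, async_elapsed) for the same workload."""
    return asyncio.run(timed(blocking_handler)), asyncio.run(timed(async_handler))
```

The blocking variant serializes the three handlers, so its elapsed time is roughly three times the delay, while the awaited variant finishes in roughly one delay.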

Claude Code Integration

Writ ships as a Claude Code plugin. Open Claude Code and everything starts automatically -- Neo4j via Docker, Writ server in the background, hooks registered. When Claude Code exits, the server shuts down. No startup scripts.

If something fails to start, hooks fall back gracefully. Claude sees a one-line warning and keeps working.

Mode-based workflow enforcement

Every session operates in one of four modes:

Mode	Purpose	Gates	Code generation
Conversation	Discussion, brainstorming	None	No
Debug	Investigating a problem	None	No
Review	Evaluating code against rules	None	No
Work	Building/modifying code	plan + test-skeletons	Yes

Work mode enforces a plan-then-test-then-implement sequence with two approval gates. The agent cannot write implementation code until the plan is approved and test skeletons are written. Gate enforcement is deterministic -- hooks block writes that violate the sequence, not just advise against them.
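
The plan-then-test-then-implement sequence amounts to a small state machine that a hook could consult before every write. The sketch below is one possible shape, not Writ's code. The `Phase` names, the `plan.md` file name, and the `tests/` directory convention are all assumptions made for illustration.

```python
from enum import Enum


class Phase(Enum):
    PLANNING = 1         # no approved plan yet
    TEST_SKELETONS = 2   # plan approved, test skeletons not yet approved
    IMPLEMENTATION = 3   # both gates passed


def current_phase(plan_approved: bool, tests_approved: bool) -> Phase:
    """Derive the phase from the two approval gates."""
    if not plan_approved:
        return Phase.PLANNING
    if not tests_approved:
        return Phase.TEST_SKELETONS
    return Phase.IMPLEMENTATION


def write_allowed(phase: Phase, path: str) -> bool:
    """Deterministic check a hook could run before every Write/Edit."""
    is_plan = path.endswith("plan.md")                         # assumed plan file name
    is_test = "/tests/" in path or path.startswith("tests/")   # assumed test layout
    if phase is Phase.PLANNING:
        return is_plan
    if phase is Phase.TEST_SKELETONS:
        return is_plan or is_test
    return True
```

Because the check is a pure function of session state and the target path, the same write is always allowed or blocked for the same inputs, which is what makes the gate deterministic rather than advisory.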

Three-layer instruction architecture

Writ splits its instructions across three layers of Claude Code's instruction hierarchy:

Global rules file (~/.claude/rules/) -- behavioral instructions loaded at session start in every project. Covers failure modes: stop after gate denial, phase boundary rules, plan timing requirements. ~25 lines, always loaded.
Hook injection -- dynamic, state-aware context. Current phase indicator, RAG query results, mode reminders. Changes every turn based on session state.
Hook enforcement -- deterministic gate checks on every Write/Edit. First denial blocks with guidance. Second denial escalates to a user permission dialog -- the agent physically cannot proceed until the human responds.

The separation is deliberate: rules prevent the wrong decision, hooks catch violations, escalation forces human intervention when self-correction fails.

16 hooks across 4 event types

Event	Hooks	Purpose
UserPromptSubmit (2)	RAG injection, approval detection	Query Writ for rules, detect user approvals
PreToolUse (6)	Gate enforcement, plan validation, RAG	Block unauthorized writes, validate plan format, inject file-context rules on Write/Edit and Read
PostToolUse (5)	Validation, compliance, RAG	Static analysis, rule compliance, post-write rule injection
Stop (3)	Metrics, feedback, logging	Friction logging, auto-feedback, token tracking

All hooks parse Claude Code's structured tool dispatch envelope natively. Gate enforcement uses JSON permissionDecision with deny-to-ask escalation -- not exit codes.
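
A deny-to-ask escalation could look like the sketch below. It assumes the `hookSpecificOutput` envelope that Claude Code's hooks documentation describes for PreToolUse decisions. The `gate_decision` function and the denial counter are hypothetical, and how Writ tracks prior denials is not documented here.

```python
import json


def gate_decision(denials_so_far: int, reason: str) -> str:
    """Build PreToolUse hook output: the first violation denies, later ones ask the user."""
    # "deny" blocks the tool call and returns the reason to the agent;
    # "ask" hands the decision to the human via a permission dialog.
    decision = "deny" if denials_so_far == 0 else "ask"
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    }
    return json.dumps(payload)
```

A hook script would print this JSON to stdout and exit normally, so the decision travels in the payload rather than in the exit code.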

Feedback loop

Rules are injected before the agent writes code. Static analysis runs on what it produces. Outcomes are correlated with which rules were in context. Feedback flows back automatically.

Rules that consistently produce good outcomes earn higher confidence
Rules that do not are flagged for human review
When no rules exist for a situation, the agent is prompted to propose one
Proposals pass a five-check structural gate before entering the graph as provisional
Graduation requires n>=50 observations with a positive ratio >= 0.75

The knowledge base grows from experience, not just documentation.
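
The graduation criterion above reduces to two threshold checks. The sketch below assumes that the positive ratio is positive observations divided by all observations. The function and constant names are illustrative.

```python
MIN_OBSERVATIONS = 50      # n >= 50
MIN_POSITIVE_RATIO = 0.75  # positive ratio >= 0.75


def should_graduate(positive: int, negative: int) -> bool:
    """True when a provisional rule meets both graduation thresholds."""
    total = positive + negative
    if total < MIN_OBSERVATIONS:
        return False  # not enough evidence yet, regardless of ratio
    return positive / total >= MIN_POSITIVE_RATIO
```

Under these assumptions a rule with 38 positive and 12 negative observations graduates (ratio 0.76), while one with 30 positive and none negative does not, because it has fewer than 50 observations.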

Key Capabilities

Sub-millisecond retrieval -- p95 under 0.2ms at 80 rules, under 0.6ms at 10,000
Hybrid search -- keyword, semantic, and graph traversal combined. Finds rules that keyword search alone would miss
Relationship-aware -- knows dependencies, conflicts, and supplements between rules. Returns full context, not isolated fragments
Authority-aware ranking -- human-authored rules mechanically outrank AI-proposed rules at equal relevance. The guarantee is structural, not weight-based
Frequency-driven confidence -- rules graduate or get flagged based on statistical evidence from observed outcomes
Rule compression -- clusters rules into abstraction nodes. At 10K rules this yields a 727x context reduction
Session-aware -- multi-query sessions track loaded rules client-side, preventing duplicates and managing token budget. Server stays stateless
Cross-project enforcement -- hooks fire in every project where they're wired. Behavioral instructions load globally via rules files. No per-project setup
Language agnostic -- gate categories support PHP, Python, JavaScript, TypeScript, Go, Rust, Java, Ruby, GraphQL, XML. Framework-specific patterns for Magento 2, Django, Rails, Spring, NestJS, Express, Laravel
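
One way to get an authority guarantee that is structural rather than weight-based is a lexicographic sort key: relevance decides first, and authority only breaks ties, so no score weighting can invert the order at equal relevance. This is a sketch of that idea, not Writ's ranking code. The authority labels and the tuple layout are assumptions, and a real system might bucket near-equal scores rather than compare floats exactly.

```python
AUTHORITY_ORDER = {"human": 0, "ai-proposed": 1}  # assumed labels; lower sorts first


def rank_with_authority(candidates: list[tuple[str, float, str]]) -> list[str]:
    """Sort (rule_id, score, authority) by score, breaking ties by authority."""
    ordered = sorted(
        candidates,
        key=lambda c: (-c[1], AUTHORITY_ORDER[c[2]], c[0]),
    )
    return [rule_id for rule_id, _, _ in ordered]
```

Here a higher-scoring AI-proposed rule still ranks above a lower-scoring human-authored one. The preference applies only when relevance is equal.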

Benchmarks

Measured on an 80-rule corpus with 83 ground-truth queries.

Latency (warm indexes)

Stage	Component	p95	Budget	Headroom
2	Keyword search	0.175ms	2.0ms	11x
3	Semantic search	0.047ms	3.0ms	64x
4	Graph traversal	0.001ms	3.0ms	3000x
5	Ranking	0.089ms	1.0ms	11x
--	End-to-end	0.19ms	10.0ms	53x

Retrieval quality

Metric	Value	Threshold
MRR@5 (19 ambiguous queries)	0.7842	> 0.78
Hit rate (83 total queries)	97.59%	> 90%
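
For readers unfamiliar with the two metrics, the sketch below shows how they are conventionally computed from ranked results and one ground-truth rule per query. The single-target-per-query setup and the cutoff used for hit rate are assumptions, since the benchmark harness is not published.

```python
def mrr_at_k(results: list[list[str]], expected: list[str], k: int = 5) -> float:
    """Mean reciprocal rank of the expected rule within the top k (0 if absent)."""
    total = 0.0
    for ranked, target in zip(results, expected):
        top = ranked[:k]
        if target in top:
            total += 1.0 / (top.index(target) + 1)
    return total / len(expected)


def hit_rate(results: list[list[str]], expected: list[str], k: int = 5) -> float:
    """Fraction of queries whose expected rule appears in the top k."""
    hits = sum(1 for ranked, target in zip(results, expected) if target in ranked[:k])
    return hits / len(expected)
```

If hit rate is counted per query in this way, 97.59% over 83 queries corresponds to 81 hits.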

Scale

Metric	80 rules	500 rules	1K rules	10K rules
E2E p95	0.19ms	0.48ms	0.57ms	0.55ms
Context reduction	4.4x	39.3x	75.8x	727x
Memory (clean serve)	~700 MB	--	--	~3 GB
Cold start	0.58s	3.64s	7.74s	70.2s

Current Status

~320 tests across 29 test files and 12 performance benchmarks, all passing.

Core retrieval -- Five-stage hybrid pipeline, two-pass RRF ranking, rule compression, ONNX inference, session-aware context tracking.

Claude Code integration -- Plugin with lifecycle management, 16 hooks across 4 event types, mode-based workflow enforcement with deny-to-ask gate escalation, automated feedback loop, per-file RAG injection, comprehensive friction logging.

Knowledge evolution -- Authority model with mechanical ranking guarantee, structural quality gate, human review queue, frequency tracking with empirical graduation.

Observability -- Per-write friction logging (gate denials, hook timing, RAG query coverage), rule coverage analysis per implementation file, phase transition metrics.

Future:

Sub-agent architecture for per-file implementation in fresh context windows
Domain generalization beyond coding rules
Semantic gap detection for multi-query sessions
Qdrant migration for persistent embeddings at 10K+ rules

Disclaimer

This is the public face of Writ. The source code, rule corpus, retrieval pipeline, and implementation details live in a private repository.

Author

Lucio Saldivar -- Infinri