Writ-Public
Health Warn
- No license — Repository has no license file
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Warn
- Code scan incomplete — No supported source files were scanned during light audit
Permissions Pass
- Permissions — No dangerous permissions requested
No AI report is available for this listing yet.
Hybrid RAG knowledge retrieval and enforcement engine for AI coding agents. Five-stage pipeline (BM25 + vector + graph traversal + RRF ranking) delivers relevant rules in <0.2ms. Claude Code plugin with 16 hooks, mode-based workflow gates, and deny-to-ask escalation. Neo4j + Tantivy + hnswlib + ONNX Runtime.
Writ
A knowledge retrieval engine that gives AI coding agents access to your organization's coding standards, architectural decisions, and enforcement rules -- delivering only the relevant ones, in milliseconds.
The Problem
At 50 rules, you can paste your coding standards into the prompt. At 500, you can't. At 5,000, you need a system that knows which rules matter for the task at hand.
Context-stuffing does not scale:
Context-stuffing: 15,812 tokens (46 rules)
Writ retrieval: 1,397 tokens (5 rules, 0.2ms)
Ratio: 11x reduction
At 1,000 rules the reduction is 76x. At 10,000 it would require 1.17 million tokens -- Writ returns ~1,600 for a 727x reduction.
How It Works
A local service sits between your AI agent and your knowledge base. The agent sends a query. Writ searches a graph database, scores candidates across five retrieval stages, and returns ranked results within a context budget. No cloud calls. No API keys. Fully offline.
Five-stage hybrid retrieval
- Domain filter -- narrows to the relevant subgraph
- BM25 keyword search -- Tantivy sparse retrieval on triggers, statements, and tags
- ANN vector search -- hnswlib semantic search with ONNX Runtime inference
- Graph traversal -- pre-computed adjacency cache following DEPENDS_ON, CONFLICTS_WITH, and SUPPLEMENTS edges
- Two-pass RRF ranking -- first pass scores keyword/vector/severity/confidence; second pass adds graph-neighbor proximity from top results, applies authority preference and context budget
All indexes are pre-warmed at startup. No I/O in the query path.
What the agent sees
--- WRIT RULES (3 rules, standard mode) ---
[PY-ASYNC-001] (?, ?, ?) score=0.892
WHEN: Calling a sync I/O function inside an async def function.
RULE: Async call chains must use async I/O end-to-end. A sync call
blocks the event loop, defeating the purpose of async.
VIOLATION: Using requests.get() in an async handler
CORRECT: Using httpx.AsyncClient within async context
--- END WRIT RULES ---
Rules are injected before the agent writes code. Not because it was asked to check -- because the rules are already in context when it starts working.
Claude Code Integration
Writ ships as a Claude Code plugin. Open Claude Code and everything starts automatically -- Neo4j via Docker, Writ server in the background, hooks registered. When Claude Code exits, the server shuts down. No startup scripts.
If something fails to start, hooks fall back gracefully. Claude sees a one-line warning and keeps working.
Mode-based workflow enforcement
Every session operates in one of four modes:
| Mode | Purpose | Gates | Code generation |
|---|---|---|---|
| Conversation | Discussion, brainstorming | None | No |
| Debug | Investigating a problem | None | No |
| Review | Evaluating code against rules | None | No |
| Work | Building/modifying code | plan + test-skeletons | Yes |
Work mode enforces a plan-then-test-then-implement sequence with two approval gates. The agent cannot write implementation code until the plan is approved and test skeletons are written. Gate enforcement is deterministic -- hooks block writes that violate the sequence, not just advise against them.
Three-layer instruction architecture
Writ uses Claude Code's instruction hierarchy strategically:
- Global rules file (
~/.claude/rules/) -- behavioral instructions loaded at session start in every project. Covers failure modes: stop after gate denial, phase boundary rules, plan timing requirements. ~25 lines, always loaded. - Hook injection -- dynamic, state-aware context. Current phase indicator, RAG query results, mode reminders. Changes every turn based on session state.
- Hook enforcement -- deterministic gate checks on every Write/Edit. First denial blocks with guidance. Second denial escalates to a user permission dialog -- the agent physically cannot proceed until the human responds.
The separation is deliberate: rules prevent the wrong decision, hooks catch violations, escalation forces human intervention when self-correction fails.
16 hooks across 4 event types
| Event | Hooks | Purpose |
|---|---|---|
| UserPromptSubmit (2) | RAG injection, approval detection | Query Writ for rules, detect user approvals |
| PreToolUse (6) | Gate enforcement, plan validation, RAG | Block unauthorized writes, validate plan format, inject file-context rules on Write/Edit and Read |
| PostToolUse (5) | Validation, compliance, RAG | Static analysis, rule compliance, post-write rule injection |
| Stop (3) | Metrics, feedback, logging | Friction logging, auto-feedback, token tracking |
All hooks parse Claude Code's structured tool dispatch envelope natively. Gate enforcement uses JSON permissionDecision with deny-to-ask escalation -- not exit codes.
Feedback loop
Rules are injected before the agent writes code. Static analysis runs on what it produces. Outcomes are correlated with which rules were in context. Feedback flows back automatically.
- Rules that consistently produce good outcomes earn higher confidence
- Rules that don't get flagged for human review
- When no rules exist for a situation, the agent is prompted to propose one
- Proposals pass a five-check structural gate before entering the graph as provisional
- Graduation requires n>=50 observations with a positive ratio >= 0.75
The knowledge base grows from experience, not just documentation.
Key Capabilities
- Sub-millisecond retrieval -- p95 under 0.2ms at 80 rules, under 0.6ms at 10,000
- Hybrid search -- keyword, semantic, and graph traversal combined. Finds rules that keyword search alone would miss
- Relationship-aware -- knows dependencies, conflicts, and supplements between rules. Returns full context, not isolated fragments
- Authority-aware ranking -- human-authored rules mechanically outrank AI-proposed rules at equal relevance. The guarantee is structural, not weight-based
- Frequency-driven confidence -- rules graduate or get flagged based on statistical evidence from observed outcomes
- Rule compression -- clusters rules into abstraction nodes. At 10K rules this yields a 727x context reduction
- Session-aware -- multi-query sessions track loaded rules client-side, preventing duplicates and managing token budget. Server stays stateless
- Cross-project enforcement -- hooks fire in every project where they're wired. Behavioral instructions load globally via rules files. No per-project setup
- Language agnostic -- gate categories support PHP, Python, JavaScript, TypeScript, Go, Rust, Java, Ruby, GraphQL, XML. Framework-specific patterns for Magento 2, Django, Rails, Spring, NestJS, Express, Laravel
Benchmarks
Measured on an 80-rule corpus with 83 ground-truth queries.
Latency (warm indexes)
| Stage | Component | p95 | Budget | Headroom |
|---|---|---|---|---|
| 2 | Keyword search | 0.175ms | 2.0ms | 11x |
| 3 | Semantic search | 0.047ms | 3.0ms | 64x |
| 4 | Graph traversal | 0.001ms | 3.0ms | 3000x |
| 5 | Ranking | 0.089ms | 1.0ms | 11x |
| -- | End-to-end | 0.19ms | 10.0ms | 53x |
Retrieval quality
| Metric | Value | Threshold |
|---|---|---|
| MRR@5 (19 ambiguous queries) | 0.7842 | > 0.78 |
| Hit rate (83 total queries) | 97.59% | > 90% |
Scale
| Metric | 80 rules | 500 rules | 1K rules | 10K rules |
|---|---|---|---|---|
| E2E p95 | 0.19ms | 0.48ms | 0.57ms | 0.55ms |
| Context reduction | 4.4x | 39.3x | 75.8x | 727x |
| Memory (clean serve) | ~700 MB | -- | -- | ~3 GB |
| Cold start | 0.58s | 3.64s | 7.74s | 70.2s |
Current Status
~320 tests across 29 test files and 12 performance benchmarks, all passing.
Core retrieval -- Five-stage hybrid pipeline, two-pass RRF ranking, rule compression, ONNX inference, session-aware context tracking.
Claude Code integration -- Plugin with lifecycle management, 16 hooks across 4 event types, mode-based workflow enforcement with deny-to-ask gate escalation, automated feedback loop, per-file RAG injection, comprehensive friction logging.
Knowledge evolution -- Authority model with mechanical ranking guarantee, structural quality gate, human review queue, frequency tracking with empirical graduation.
Observability -- Per-write friction logging (gate denials, hook timing, RAG query coverage), rule coverage analysis per implementation file, phase transition metrics.
Future:
- Sub-agent architecture for per-file implementation in fresh context windows
- Domain generalization beyond coding rules
- Semantic gap detection for multi-query sessions
- Qdrant migration for persistent embeddings at 10K+ rules
Disclaimer
This is the public face of Writ. The source code, rule corpus, retrieval pipeline, and implementation details live in a private repository.
Author
Lucio Saldivar -- Infinri
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found