openclerk
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
No AI report is available for this listing yet.
The governance layer for LLM-maintained Markdown wikis.
OpenClerk
The governance layer for LLM-maintained Markdown wikis. One binary. One SKILL.md.
Local-first, citation-bearing memory for coding agents that should not reread,
duplicate, or silently rewrite truth.
Watch OpenClerk catch stale memory in 60 seconds
curl -fsSL https://github.com/yazanabuashour/openclerk/releases/latest/download/install.sh | sh
openclerk demo init
openclerk demo ask "what changed and what is stale?"
The demo creates an isolated sample vault, writes a source note and synthesis
page, changes the source, and reports the synthesis as stale with citations and
a compile_synthesis repair request.
What is this?
OpenClerk gives agents a citation-bearing, provenance-tracked knowledge base
over your local markdown vault. Agents read and write through a strict JSON
runner — a stable, citable contract that keeps canonical markdown as the
human-readable authority. Knowledge compounds: useful synthesis becomes durable,
inspectable markdown instead of being rediscovered on every query.
Who is it for?
Technical users building agent workflows over local markdown, source notes, or
research vaults. If you want your agent to know your notes — not just search
them — and you want to audit every write, this is the runtime for that.
Why not just RAG / Obsidian / NotebookLM?
| OpenClerk | RAG pipeline | Obsidian | NotebookLM | |
|---|---|---|---|---|
| Canonical authority | Markdown in vault | Embedding index | Markdown in vault | Google's servers |
| Agent-native writes | Yes, with provenance | No | No | No |
| Citations on retrieval | Yes, always | Varies | No | Yes |
| Local-first | Yes | Depends | Yes | No |
| Composable modules | Yes | DIY | Plugin ecosystem | No |
Many RAG setups make the index the primary agent interface. OpenClerk keeps
markdown as the human-readable authority and treats indexes and projections as
derived recall layers. Obsidian has no agent-write contract. NotebookLM is not
local.
For a more detailed buyer matrix, see docs/comparison.md.
Try it in 5 minutes
Direct install:
curl -fsSL https://github.com/yazanabuashour/openclerk/releases/latest/download/install.sh | sh
Full install options: docs/install.md
Or tell your agent:
Install OpenClerk into $HOME/.local/bin using https://github.com/yazanabuashour/openclerk/releases/latest/download/install.sh or the requested release. Register release-matched skills/openclerk/SKILL.md from installer output. Verify command -v openclerk, openclerk --version, and skill path. Report only after runner and skill verify.
Upgrade prompt:
Upgrade OpenClerk using https://github.com/yazanabuashour/openclerk/releases/latest/download/install.sh or the requested release. Re-register release-matched skills/openclerk/SKILL.md from installer output. Verify command -v openclerk, openclerk --version, and skill path. Report only after runner and skill verify.
Then bind your vault and verify retrieval:
openclerk init --vault-root path/to/vault
printf '%s\n' '{"action":"search","search":{"text":"architecture","limit":5}}' \
| openclerk retrieval
Each result carries a doc_id, chunk_id, and citation path. That's the
contract.
What to test first
These are the eval-worthy surfaces in priority order:
- Lexical search — search for a topic you know is present and verify five cited results.
- Document write → re-search — create a short note, then search for it and confirm the cited result appears.
- Synthesis page lifecycle — create a
synthesis/page from source paths, update it, inspect provenance. - Duplicate candidate detection — ingest a near-duplicate, then confirm
duplicate_candidate_reportsurfaces it rather than silently creating a second document. - Relationship graph context — run
graph_context_reportfor a known markdown page and confirm canonical relationship text, links/backlinks, graph freshness, and provenance refs are returned without creating graph truth. - Relationship graph reports — run
graph_relationship_reportfor the same page and confirm relationship paths, direct-vs-derived evidence, typed candidates, and limited graph audit findings stay cited and read-only. - Relationship maintenance plans — run
graph_relationship_maintenance_planfor the same page and confirm candidate section content, next approved write requests, duplicate handling, rollback/audit path, and failure modes are returned withplanned_no_write. - Retrieval replay loop — explicitly run
retrieval_eval_capturefor a dogfood search, then runretrieval_eval_replayafter changes and inspect Jaccard, top-1, and latency metrics. Capture is local-only and stores result ids/paths, not raw vault content. - Search diagnostics — run
search_diagnostics_reportfor a query and confirm it recommends defaultsearchversus explicitsemantic_search, shows module readiness/cost/latency posture, and reports no default ranking change. - Maintenance report — run
maintenance_reportfor a relevant query/path prefix and confirm layout, projection freshness, relationship context, duplicate risk, module posture, and git lifecycle posture are packaged read-only with no repair. - Stale projection detection — update a source doc, then confirm downstream synthesis shows as stale before repair.
Report correctness, tool call count, and wall time. That's how the maintainers
gate new features.
Artifact And File-Type Support
OpenClerk treats artifact content as candidate evidence until a durable write is
approved. The current support matrix is:
| Input | Supported path | Boundary |
|---|---|---|
| Public PDF, HTML, Markdown, or GitHub README/blob URL | ingest_source_url |
Runner-owned inspect/plan, fetch, provenance, duplicate checks, and approved create/update only |
| Supplied video transcript | ingest_video_url |
Transcript text and provenance must be supplied; no native media acquisition |
| Pasted or explicit content | artifact_candidate_plan |
Read-only path/title/body/tags/fields/duplicate proposal before approval |
| Explicit local text, markdown, or text-bearing PDF | artifact_candidate_plan with local_path |
Reads only the supplied file; no durable write |
| Common images or scan-only PDFs | artifact_candidate_plan with text_extraction: "ocr_review" and verified tesseract-ocr module |
OCR text is review-required candidate evidence |
| Opaque binaries, slide decks, emails, chats, forms, native media without transcript | Unsupported by default | Paste reviewed text or use a supported runner path |
No parser output, OCR result, local file metadata, or fetched source becomes
canonical until the user approves the existing write action.
Modules
OpenClerk follows the building block economy
model deliberately. The mainline runner stays narrow. Optional behavior ships
as separately installed, manifest-verified modules:
Agent Module Instructions
Install prompt:
Install the OpenClerk module <module-provider> using <module-manifest-path>.
Use <module-command> on PATH, register <module-skill-path>, and verify with `openclerk module` list_modules. Do not pass command_args or edit SQLite directly.
Upgrade prompt:
Upgrade the OpenClerk module <module-name> to <module-version-or-latest>.
Refresh registration through `openclerk module`, preserve existing provider config, and verify with list_modules. Do not edit SQLite directly.
Available installable modules:
| Module name | Provider | Adds | Manifest | Skill |
|---|---|---|---|---|
ollama-embeddings |
ollama |
Semantic search (local-first) | modules/ollama-embeddings/module.json |
modules/ollama-embeddings/skill/ollama-embeddings/SKILL.md |
gemini-embeddings |
gemini |
Semantic search (cloud opt-in) | modules/gemini-embeddings/module.json |
modules/gemini-embeddings/skill/gemini-embeddings/SKILL.md |
tesseract-ocr |
tesseract |
OCR review for images and scan-only PDFs | modules/tesseract-ocr/module.json |
modules/tesseract-ocr/skill/tesseract-ocr/SKILL.md |
Core lexical search and citation behavior require no modules. Semantic search
is explicit opt-in — it accelerates recall, it does not become the authority
layer. No hidden provider fallback. No committed embedding cache. Full module
install guidance lives in modules/docs/install.md.
Inspect the current block inventory:
openclerk capabilities
What is explicitly not supported yet
- Autonomous browsing / recursive URL crawling —
ingest_source_urlcan inspect, plan, or ingest supplied public URLs through the runner, but it does not browse the open web autonomously or recursively crawl discovered links. - Automatic video transcript acquisition — not supported.
- Hosted service or cloud sync — fully local. No OpenClerk server, no SaaS.
- Multi-user / team server — single-user, single-machine runtime.
- Broad vector DB memory — no Pinecone, Weaviate, or default durable vector index. Semantic modules are optional and local-only by default.
- Autonomous memory and routing — deferred until docs, synthesis, and truth-sync layers are reliable. See
docs/architecture/memory-routing-reference-decision.md.
Runner
openclerk config # persisted product/profile config
openclerk document # doc writes, registry, paths
openclerk retrieval # search, graph context/reports, records, provenance
openclerk clerk # optional read-only Chronicler orchestration
openclerk capabilities
JSON in, JSON out. See runner help:
openclerk config --help
openclerk document --help
openclerk retrieval --help
openclerk clerk --help
openclerk config inspect_config is the read-only effective config summary for
storage, profile defaults, module summaries, and git lifecycle gate posture.openclerk config owns persisted product/profile preferences such as the
default autonomy profile. openclerk module owns optional provider writes.
Vault-root binding is storage bootstrap config: initialize or intentionally
rebind it with openclerk init --vault-root, not openclerk config.
Request-level document and retrieval autonomy fields override persisted
profile defaults field-by-field. Git checkpoint enablement remains
invocation-scoped through --git-checkpoints or OPENCLERK_GIT_CHECKPOINTS.openclerk clerk run --once is the combined Chronicler MVP report. The same
read-only primitives are also exposed as openclerk clerk inbox_scan for
explicit local inbox candidate planning and openclerk clerk context_pack for
task context, must-read documents, decisions, and citations. All three emitopenclerk-clerk.v1, report planned_no_write: true, and perform no durable
vault writes. Planning that inspects Core evidence requires existing
OpenClerk storage; Chronicler returns a blocker rather than initializing
SQLite from a read-only command.
Storage: ${XDG_DATA_HOME:-~/.local/share}/openclerk/openclerk.sqlite
Override: OPENCLERK_DATABASE_PATH or --db
Architecture
Contributing
See CONTRIBUTING.md, SECURITY.md, and docs/maintainers.md.
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found