episteme

A Sovereign Cognitive Kernel that installs a mandatory Thinking Framework (생각의 틀) into every AI coding agent you use. Turns context-blind "average" answers into context-fit protocols, and turns every resolved conflict into permanent, proactively-surfaced know-how.

See it in 60 seconds ↓ · Install ↓ · Why the file-system, not the prompt ↓ · Architecture & philosophy ↓

Episteme — the Thinking Framework in motion

TL;DR

Modern AI agents are incredibly capable — they write production code, navigate entire repos, plan multi-step workflows. What they lack is context-awareness.

When two credible sources disagree — Source A says do it this way, Source B says do it that way — an auto-regressive engine cannot tell which answer fits your project, your team's constraints, your op-class's history. So it defaults to the statistically average answer: fluent, confident, and fit for no specific context.

episteme closes that gap. Before any high-impact command runs, the agent is forced onto a four-field Thinking Framework on disk — Knowns · Unknowns · Assumptions · Disconfirmation — under a Core Question. Every conflict the framework resolves is extracted as a reusable protocol, committed to a tamper-evident knowledge base, and surfaced proactively at the next matching decision.

Enforcement is structural, not advisory. Prompts can be skipped; a file-system hook that exits non-zero cannot.

The ABCD architecture — four blueprints, one cortex

episteme acts as a prefrontal cortex for AI agents: it sits between intent and action, and it refuses to let an action proceed until the reasoning behind it is explicit. Four Cognitive Blueprints — each keyed to a specific failure class — decide what "explicit enough" means for a given op:

A · Axiomatic Judgment — resolves conflicts between credible but incompatible sources. Forces the agent to name why they disagree and which feature of the current context selects between them.
B · Fence Reconstruction — protects inherited constraints. Before a constraint can be removed, its original purpose must be reconstructed — Chesterton's fence enforced by the file system.
C · Consequence Chain — decomposes irreversible ops (first-order effect, second-order effect, failure-mode inversion, base-rate reference, margin of safety).
D · Architectural Cascade — catches refactors and renames that would leave stale references behind. Makes the agent enumerate the full blast radius before it edits.

Every blueprint firing — and every decision it validates — is committed to a tamper-evident hash chain. That chain is not a log; it is how the kernel gives you Active Guidance later: at the next matching decision, the relevant synthesized protocol is surfaced proactively, before the agent defaults to its training distribution.

The result is a project-specific thinking framework that compounds. The agent gets sharper on your codebase every time it resolves a conflict, not because you trained it — because the chain did the remembering.

The problem · the solution

The problem — conflicting sources, averaged answers, no durable know-how

The internet is full of contradictory how-to. Docs say one thing; a senior engineer says another. Two libraries recommend opposite patterns for the same bug. Modern agents, being auto-regressive pattern engines, cannot tell which answer fits this specific context — because fit is a causal-world-model judgment, not a pattern match over token frequency. So they average. The output sounds authoritative, fits no specific context, and misleads by omission.

Prompts cannot fix this:

A system-prompt reminder lives for one call.
A CLAUDE.md nudge gets skipped the moment a deadline arrives.
Know-how — the irreducibly context-specific rule of "in this shape of problem, do this" — cannot be taught through better wording. It has to be extracted, stored, and re-surfaced.

The solution — a Thinking Framework at the file-system level

episteme intercepts the moment intent meets state change. Before any high-impact op (git push, npm publish, terraform apply, DB migrations, lockfile edits), the agent must project its reasoning onto a structured surface on disk:

Field	What the agent must commit to
Core Question	The one question this action is actually trying to answer (counters question substitution).
Knowns	Verified facts, citations, measurements — not plausible-sounding guesses.
Unknowns	Named, classifiable gaps — not vague "there might be risks."
Assumptions	Load-bearing beliefs, flagged so they can be falsified.
Disconfirmation	The observable event that would prove this plan wrong — pre-committed before action.

Validity is checked structurally: minimum content length, no lazy-token placeholders (none, n/a, tbd, 해당 없음), normalized command scanning so bypass shapes like subprocess.run(['git','push']) and os.system('git push') are caught. Agent-written shell scripts are deep-scanned via a stateful interceptor across calls. If the surface is absent or invalid, the op is refused (exit 2). Default is strict; advisory mode (warn-don't-block) is opt-in per-project: touch .episteme/advisory-surface.

This is the difference between a prompt reminder and a compiler: one asks nicely, the other refuses to proceed.

Protocol Synthesis & Active Guidance — the ultimate vision

episteme is not just a blocker. The framework's real job is to turn every conflict it resolves into durable know-how that the agent re-applies automatically at the next matching decision.

Here is the loop (v1.0 RC shipped · CP1–CP10 · 565 / 565 green — see docs/DESIGN_V1_0_SEMANTIC_GOVERNANCE.md):

Detect conflict. The agent encounters two valid-looking but incompatible approaches for a context it hasn't fully resolved before.
Decompose, don't average. The Thinking Framework refuses the "average" answer. It forces the agent to extract why the sources conflict and which feature of the context tips the decision.
Synthesize a context-fit protocol. The resolved "in context X, do Y" rule is committed to an append-only, hash-chained knowledge base — tamper-evident, so the agent cannot silently rewrite the lesson.
Guide actively. At the next matching decision — even weeks later, even across sessions or tools — the kernel surfaces the protocol proactively. You don't have to remember to ask.
Self-maintain. When the agent discovers drift (stale config, deprecated API, core-logic mismatch), it is forced to evaluate patch vs. refactor honestly and synchronize the cascade across the full blast radius — CLI, config, schemas, docs, tests, external surfaces — before moving on.

The knowledge base is not a vector store pretending to be memory. It is a structural, human-readable, version-controlled artifact you can read, edit, fork, and migrate between adapters (Claude Code, Cursor, Hermes, future tools). The kernel outlives the tooling.

I want to… → do this

Goal	Command / pointer
See the Thinking Framework off vs on on the same prompt	`demos/03_differential/` · `scripts/demo_posture.sh`
See what the framework produces end-to-end	`demos/01_attribution-audit/` · `demos/02_debug_slow_endpoint/`
Install as a Claude Code plugin (one line)	`/plugin marketplace add junjslee/episteme`
Install on my machine (CLI + editable kernel)	`pip install -e . && episteme init` — see `INSTALL.md`
Understand what this installs in 3 minutes	`kernel/SUMMARY.md` · `docs/POSTURE.md`
Draft a reasoning surface from a Slack thread	`episteme capture --input thread.txt --output surface.json`
Sync identity to every AI tool I use	`episteme sync`
Encode working style + reasoning posture	`episteme setup . --interactive`
Apply the right harness for my project type	`episteme detect . && episteme harness apply <type> .`
Know when not to use this kernel	`kernel/KERNEL_LIMITS.md`
Find attribution for any borrowed concept	`kernel/REFERENCES.md`
Audit my setup	`episteme doctor`
Read the deeper philosophy (doxa · episteme · praxis · 결)	`docs/NARRATIVE.md`

See it in 60 seconds

Live site + visual dashboard — both rendered against the kernel's own cp7-chained-v1 hash chain. See web/README.md for the Vercel deploy guide.

Three demos, increasing in what they prove:

demos/03_differential/ — the demo that converts skeptics. Exact same prompt, Thinking Framework OFF vs. ON. A PM asks for a 2-sprint semantic-search scope; off answers how; on answers whether. DIFF.md shows which named failure modes the framework caught.
demos/02_debug_slow_endpoint/ — framework applied to a realistic p95 regression. The fluent-wrong "add a cache" answer is rejected at the Core Question gate; a schema-level root cause is produced instead.
demos/01_attribution-audit/ — canonical four-artifact shape (reasoning-surface → decision-trace → verification → handoff). The kernel applied to itself, auditing whether every borrowed concept is traceable to a primary source.

Open any of the three. You will know what episteme produces before reading any philosophy.

Quick start

git clone https://github.com/junjslee/episteme ~/episteme
cd ~/episteme
pip install -e .

episteme init              # generate personal memory files from templates
episteme setup . --write   # score working style + reasoning posture
episteme sync              # push identity to every adapter
episteme doctor            # verify wiring

Project-type harness:

episteme detect .                         # analyze repo, recommend a harness
episteme harness apply ml-research .      # apply it
episteme new-project . --harness auto     # scaffold + auto-detect

Deep-dive onboarding modes, scored dimensions, and defaults: docs/SETUP.md.

How episteme compares

Most tools in this space either build agent runtimes or provide memory APIs for applications. episteme augments the developer tools you already use.

Axis	episteme	Memory APIs (mem0, OpenMemory)	Agent runtimes (Agno, opencode, omo)
What it is	Identity + governance layer across dev tools	Memory API embedded in an app	A runtime that executes agents
Where identity lives	Governed markdown + JSON, cross-tool, versioned	Vector/graph store, per app	System prompt per session
Sync	One command, all tools	N/A	N/A (per-project config)
Know-how extraction	Enforced at file-system boundary; hash-chained	Opaque retrieval	Prompt-tuned, per session

The gap episteme fills: no other project syncs a governed cognitive contract across multiple developer AI tools in one command, and no other project forces context-fit protocol extraction at the point of state mutation. Runtimes and memory APIs own different lanes; episteme sits above them and makes them aware of who you are, how you think, and what your project has already learned.

Zero-trust execution

The OWASP Agentic AI Top 10 identifies prompt injection, goal hijacking, overreach, and unbounded action as the primary risk classes for autonomous agents. The Knowns / Unknowns / Assumptions / Disconfirmation structure is a structural counter to each:

OWASP Agentic Risk	episteme counter
Prompt injection / goal hijacking	Core Question declared before execution begins; deviations surface as Unknowns
Overreach / unbounded action	Constraint regime declared in Frame; reversible-first policy enforced
Fluent hallucination	Unknowns field cannot be blank; assumptions must be named before acting on them
Infinite planning loops	Disconfirmation condition required; loop exits when evidence fires

No assumption is trusted unless named. No action is taken unless the precondition (Knowns) and constraint regime are declared. The kernel is the verification layer between intent and execution.

Human prompt debugging

episteme doesn't just govern the agent — it debugs the human's intent. When the agent maps Knowns vs. Unknowns against a user request, it exposes logical gaps in the original prompt before executing flawed assumptions. The Unknowns field is often where the human realizes their question was underspecified. The Disconfirmation field is often where they realize they haven't thought about falsification at all.

This is not a side effect. It is a design property: a system that forces the agent to declare what it does not know forces the human to confront what they did not specify.

Repository layout

episteme/
├── kernel/                     philosophy (markdown; travels across runtimes)
├── demos/                      end-to-end reference deliverables
├── core/
│   ├── memory/global/          operator memory (gitignored; personal)
│   ├── hooks/                  deterministic safety + workflow hooks
│   ├── harnesses/              per-project-type operating environments
│   └── schemas/                memory + evolution contract schemas
├── adapters/                   kernel delivery layers (Claude Code, Hermes, …)
├── skills/                     reusable operator skills
├── templates/                  project scaffolds, example answer files
├── docs/                       runtime docs, architecture, contracts
├── src/episteme/               CLI + core library
└── tests/

Repo operating contract (for any agent working here): AGENTS.md. LLM sitemap: llms.txt.

CLI surface

episteme init
episteme doctor
episteme sync [--governance-pack minimal|balanced|strict]
episteme new-project [path] --harness auto
episteme detect [path]
episteme harness apply <type> [path]
episteme profile [survey|infer|hybrid] [path] [--write]
episteme cognition [survey|infer|hybrid] [path] [--write]
episteme setup [path] [--interactive] [--write] [--sync] [--doctor]
episteme bridge anthropic-managed --input <events.json> [--dry-run]
episteme bridge substrate [list-adapters|describe|verify|push|pull] ...
episteme capture [--input <file>] [--output <file>] [--by <name>]
episteme viewer [--host 127.0.0.1] [--port 37776]
episteme evolve [run|report|promote|rollback] ...

Full reference: docs/README.md.

Why this architecture

The product is a Thinking Framework; the rest of this list is what falls out when that framework is taken seriously.

Feedforward cognitive control, not reactive correction. Most agent-safety systems observe an error and correct after the fact. episteme names the failure modes before execution and refuses to proceed until they are countered. Knowns, Unknowns, Assumptions, Disconfirmation are declared first, action second.
Cognitive contract (Design by Contract). The Thinking Framework is Bertrand Meyer's Design by Contract applied to reasoning itself: preconditions (Knowns + validated Assumptions that must hold before execution), postconditions (Verification: what must be true at handoff), invariants (kernel principles that cannot be suspended). Breach a precondition and the agent does not proceed.
Hypothesis → test → update, observable across sessions. Each Reasoning Surface carries a hypothesis; each execution carries an outcome; the episodic tier records both; the semantic-promotion job surfaces patterns where hypotheses never fire their declared disconfirmation (calibration debt). Thinking-quality drift is detectable over time.
Cognitive profile is hypothesis, not documentation. The operator profile's nine cognitive-style axes (dominant_lens, noise_signature, explanation_depth, etc.) are control signals that modulate enforcement thresholds — and are themselves audited against the episodic record of actual behavior. Claimed posture vs. lived posture, with drift surfaced as re-elicitation.
Declared limits. KERNEL_LIMITS.md names when the kernel is the wrong tool. A discipline without a boundary is a creed.
Hard authority boundary. Repo docs + global memory are the source of truth; tool-native memories are acceleration, not authority.
Cross-tool consistency. One governed cognitive contract across Claude Code, Hermes, and future adapters. The framework outlives the tool.
Policy engine for agent cognition. episteme plays the role OPA (Open Policy Agent) plays for cloud infrastructure: an independent layer that evaluates whether a proposed reasoning state meets declared policy before the action it authorizes is allowed. The LLM is the runtime; episteme is the policy engine.
AI-safety by construction, not by bolt-on. The same structural gates that counter reasoning failure modes also close the OWASP Agentic risks. Security falls out of the framework.

Memory model, Memory Contract v1, Evolution Contract v1, and managed-runtime coexistence: docs/SYNC_AND_MEMORY.md.

Architecture & philosophy

Prose spine: docs/NARRATIVE.md. Full diagram with node annotations and cross-references: docs/ARCHITECTURE.md.

The Thinking Framework above is the product surface. Beneath it sits a structural vocabulary borrowed from ancient Greek epistemology and Korean aesthetics — a spine that every diagram, demo, and artifact in this repository renders onto.

The triad — doxa · episteme · praxis

Doxa (δόξα) — common opinion, fluent output produced by default. The nine named failure modes in kernel/FAILURE_MODES.md are a taxonomy of doxa mistaking itself for episteme.
Episteme (ἐπιστήμη) — justified knowledge: concrete Knowns, named Unknowns, falsifiable Disconfirmation. The precondition for execution. The repo's namesake.
Praxis (πρᾶξις) — informed action: effects that land with their authorizing discipline intact. The four canonical artifacts (reasoning-surface / decision-trace / verification / handoff) are its visible form.

The grain — 결 · gyeol

The Korean word 결 (gyeol) names the grain of wood or stone: the latent pattern-structure inside matter that, when followed, yields coherent form; when cut against, fractures. The Reasoning Surface's field ordering — Knowns → Unknowns → Assumptions → Disconfirmation — is the 결 of epistemic discipline: settled → open → provisional → falsification-condition. The calibration loop (prediction + outcome joined by correlation_id, analyzed by episteme evolve friction) is the grain refining itself across cycles.

Lifecycle

┌─────────────────────────────────────────────────────────────────────┐
│                         operator (you)                              │
│           ├── cognitive preferences   ├── working style             │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                    episteme sync
                               │
      ┌────────────────────────┼────────────────────────┐
      ▼                        ▼                        ▼
 Claude Code             Hermes (OMO)            future adapter
 (CLAUDE.md)             (OPERATOR.md)           (same kernel)
      │                        │                        │
      └────────────────────────┼────────────────────────┘
                               │
                       per-session loop
                               │
      ┌────────┬────────┬──────┴─────┬────────┬────────┐
      ▼        ▼        ▼            ▼        ▼        ▼
    FRAME → DECOMPOSE → EXECUTE → VERIFY → HANDOFF → (next session)
      │                                        │
      │ Reasoning Surface                      │ docs/PROGRESS.md
      │ (Knowns / Unknowns /                   │ docs/NEXT_STEPS.md
      │  Assumptions / Disconfirmation)        │ decision artifact
      │                                        │
      └────────────── feedback ────────────────┘

Four strata, one loop

graph TD
    subgraph SG1["① The Agentic Mind — Intention"]
        A["Agent\nGenerating intent for a high-impact op"]
        B["Reasoning Surface\ncore_question · knowns · unknowns\nassumptions · disconfirmation"]
        D["Doxa\nFluent hallucination\nnone / n/a / tbd / 해당 없음\n< 15 chars · missing fields"]
        E["Episteme\nJustified true belief\nconcrete knowns · named unknowns\ndisconfirmation ≥ 15 chars · no placeholders"]
    end

    subgraph SG2["② The Sovereign Kernel — Interception"]
        F["Stateful Interceptor\ncore/hooks/reasoning_surface_guard.py\nnormalises cmd · deep-scans agent-written files\ncross-call stateful memory"]
        G["Hard Block · exit 2\nExecution denied\nAgent forced to re-author surface"]
        H["PASS · exit 0\nPrecondition satisfied\nExecution admitted to Praxis"]
    end

    subgraph SG3["③ Praxis & Reality — Execution"]
        I["Tool Execution\ngit push · bash script.sh · npm publish\nterraform apply · DB migrations · lockfile edits"]
        J["Observed Outcome\ncore/hooks/calibration_telemetry.py\nexit_code 0 or non-zero · stderr captured"]
    end

    subgraph SG4["④ 결 · Gyeol — Cognitive Texture & Evolution"]
        K["Prediction Record\ncorrelation_id stamped at PASS\n~/.episteme/telemetry/YYYY-MM-DD-audit.jsonl"]
        L["Outcome Record\ncorrelation_id · exit_code · stderr\n~/.episteme/telemetry/YYYY-MM-DD-audit.jsonl"]
        M["episteme evolve friction\nsrc/episteme/cli.py · _evolve_friction\npairs prediction ↔ outcome by correlation_id\nranks under-named unknowns · flags exit_code ≠ 0"]
        N["결 · Gyeol\nRefined cognitive grain\nfriction hotspots · calibrated profile axes"]
        O["Operator Profile\ncore/memory/global/operator_profile.md\nlast_elicited axes updated · confidence rescored"]
        P["kernel/CONSTITUTION.md\nFour principles recalibrated\nfailure-mode counters sharpened"]
    end

    A --> B
    B --> D
    B --> E
    D --> F
    E --> F
    F --> G
    F --> H
    G -.->|"cognitive retry"| A
    H --> I
    I --> J
    E -.->|"correlation_id stamped at PASS"| K
    J --> L
    K --> M
    L --> M
    M --> N
    N --> O
    N --> P
    O -.->|"posture loop closed"| A
    P -.->|"posture loop closed"| A

    classDef doxaStyle fill:#c0392b,stroke:#922b21,color:#fff
    classDef episteStyle fill:#1e8449,stroke:#145a32,color:#fff
    classDef passStyle fill:#27ae60,stroke:#1e8449,color:#fff
    classDef praxisStyle fill:#2ecc71,stroke:#27ae60,color:#000
    classDef gyeolStyle fill:#1a5276,stroke:#154360,color:#fff
    classDef kernelStyle fill:#6c3483,stroke:#512e5f,color:#fff
    classDef neutralStyle fill:#2c3e50,stroke:#1a252f,color:#fff

    class D,G doxaStyle
    class E episteStyle
    class H,I passStyle
    class J praxisStyle
    class K,L,M,N,O,P gyeolStyle
    class F kernelStyle
    class A,B neutralStyle

Four subgraphs, one lifecycle. Doxa (red) — fluent-but-unvalidated output or a hard block — is the failure state the kernel exists to prevent. Episteme (green) — a validated Reasoning Surface — is the precondition for execution. Praxis (light green) — the admitted tool execution and its observed outcome. 결 · Gyeol (blue) — the calibration loop that refines the framework across cycles, feeding back into the operator profile and the kernel constitution.

Works with any stack. episteme operates independently of the LLM runtime — LangChain, CrewAI, Claude Code, Cursor, MCP. Kernel is pure markdown; operator profile is plain JSON; workflow loop is vendor-neutral. Adapter layer (Claude Code, Hermes, OMO/OMX) is pluggable.

The kernel files

Start at kernel/. Pure markdown. No code. No vendor lock-in.

File	What it defines
`SUMMARY.md`	30-line operational distillation
`CONSTITUTION.md`	Root claim, four principles, nine failure modes
`REASONING_SURFACE.md`	Knowns / Unknowns / Assumptions / Disconfirmation protocol
`FAILURE_MODES.md`	Nine fluent-agent failure modes ↔ counter artifacts (6 Kahneman · 3 governance)
`OPERATOR_PROFILE_SCHEMA.md`	Schema for encoding an operator's cognitive preferences
`MEMORY_ARCHITECTURE.md`	Five memory tiers (working / episodic / semantic / procedural / reflective)
`KERNEL_LIMITS.md`	When the kernel is the wrong tool; declared gaps
`REFERENCES.md`	Attribution for every load-bearing borrowed concept
`CHANGELOG.md`	Versioned kernel history

Authority hierarchy: project docs > operator profile > kernel defaults > runtime defaults. Specific beats general.

Push-readiness checklist

PYTHONPATH=. pytest -q tests/test_profile_cognition.py
python3 -m py_compile src/episteme/cli.py
episteme doctor
git status && git rev-list --left-right --count @{u}...HEAD

Topic	Where
What `episteme` installs (posture framing)	`docs/POSTURE.md`
The v1.0 RC direction	`docs/DESIGN_V1_0_SEMANTIC_GOVERNANCE.md`
Kernel distillation (30 lines)	`kernel/SUMMARY.md`
What the kernel produces	`demos/01_attribution-audit/` · `demos/02_debug_slow_endpoint/`
Same prompt, framework off vs. on	`demos/03_differential/`
Install paths (marketplace, CLI, dev)	`INSTALL.md`
Benchmark with disconfirmation target	`benchmarks/kernel_v1/`
Substrate bridge (mem0, memori, noop)	`docs/SUBSTRATE_BRIDGE.md`
Profile + cognition setup	`docs/SETUP.md`
Sync matrix, memory model, contracts	`docs/SYNC_AND_MEMORY.md`
Harness system	`docs/HARNESSES.md`
Hook reference + governance packs	`docs/HOOKS.md`
Skills + agent personas + provenance	`docs/SKILLS_AND_PERSONAS.md`
Personal customization (memory/hooks/skills)	`docs/CUSTOMIZATION.md`
Agent repo operating contract	`AGENTS.md`
Architecture deep-dive	`docs/EPISTEME_ARCHITECTURE.md`
Cognitive system playbook	`docs/COGNITIVE_SYSTEM_PLAYBOOK.md`

episteme

episteme

TL;DR

The ABCD architecture — four blueprints, one cortex

The problem · the solution

The problem — conflicting sources, averaged answers, no durable know-how

The solution — a Thinking Framework at the file-system level

Protocol Synthesis & Active Guidance — the ultimate vision

I want to… → do this

See it in 60 seconds

Quick start

How episteme compares

Zero-trust execution

Human prompt debugging

Repository layout

CLI surface

Why this architecture

Architecture & philosophy

The triad — doxa · episteme · praxis

The grain — 결 · gyeol

Lifecycle

Four strata, one loop

The kernel files

Read next

Push-readiness checklist

Reviews (0)