Name: odylith
Author: odylith

Odylith Stops Coding Agents From Confidently Doing The Wrong Thing

It makes coding agents operate like disciplined engineers instead of clever tourists.

[!IMPORTANT]
Odylith is not a standalone app or IDE. Install it into a repo, then use it
through Codex. Current public support is Codex only. In Odylith, Codex is
the execution interface and odylith/index.html is the operating surface
that keeps intent, constraints, topology, and execution state visible.

Odylith is GA on its supported public install platforms as of 2026-04-07.

Quick Start

From inside the Git-backed repository you want to augment, run the installer:

curl -fsSL https://odylith.ai/install.sh | bash

Run it from the repo root when you can; Odylith can also detect that root from
any subdirectory inside the same repo. The current GA platform contract covers
macOS (Apple Silicon) and Linux (x86_64, ARM64). Intel macOS and Windows
are not part of the current GA platform set.
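The two checks in that paragraph can be sketched as a small preflight script. This is a minimal sketch, not Odylith's installer logic. `is_ga_platform` and `find_repo_root` are hypothetical helpers. Root detection is assumed to work like `git rev-parse --show-toplevel`, which is a real Git command. The platform table is transcribed from the GA list above, and Linux ARM64 is assumed to report as `aarch64` or `arm64`.

```python
"""Hypothetical preflight check; not part of Odylith's installer."""
import platform
import subprocess
from pathlib import Path
from typing import Optional

# GA set transcribed from the README: macOS on Apple Silicon, Linux on x86_64 and ARM64.
# Assumption: Linux ARM64 machines report "aarch64" (sometimes "arm64").
GA_PLATFORMS = {
    ("Darwin", "arm64"),
    ("Linux", "x86_64"),
    ("Linux", "aarch64"),
    ("Linux", "arm64"),
}


def is_ga_platform(system: str, machine: str) -> bool:
    """Return True if the (OS, CPU architecture) pair is in the GA set above."""
    return (system, machine.lower()) in GA_PLATFORMS


def find_repo_root(start: Path) -> Optional[Path]:
    """Ask Git for the top-level directory of the repo containing `start`."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=start,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is missing, `start` does not exist, or it is not inside a Git repo.
        return None
    return Path(out.stdout.strip())


if __name__ == "__main__":
    system, machine = platform.system(), platform.machine()
    print(f"platform {system}/{machine} GA-supported: {is_ga_platform(system, machine)}")
    print(f"repo root: {find_repo_root(Path.cwd())}")
```

Run it from any subdirectory of the repo; it prints the detected root and whether the current machine is in the GA set. Delegating to Git instead of walking up to look for a `.git` directory also handles worktrees, where `.git` is a file rather than a directory.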

After install, open the repo in Codex. For first-run behavior, the first prompt
to use, browser-shell behavior, repo-root selection, and reduced-mode details,
see First Run In An Odylith Repo.
For more example prompts, see
Starter Prompt Inspirations.

Prove It In 2 Minutes

From the repo you just installed into, run:

./.odylith/bin/odylith start --repo-root .

Then paste this into Codex:

Use Odylith to define this repo's first governed slice. Pick one path to own, one seam to guard, one component to define, one diagram to draw, and one backlog to open, all tied to the same slice. First show me 5 bullets. Then create the Odylith files. Plain English. Real file paths only. No IDs. No hedging. Only write under odylith/.

What you should see within the first turn:

a narrowed, grounded slice instead of a blind repo scan
odylith/index.html reflecting the repo's live execution state
durable repo-local records under odylith/ for the chosen slice

[!TIP]
⭐ If Odylith makes your coding agent materially sharper in real repo work,
star the repo so other operators can find it.

What Does The Name "Odylith" Mean?

Odylith combines "Ody," suggesting a journey, with "lith," from the Greek
lithos, meaning stone. The result is a name that suggests movement guided by
permanence: exploration anchored by a stable core. It reflects the idea at the
heart of the product: motion with a center, exploration with structure, and a
path toward agentic AI swarms that replace rigid monoliths with adaptive,
living networks.

Intro

Odylith changes the operating conditions for Codex.

It replaces blind repo search with scoped grounding.
It gives the agent durable repo-local memory and a forensic trail.
It governs validation, diagnosis, recovery, and closeout.

Base coding agents can read a repo, search files, sketch a plan, write code,
and infer some local context from the code itself. But serious work depends on
intent, constraints, ownership, validation obligations, and definition of done
that are not reliably encoded in code alone.

With Odylith, that execution truth becomes explicit and durable in the
repository, so the agent starts from governed context instead of
reconstructing it from scratch on every turn.

Turn Requests Into Execution Truth

Odylith gives coding agents two durable advantages: delivery intelligence
and delivery governance.

Delivery intelligence recovers intent, constraints, dependencies, topology,
and validation requirements from the repository's real operating history.

Delivery governance turns that into execution truth: the right slice, the
right owner, the blockers, and the real definition of done.

That is the real value: less time re-deriving the repository, more time making
the right change.

More on the operating frame:
Why Bolting Odylith Onto Codex Changes The Outcome

Tribunal

One of Odylith's core strengths is that it can take one blocked or ambiguous repo posture, run ten specialist actors over the same grounded evidence, and force an adjudicated case before the agent acts. Tribunal is the engine for that step.

Tribunal is not the first-turn grounding path. It runs in higher-level delivery-intelligence flows such as odylith sync, governed surface refresh, and evaluation or benchmark paths, when Odylith needs to explain a live blocker, conflict, failure, or ambiguous posture in a workstream, component, or diagram.

Tribunal diagnosis flow from live actionable scope to grounded dossier, actor review, adjudicated case, Remediator packet, and Odylith surfaces

It builds a grounded case file for the blocked scope.
It runs specialist review and adjudicates one explicit read of the problem.
It hands bounded remediation forward with validation and rollback guards.

More on Tribunal and the product control plane:
Odylith Product Components

Surface Tour

The screenshots in this tour were captured from the local Odylith shell in this
repository and refreshed on 2026-04-05.

All of the views below are the canonical odylith/index.html shell with a
specific surface tab active, because that is the actual operator experience
Odylith ships.

Radar

The example below shows workstream B-040 inside the Radar shell.

Ranked backlog: the left rail is the active delivery queue, grouped by
execution state so the agent sees what is moving, parked, or already done.
Selected workstream detail: the right pane turns one workstream into
execution truth with score, dates, confidence, traceability, and linked
specs or plans.
Delivery controls: the search and filter bar lets you narrow by section,
phase, activity, lane, priority, and sort order without leaving the shell.

Compass

The example below shows the live global Compass brief in the 48h window.

Standup brief: the left column summarizes what changed, what matters
now, and what the current execution slice is trying to achieve.
Audit timeline: the right column shows timestamped execution evidence for
the selected audit day.
Scope and time controls: the top pills switch between 24h and 48h
windows, set the audit day, and move between global and workstream-scoped
views.

Atlas

The example below shows diagram D-017 inside the Atlas shell.

Diagram catalog: the left rail is the searchable Atlas index, with
filters for kind, workstream, and freshness.
Connected workstream context: the header binds each diagram to owners,
active touches, and historical references so topology stays grounded in live
delivery.
Diagram viewer: the center pane is the zoomable diagram itself, with
controls to pan, fit, export, and inspect the architecture without leaving
the shell.

Registry

The example below shows the Tribunal component dossier inside the Registry shell.

Component inventory: the left column is the curated component list, which
gives the agent a governed map of what exists.
Component dossier: the main panel explains what a component is, why it is
tracked, what spec or topology is attached, and which forensic evidence
supports it.
Change chronology: the lower forensic stream is the audit trail for that
component, so history and evidence stay attached to the current spec.

Casebook

The example below shows case CB-009 inside the Casebook shell.

Bug case queue: the left column is the searchable case list, with
severity and status filters to separate active incidents from resolved
learnings.
Selected bug detail: the main pane turns one failure into a reusable
dossier with description, failure signature, detection path, ownership, and
fix history.
Prevention memory: the lower sections keep the root cause, verification,
rollback, and regression tests visible so the same bug is less likely to
return.

Benchmarks

Odylith publishes two benchmark views and keeps their claims separate:

Grounding Benchmark (--profile diagnostic): measures how well Odylith
builds the right grounded context before the live agent run
Live Benchmark (--profile proof): measures how well Odylith completes
the real task end to end against raw Codex CLI

In this README, odylith_off denotes the raw Codex CLI lane.

The current public proof runs use local-first memory on LanceDB plus Tantivy.
These are the first public evaluation runs and should be read as a baseline,
not a ceiling. Odylith wins by grounding and operationalizing shared repo truth
better, not by hiding truth from the baseline lane.

Grounding Benchmark

[!NOTE]
The Grounding Benchmark (--profile diagnostic) is not the product claim.
It isolates packet and prompt construction quality before any live Codex
session begins.

The Grounding Benchmark answers:

"Does Odylith build a better grounded packet/prompt than odylith_off?"
"What is the prep-time and prompt-size cost of Odylith’s retrieval/memory layer?"
"Does Odylith improve required-path coverage before the model starts working?"

Grounding benchmark snapshot:
Current Grounding Benchmark Snapshot

Grounding benchmark tables:
Benchmark Tables

Grounding Graphs

Headline win: Odylith starts the model with materially better grounding:
+0.320 required-path recall and +0.690 validation-success proxy versus
odylith_off.

On the warm-cache diagnostic lane, across 37 seeded packet and prompt
scenarios, odylith_on beat odylith_off on grounding quality:

+0.320 required-path recall
+0.084 required-path precision
+0.690 validation-success proxy

That grounding costs +7.048 ms median wall clock (9.881 ms p95, 254.219 ms
total across all 37 pairs).
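The README does not define required-path recall and precision formally. One plausible reading, stated here as an assumption, is set-based scoring of the repo paths a scenario requires against the paths the prepared packet actually surfaces, with each reported figure being the odylith_on score minus the odylith_off score. The sketch below uses a hypothetical `path_recall_precision` helper and made-up paths; its numbers are illustrative, not benchmark results.

```python
"""Hypothetical scoring sketch; not Odylith's benchmark harness."""


def path_recall_precision(required: set[str], surfaced: set[str]) -> tuple[float, float]:
    """Recall = share of required paths that were surfaced.
    Precision = share of surfaced paths that were required."""
    hits = len(required & surfaced)
    recall = hits / len(required) if required else 1.0  # nothing required: trivially complete
    precision = hits / len(surfaced) if surfaced else 0.0  # nothing surfaced: no credit
    return recall, precision


# Illustrative scenario: three paths the task genuinely needs.
required = {"src/api/routes.py", "src/api/schemas.py", "tests/test_routes.py"}
off_packet = {"src/api/routes.py", "README.md", "setup.py"}  # broad, unscoped search
on_packet = {"src/api/routes.py", "src/api/schemas.py", "tests/test_routes.py", "docs/api.md"}

r_off, p_off = path_recall_precision(required, off_packet)
r_on, p_on = path_recall_precision(required, on_packet)
print(f"recall delta {r_on - r_off:+.3f}, precision delta {p_on - p_off:+.3f}")
# prints: recall delta +0.667, precision delta +0.417
```

Recall rewards surfacing every required path, while precision penalizes padding the packet with irrelevant ones, which is why the two are reported together.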

The family heatmap uses the linked developer-first family order rather than raw
token cost. The grounding quality frontier credits prompt-visible repo paths on
the raw control lane, and the operating-posture view comes from the sampled
adoption_proof slice.

Odylith grounding benchmark family heatmap

Odylith grounding benchmark quality frontier

Odylith grounding benchmark frontier

Odylith grounding benchmark operating posture

Live Benchmark

[!TIP]
The Live Benchmark (--profile proof) is the product-claim lane. Current
full-proof status: provisional_pass.

The Live Benchmark answers:

"Does Odylith beat raw Codex CLI on the same live end-to-end task contract?"
"What is the full matched-pair time to valid outcome?"
"Does Odylith improve required-path coverage, validation, and expectation success on the live run?"

Live benchmark snapshot:
Current Live Benchmark Snapshot

Live benchmark tables:
Benchmark Tables

Live Graphs

Headline win: Odylith reaches valid outcomes faster and with far less
model spend: -12.43s median time to valid outcome and -52,561 median
live-session input tokens versus odylith_off.

On the conservative published proof view, odylith_on beat odylith_off
across 37 seeded scenarios with:

-12.43s median time to valid outcome
-52,561 median live-session input tokens
+0.227 required-path recall
+0.168 required-path precision
+0.393 expectation success

For each scenario, this published view keeps the worse of its warm and cold
results, drawn from 74 matched pairs (148 total live results), so the headline
stays conservative rather than cherry-picked.
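The counts are consistent with two cache states per scenario: 37 scenarios × 2 (warm, cold) = 74 matched pairs, and 74 pairs × 2 lanes = 148 live results. A minimal sketch of the conservative selection follows, under assumptions the README does not spell out: "worst" means the larger odylith_on minus odylith_off delta in time to valid outcome, and the headline is the median of the kept per-scenario deltas. The record layout, the `conservative_deltas` helper, and the timings are hypothetical.

```python
"""Hypothetical aggregation sketch; not Odylith's benchmark code."""
from statistics import median

# Time to valid outcome in seconds, keyed by (scenario, cache_state, lane).
# Values are illustrative only.
results = {
    ("s1", "warm", "odylith_on"): 40.0, ("s1", "warm", "odylith_off"): 55.0,
    ("s1", "cold", "odylith_on"): 48.0, ("s1", "cold", "odylith_off"): 52.0,
    ("s2", "warm", "odylith_on"): 30.0, ("s2", "warm", "odylith_off"): 31.0,
    ("s2", "cold", "odylith_on"): 35.0, ("s2", "cold", "odylith_off"): 50.0,
    ("s3", "warm", "odylith_on"): 22.0, ("s3", "warm", "odylith_off"): 20.0,
    ("s3", "cold", "odylith_on"): 25.0, ("s3", "cold", "odylith_off"): 40.0,
}


def conservative_deltas(results: dict) -> dict[str, float]:
    """For each scenario, keep the warm or cold pair whose on-minus-off delta
    is least favorable to odylith_on (the larger delta)."""
    scenarios = sorted({scenario for scenario, _, _ in results})
    kept = {}
    for s in scenarios:
        deltas = [
            results[(s, state, "odylith_on")] - results[(s, state, "odylith_off")]
            for state in ("warm", "cold")
        ]
        kept[s] = max(deltas)  # worst case for odylith_on
    return kept


kept = conservative_deltas(results)
print(kept)  # {'s1': -4.0, 's2': -1.0, 's3': 2.0}
print(f"median delta: {median(kept.values()):+.2f}s")  # median delta: -1.00s
```

Note that the best-case view of this toy data would report a -15.00s median, so keeping the worse pair per scenario can move the headline substantially. A real harness may apply the same worst-case rule separately to each reported metric; this sketch applies it to time to valid outcome only.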

The family heatmap uses the linked developer-first family order rather than
prompt-token cost.

Odylith live benchmark family heatmap

Odylith live benchmark quality frontier

Odylith live benchmark frontier

Odylith live benchmark operating posture

Need help reading the graphs, reports, and artifacts? See
How To Read Odylith's Codex Benchmarks.

Best Fit Use Cases

Odylith is strongest when:

the work spans multiple files, contracts, or governance surfaces
the repo is large enough that boundaries, ownership, bug history, and
execution state matter
you want specs, plans, component inventory, diagrams, and bug history to
live beside the code instead of across separate SaaS tools
you want recent execution and decisions visible in Compass instead of buried
in terminal history

Odylith is not meant to replace direct file reads for tiny obvious edits. It is
most useful when the repo is large enough that repo memory, topology, workstream
state, and execution history start to matter.

Odylith Governs Itself

This repo also uses Odylith on itself.

Surface	Product-Owned Truth
Radar	`odylith/radar/`
Atlas	`odylith/atlas/`
Compass	`odylith/compass/`
Registry	`odylith/registry/`
Casebook	`odylith/casebook/`
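Because each surface's truth lives in a plain directory, a repo can check its own layout with a few lines of code. The sketch below is a hypothetical helper, not an Odylith command; the surface-to-directory mapping is transcribed from the table above.

```python
"""Hypothetical layout check; not an Odylith command."""
from pathlib import Path

# Surface -> product-owned truth directory, transcribed from the table above.
SURFACE_DIRS = {
    "Radar": "odylith/radar",
    "Atlas": "odylith/atlas",
    "Compass": "odylith/compass",
    "Registry": "odylith/registry",
    "Casebook": "odylith/casebook",
}


def missing_surface_dirs(repo_root: Path) -> list[str]:
    """Return the surfaces whose truth directory is absent under repo_root."""
    return [
        surface
        for surface, rel in SURFACE_DIRS.items()
        if not (repo_root / rel).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_surface_dirs(Path.cwd())
    print("all surfaces present" if not missing else f"missing: {', '.join(missing)}")
```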
