Proofrail

Proofrail is a runtime harness plugin for Hermes that helps AI agents work more like careful engineers.

If you're searching for a Codex harness, a Claude Code harness, or a general agent harness for Hermes, Proofrail is built for that job: it wraps real tool execution with evidence-first edits, a verify-after-mutation workflow, high-risk command handling, and long-task context anchors.

It adds a repeatable execution process around tool use:

gather evidence before editing existing files or mutating local state
validate after every change before the next mutation
track session workflow state with Observe / Execute / Review
detect and handle high-risk commands with extra scrutiny
summarize oversized tool output before it pollutes model context
preserve task anchors so long runs lose less state

The goal is simple: fewer blind edits, fewer unverified claims, and more reliable agent execution inside Hermes.

Why this exists

The same model can feel very different in different agent runtimes.

In one tool it behaves like a chatbot. In another it starts acting more like an engineer: it inspects first, gathers evidence before changing anything, validates after edits, and corrects itself when something fails.

Proofrail focuses on that execution layer.

It does not try to replace the model. It changes how the agent works during real tool use: how it observes, how it executes, how it validates, and how it self-corrects.

For Hermes, that means a runtime plugin that can:

block existing-file edits when there is no nearby evidence
require validation before the next mutation
track whether the session is in Observe, Execute, or Review
flag and handle high-risk command patterns
reduce context pollution with large-output summarization
push the agent toward verification and self-correction

Current status

Version: v0.0.4
Host: Hermes Agent plugin hooks
Language: Python

Version note: the GitHub release tag is v0.0.4, while the Python package and wheel use 0.0.4, the canonical PEP 440 form. Both refer to the same release.

The current main branch is the cooperative-runtime v0.0.4 line: explicit forced modes, classifier fallback/mode mapping, mode-transition audit, and forward-progress reopen signaling are now part of the release baseline.

Quick start

Install the unpacked plugin directory into:

$HERMES_HOME/plugins/proofrail/

Then enable it in the target instance's config.yaml:

plugins:
  enabled:
    - proofrail

If the target instance already has other plugins, append proofrail to the existing list instead of replacing the whole array.
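
The append-instead-of-replace rule can be sketched on an already-parsed config mapping. The helper name enable_plugin is hypothetical and is not part of Proofrail or Hermes; loading and saving config.yaml is left to whatever YAML tooling the instance already uses.

```python
def enable_plugin(config: dict, name: str = "proofrail") -> dict:
    """Return a copy of config with `name` appended to plugins.enabled.

    Hypothetical helper: existing entries are preserved and the
    operation is idempotent.
    """
    plugins = dict(config.get("plugins") or {})
    enabled = list(plugins.get("enabled") or [])
    if name not in enabled:
        enabled.append(name)
    plugins["enabled"] = enabled
    return {**config, "plugins": plugins}


existing = {"plugins": {"enabled": ["other_plugin"]}}
updated = enable_plugin(existing)
print(updated["plugins"]["enabled"])
```

The input mapping is left unmodified, so the original list can still be compared against the result before anything is written back.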

What it does at runtime

evidence before mutation — inspect first, then edit
verify after mutation — validate changes before continuing
cooperative forced modes — enter explicit gather_target_evidence, validate_only, change_strategy, or user_choice submodes with allowed / forbidden next actions
mode-specific task handoffs — inject collaboration-framed task panels so the next legal move feels like progress instead of punishment
low-signal probe blocking — stop repeated no-progress probing loops
gray-area classifier fallback + mode mapping — when structured output is unsupported, fall back to rule-based classification; when the classifier does intervene, map it into cooperative runtime modes
dangerous command audit — detect high-risk commands and surface them back into reasoning context
large output summarization — compress oversized tool output before reinjection
session-scoped workflow state — maintain Observe / Execute / Review phase per session
audit trail — JSONL audit events for preflight, mutation, validation, forced-mode transitions, forward-progress reopen events, dangerous commands, and summarization
task ledger — session-level record of evidence, mutations, validation, touched files, and final state
validation suggestions — inject the narrowest plausible verification hints from touched files and command shape
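
As an illustration of the large-output summarization item above, here is a minimal head/tail truncation sketch. It is an assumption about one plausible strategy, not Proofrail's actual summarizer; the function name summarize_output and the keep parameter are hypothetical, and the 8000-character default mirrors the summary_threshold_chars value shown in the configuration example.

```python
def summarize_output(text: str, threshold: int = 8000, keep: int = 1000) -> str:
    """Pass short output through; keep only head and tail of oversized output."""
    if len(text) <= threshold:
        return text
    # Never keep more than half the threshold on each side, so head and
    # tail cannot overlap.
    keep = max(0, min(keep, threshold // 2))
    omitted = len(text) - 2 * keep
    marker = f"\n[... {omitted} characters omitted ...]\n"
    return text[:keep] + marker + text[len(text) - keep:]


big = "a" * 5000 + "b" * 5000
summary = summarize_output(big)
print(len(big), len(summary))
```

Keeping both ends is a deliberate choice in this sketch: command output often puts the invocation context at the top and the error or exit status at the bottom.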

Current runtime rules

Existing files cannot be modified without evidence
After a mutation, the next mutation must wait until validation runs
After repeated low-signal probes, the same no-progress loop is blocked
Dangerous terminal commands default to warn/audit, not a manual approval loop. The approve action is currently fail-closed in Hermes: it blocks the command and tells the operator to confirm manually before retrying.
Large tool output is summarized through transform_tool_result
pre_llm_call injects phase-aware runtime context
After changes, the plugin injects touched files, validation hints, and final evidence-report requirements
Hard workflow blocks enter cooperative modes, not just warning prose
Classifier interventions can route the runtime into change_strategy or user_choice instead of leaving the model to guess the shortest legal next step
Successful validation explicitly reopens forward progress and emits a semantic audit event
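
The first two rules and the reopen-on-validation rule can be sketched as a small per-session gate. This is a simplified illustration under stated assumptions, not Proofrail's implementation: the SessionGate class and its method names are hypothetical, and the mapping from a blocked edit to the gather_target_evidence or validate_only mode is assumed from the mode names listed earlier.

```python
from dataclasses import dataclass, field


@dataclass
class SessionGate:
    """Hypothetical per-session state for the mutation rules."""
    evidence: set = field(default_factory=set)  # paths that have been read
    pending_verification: bool = False

    def record_read(self, path: str) -> None:
        self.evidence.add(path)

    def record_validation(self) -> None:
        # A successful validation reopens forward progress.
        self.pending_verification = False

    def check_mutation(self, path: str, exists: bool) -> str:
        """Return "allow", or the name of the mode that blocks the edit."""
        if self.pending_verification:
            return "validate_only"
        if exists and path not in self.evidence:
            return "gather_target_evidence"
        # The edit goes ahead; the next mutation must wait for validation.
        self.pending_verification = True
        return "allow"


gate = SessionGate()
print(gate.check_mutation("app.py", exists=True))   # blocked: no evidence yet
gate.record_read("app.py")
print(gate.check_mutation("app.py", exists=True))   # allowed after a read
print(gate.check_mutation("app.py", exists=True))   # blocked until validation
gate.record_validation()
print(gate.check_mutation("new.py", exists=False))  # new files need no evidence
```

Returning a mode name instead of a bare refusal reflects the rule that hard blocks enter cooperative modes: the caller always learns which kind of action is expected next.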

Configuration

The default configuration is usable as-is and optimized for autonomous execution. Dangerous commands default to warn: they stay in autonomous mode with auditing and follow-up verification expectations, but remain subject to the same evidence-before-mutation and verify-after-mutation guardrails as any other mutating command.

If your Hermes build exposes plugins.entries, you can override settings like this:

plugins:
  enabled:
    - proofrail
  entries:
    proofrail:
      dangerous_command_action: warn
      summary_threshold_chars: 8000
      low_signal_block_threshold: 2
      audit_enabled: true
      audit_log_path: .proofrail/audit.jsonl
      llm_classifier_enabled: true
      # Leave provider/model unset to inherit the instance's current main model.
      llm_classifier_provider: null
      llm_classifier_model: null
      tool_aliases:
        shell: exec
        run_command: exec
        edit_file: write
        apply_patch: write

llm_classifier_enabled: true turns on the gray-area classifier path.

Leave llm_classifier_provider and llm_classifier_model unset (or null) if you want the classifier to follow the instance's current main model automatically.
Set both fields if you want the classifier to use a dedicated provider/model pair.

Supported tool categories are: read, write, exec, search, network, and other. See docs/configuration.md for details.
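
A rough sketch of how tool_aliases could resolve host tool names into these categories follows. The categorize function is hypothetical, and falling back to other for unknown names is an assumption here; docs/configuration.md remains the authority on the real behavior.

```python
CATEGORIES = {"read", "write", "exec", "search", "network", "other"}

# Same alias mapping as in the configuration example.
TOOL_ALIASES = {
    "shell": "exec",
    "run_command": "exec",
    "edit_file": "write",
    "apply_patch": "write",
}


def categorize(tool_name: str, aliases: dict = TOOL_ALIASES) -> str:
    """Map a host tool name to a category (assumed default: "other")."""
    category = aliases.get(tool_name, tool_name)
    return category if category in CATEGORIES else "other"


for tool in ("shell", "apply_patch", "read", "mystery_tool"):
    print(tool, "->", categorize(tool))
```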

Some built-in dangerous-command patterns include common infrastructure/network protection cases such as Tailscale stop/down/logout commands. These are opinionated defaults, not a claim that every deployment uses Tailscale.
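
Pattern-based detection of this kind can be sketched with a few regular expressions. The patterns and the audit_command function below are illustrative assumptions and do not reproduce Proofrail's built-in list; the only behavior taken from this document is that warn keeps the command running under audit while approve currently blocks.

```python
import re

# Illustrative patterns only, not Proofrail's built-in set.
DANGEROUS_PATTERNS = [
    re.compile(r"\btailscale\s+(stop|down|logout)\b"),
    re.compile(r"\brm\s+-rf\b"),
]


def audit_command(command: str, action: str = "warn") -> dict:
    """Classify a command by pattern match and the configured action."""
    matched = [p.pattern for p in DANGEROUS_PATTERNS if p.search(command)]
    if not matched:
        return {"decision": "allow", "matched": []}
    # Assumption: anything other than "warn" fails closed, as "approve" does.
    decision = "warn" if action == "warn" else "block"
    return {"decision": decision, "matched": matched}


print(audit_command("sudo tailscale down"))
print(audit_command("tailscale logout", action="approve"))
print(audit_command("tailscale status"))
```

Because the match is textual, a harmless command such as echo "tailscale down" would also be flagged by this sketch, which is the usual trade-off of pattern-based detection.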

Testing and release hygiene

Core regression coverage currently includes:

hook registration
dangerous command detection in warn/audit mode
evidence-before-mutation blocking for existing files
new-file creation allowance
conservative patch mutation handling
verification-before-next-mutation enforcement
low-signal probe blocking
large-output summarization
phase-aware pre_llm_call injection
explicit system-added / non-user provenance markers for injected plugin context
session end/finalize cleanup
audit log writing
touched-file and validation-hint injection
task ledger lifecycle
readback validation that clears pending_verification when the touched target is directly re-read
blocked-tool-call feedback reinjected into later reasoning context
summarize branding regression
cooperative forced modes with allowed / forbidden action menus
classifier structured-output fallback to rule-based gray-area review
classifier-to-mode mapping for change_strategy and user_choice
mode-specific collaboration handoff wording
mode lifecycle audit for validate_only entry / clear
block-driven mode-transition audit for missing_evidence and low_signal_repeat
forward_progress_reopened semantics after successful validation
end-to-end behavior simulation and local self-smoke of the cooperative runtime path

Run the local verification lane with:

pytest -q \
  tests/test_proofrail.py \
  tests/test_readback_validation_regression.py \
  tests/test_cooperative_modes.py \
  tests/test_classifier_fallback.py \
  tests/test_classifier_mode_mapping.py \
  tests/test_phase4_audit_and_wording.py \
  tests/test_phase5_mode_lifecycle.py \
  tests/test_phase6_behavior_simulation.py
rm -rf __pycache__ proofrail/__pycache__ tests/__pycache__ scripts/__pycache__ .pytest_cache
python3 scripts/check.release.py
python3 -m build --wheel
python3 scripts/verify.package.py
PYTHONPATH=. python3 scripts/phase6.live.smoke.py
rm -rf __pycache__ proofrail/__pycache__ tests/__pycache__ scripts/__pycache__ build dist *.egg-info .pytest_cache
python3 scripts/check.release.py

Security and current limits

Proofrail is a workflow harness, not an OS sandbox.

Current boundaries:

terminal mutation detection is still best-effort heuristic logic, not a full shell parser
dangerous command detection is pattern-based, not semantic shell interpretation
audit logs may contain command text, paths, tool arguments, and short output previews
a successful wheel build does not mean Hermes installs the plugin directly from a wheel; the primary install shape is still the unpacked plugin directory
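
To make the audit-log boundary above concrete, here is a minimal sketch of a JSONL audit writer that stores raw command text and a truncated output preview. The field names, the write_audit_event function, and the 200-character preview length are assumptions for illustration, not Proofrail's actual event schema.

```python
import io
import json
import time


def write_audit_event(stream, event: str, command: str, output: str,
                      preview_chars: int = 200) -> dict:
    """Write one JSON object per line (JSONL) to an open text stream."""
    record = {
        "ts": time.time(),
        "event": event,
        "command": command,  # raw command text is stored as-is
        "output_preview": output[:preview_chars],  # short preview only
    }
    stream.write(json.dumps(record) + "\n")
    return record


log = io.StringIO()  # stand-in for a file opened in append mode
write_audit_event(log, "dangerous_command", "tailscale down", "x" * 1000)
lines = log.getvalue().splitlines()
print(len(lines), json.loads(lines[0])["event"])
```

Anything passed on the command line, including tokens or paths, ends up in such a record verbatim, so the audit file is best treated as sensitive.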

Open-source positioning

This repository is intended to be:

a Hermes-native runtime harness plugin for autonomous coding agents
a public example of evidence / mutation / validation workflow discipline in a Python plugin
a practical starting point for people looking for a Hermes agent harness, Codex harness, or Claude Code harness style execution layer
