hermescheck

Security Audit: Pass
Health: Pass
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 35 GitHub stars
Code: Pass
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Pass
  • Permissions — No dangerous permissions requested
Purpose

This tool is a companion scanner for the Hermes Agent ecosystem. It analyzes an AI agent's codebase to produce structured architecture reports, identifying potential long-running failure modes such as memory drift, tool-boundary gaps, and runtime sprawl.

Security Assessment

Overall Risk: Low

Based on the automated code scan of 12 files, no dangerous patterns or hardcoded secrets were detected. The tool does not request any inherently dangerous permissions during its operation. While the description implies it acts as a localized scanner, users should remain aware that code-analysis tools inherently need to read repository files. Since it relies on a Python runtime, standard local security precautions apply, but no active exploitation vectors or external network risks were identified.

Quality Assessment

The project demonstrates strong health and active maintenance, with its most recent push occurring just today. It is legally clear for integration and modification under the permissive MIT License. The repository has garnered 35 GitHub stars, indicating a fair degree of early community trust and practical interest. The documentation is professional, clearly outlining the tool's objectives, workflow, and commitment to staying aligned with future Hermes Agent updates.

Verdict

Safe to use.
SUMMARY

Hermes Agent-focused architecture and runtime health checks

README.md

HermesCheck - Hermes Agent-focused architecture and runtime health checks

hermescheck

Architecture audit cards for AI agent runtimes.

hermescheck scans an AI agent checkout and produces a structured architecture
report about memory drift, tool-boundary gaps, runtime sprawl, gateway risks,
scheduler behavior, observability, and other long-running agent failure modes.
It can also render the result as a clean PNG report card for README images,
release posts, and maintainer-friendly architecture snapshots.

Hermes Agent remains the first deep adaptation target. hermescheck is a
community companion tool for
NousResearch/hermes-agent, and
its strongest checks are shaped around Hermes forks, deployments, and review
workflows.

This project is not an official Nous Research project. It is built for the
Hermes Agent community and derived from the general-purpose
agchk scanner, then narrowed for
Hermes-specific review workflows.

Long-term commitment: hermescheck is designed to stay in deep alignment with
Hermes Agent. It will be maintained release by release, updating checks,
documentation, and regression coverage so every Hermes release ships with a
clear community health-check path for forks and deployments. It also aims to
help Hermes Agent reach and earn practical adoption among Chinese developer
communities.


Release notes

HermesCheck report card showing architecture score, priority findings, runtime signals, and production audit scope

Why It Exists

Hermes Agent is more than a chat CLI. It is a persistent agent runtime with a
conversation loop, tool registry, skills, memory, session search, messaging
gateway, scheduled automations, terminal backends, plugins, and training
surfaces. That power is exactly why Hermes forks and deployments can drift in
ways ordinary linters do not catch.

hermescheck asks Hermes-shaped questions:

  • Does this checkout still contain the core Hermes runtime surfaces?
  • Are slash commands derived from the central registry instead of diverging per surface?
  • Do CLI, TUI, gateway, skills, cron, and SessionDB still line up?
  • Can interrupted runs resume from transcript plus durable environment state?
  • Are tool/syscall boundaries explicit enough for high-agency operation?
  • Is memory becoming a durable subsystem rather than context stuffing?
  • Are startup paths, plugins, and background jobs becoming hard to reason about?
  • Can findings be exported to Markdown, JSON, and SARIF for repeatable review?

Full-Score Agent Architecture

In hermescheck terms, a full-score Hermes-aligned agent is not just a model
with tools. It is a stateful agent operating system: every user-facing surface
shares one command contract, every tool crosses an explicit capability boundary,
memory is paged and recoverable, and each release can be checked through a
repeatable evidence pipeline.

HermesCheck full-score agent architecture: command contract, stateful recovery, memory and skill OS, tool syscall boundary, scheduler, and release guardrail

The architecture should provide these capabilities:

  • one canonical command surface across CLI, TUI, gateway, help, autocomplete, and menus
  • stateful recovery from transcript plus real environment state
  • external LLM CLI workers through Task JSON, natural-language prompt handoff, stdout/stderr/exit-code capture, and process controls
  • explicit tool/syscall capabilities before high-agency execution
  • memory that supports facts, skills, semantic anchors, paging, and page-fault recovery
  • scheduler controls for long-running jobs, cron, gateway events, and user-visible tasks
  • observability that turns every release check into reusable evidence
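
The tool/syscall boundary idea above can be sketched in a few lines of Python. This is a hypothetical illustration: the `ToolCall` type, `check_boundary` helper, and capability strings like `fs.read` are invented for this example and are not hermescheck or Hermes Agent APIs.

```python
# Hypothetical sketch of an explicit tool capability boundary.
# ToolCall, check_boundary, and capability names like "fs.read" are
# illustrative only, not actual hermescheck or Hermes Agent code.
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str
    required: set = field(default_factory=set)  # capabilities this call needs


def check_boundary(call, granted):
    """Refuse execution unless every required capability was explicitly granted."""
    missing = call.required - granted
    if missing:
        print(f"blocked {call.tool}: missing {sorted(missing)}")
        return False
    return True


# A shell tool needs process-spawn rights, but the session only granted file reads.
call = ToolCall(tool="shell", required={"proc.spawn"})
allowed = check_boundary(call, granted={"fs.read"})
print(allowed)  # False: the boundary blocks the high-agency call
```

The point of the sketch is that the capability check happens before execution and produces an observable refusal, which is exactly the kind of explicit boundary the checklist above asks about.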

Quick Start

pip install hermescheck

Try it on any agent repository:

hermescheck ./path/to/agent-repo \
  --profile personal \
  -o audit_results.json \
  -r audit_report.md

Scan a Hermes Agent checkout:

git clone https://github.com/NousResearch/hermes-agent.git
hermescheck ./hermes-agent

Write machine-readable, human-readable, and GitHub code-scanning outputs:

hermescheck ./hermes-agent \
  --profile personal \
  -o audit_results.json \
  -r audit_report.md \
  --sarif hermescheck.sarif.json

Run as a module from a local clone:

python -m hermescheck ./path/to/hermes-agent --quiet

Render a shareable report card:

pip install "hermescheck[card]"
hermescheck card audit_results.json -o audit_card.png

Example Report Snapshot

hermescheck is designed to produce a first-screen summary that maintainers
can understand immediately and then drill into via Markdown, JSON, or SARIF.

Chinese:

结果摘要:
- Overall Health: unstable
- Architecture Era: 内燃气时代 (75/100)
- 总问题数: 108
- HIGH: 5
- MEDIUM: 88
- LOW: 15

最主要的 5 个高优先级问题:
1. Internal orchestration sprawl detected
   - 编排/规划/路由/恢复/调度层过多,主循环职责不够单一
2. Memory freshness / generation confusion detected
   - 记忆面过多,存在“哪个是最新 authoritative memory”的歧义
3. Role-play handoff orchestration detected
   - 角色化/部门化 handoff 偏多,容易造成上下文漂移
4. Startup surface sprawl detected
   - 启动入口和 wrapper 较多,启动链路不够收敛
5. Runtime surface sprawl detected
   - runtime 面太多 (agent_stack / ops / queue / storage / ui / web_api),理解和维护成本高

English:

Report summary:
- Overall Health: unstable
- Architecture Era: Combustion Age (75/100)
- Total Issues: 108
- HIGH: 5
- MEDIUM: 88
- LOW: 15

Top 5 high-priority issues:
1. Internal orchestration sprawl detected
   - Too many planning, routing, recovery, and scheduling layers; main-loop ownership is not clear enough.
2. Memory freshness / generation confusion detected
   - Too many memory surfaces; unclear which one is the latest authoritative memory.
3. Role-play handoff orchestration detected
   - Too many department-style handoffs; context can drift between roles.
4. Startup surface sprawl detected
   - Too many entrypoints and wrappers; the startup chain is not convergent enough.
5. Runtime surface sprawl detected
   - Runtime spans too many surfaces (agent_stack / ops / queue / storage / ui / web_api), raising comprehension and maintenance cost.

Hermes-Specific Checks

Runtime Contract

hermescheck first detects whether the target looks like a Hermes Agent
checkout. If it does, it verifies the presence of core runtime surfaces:

Expected surfaces and their paths:

  • Agent loop: run_agent.py
  • Tool orchestration: model_tools.py, toolsets.py, tools/registry.py
  • CLI: cli.py, hermes_cli/commands.py
  • Session memory: hermes_state.py
  • Profile-aware paths and logs: hermes_constants.py, hermes_logging.py
  • Skills: skills/, optional-skills/, agent/skill_commands.py
  • Gateway: gateway/run.py, gateway/platforms/
  • Scheduling: cron/scheduler.py
  • Execution environments: tools/environments/
  • Plugins: plugins/

If a fork or packaging step drops one of these surfaces, the report makes the
drift visible before the missing piece becomes a runtime surprise.

hermescheck intentionally treats test suites, fixtures, specs, and coverage
artifacts as out of scope for target audits. Maintainers can use tests as their
own proof when closing an issue, but the scanner should focus findings on
production/runtime architecture and source paths.

Slash Command Contract

Hermes shares slash commands across the classic CLI, TUI, messaging gateway,
help text, autocomplete, and platform menus. hermescheck looks for the shared
COMMAND_REGISTRY, GATEWAY_KNOWN_COMMANDS, resolve_command, and
gateway_help_lines helpers so command changes do not silently split by
surface.
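
The shared-contract shape can be sketched briefly. The helper names (`COMMAND_REGISTRY`, `resolve_command`, `gateway_help_lines`) come from the paragraph above, but the bodies here are a minimal illustration, not the actual Hermes Agent implementation.

```python
# Illustrative shape of a single shared command contract. The names come from
# the Hermes helpers mentioned above; the bodies are a sketch, not real code.
COMMAND_REGISTRY = {
    "help": "Show available commands",
    "reset": "Clear the current session",
    "status": "Show runtime status",
}


def resolve_command(token):
    """Resolve a slash command against the one canonical registry."""
    name = token.lstrip("/").split()[0].lower()
    return name if name in COMMAND_REGISTRY else None


def gateway_help_lines():
    """Gateway help derived from the same registry, so surfaces cannot diverge."""
    return [f"/{name}: {desc}" for name, desc in sorted(COMMAND_REGISTRY.items())]


print(resolve_command("/status"))  # status
```

Because help text and resolution both derive from one dictionary, adding a command in one place updates every surface, which is the property hermescheck checks for.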

General Agent Architecture Signals

The Hermes-specific scanner runs alongside inherited architecture checks:

  • internal orchestration sprawl
  • completion-closure gaps
  • static bug inference from code patterns
  • token usage budget risks, including large default context windows and full-history prompt assembly
  • memory freshness confusion
  • memory lifecycle governance and CJK-safe retrieval paths
  • RAG retrieval governance and context-budget controls
  • self-evolution capability: external signals, source reading, pattern extraction, constraint adaptation, safe landing, verification closure, hands-on validation, and reusable assetization
  • impression/pointer memory gaps
  • role-play handoff chains
  • agent-OS architecture gaps, including Stateful Agent recovery
  • daemon lifecycle controls, self-restart control-plane hazards, post-restart recent-session recall, capability policies, plugin sandboxing, remote tool boundaries, and pipeline middleware integrity
  • LLM CLI worker contract gaps for Qwen/Codex/Claude-style process delegation, including raw-JSON stdin handoff
  • duplicated skills and SOPs
  • startup and runtime surface sprawl
  • hidden LLM calls
  • tool-enforcement gaps
  • output pipeline mutation
  • code execution risks
  • missing observability, missing before/after evidence capture, and missing handoff/workbook habits
  • excessive agency controls in enterprise mode

Profiles

hermescheck keeps two practical profiles:

  • personal: for local Hermes forks, experiments, and solo operator setups; prioritizes internal drag, closure, memory shape, and runtime clarity
  • enterprise: for team-owned or production Hermes deployments; keeps stricter checks for secrets, code execution, approvals, and observability

Examples:

hermescheck ./hermes-agent --profile personal
hermescheck ./hermes-agent --profile enterprise --fail-on high
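
The `--fail-on high` example above implies a severity gate. Here is a hedged sketch of that semantics in Python; the severity ordering and the `gate_exit_code` helper are assumptions for illustration, not documented hermescheck behavior.

```python
# Hedged sketch of --fail-on gate semantics: exit non-zero when any finding
# meets or exceeds the threshold severity. The ordering is an assumption.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def gate_exit_code(finding_severities, fail_on="high"):
    """Return 1 (fail the CI job) if any finding reaches the fail-on threshold."""
    threshold = SEVERITY_ORDER[fail_on.lower()]
    tripped = any(SEVERITY_ORDER[s.lower()] >= threshold for s in finding_severities)
    return 1 if tripped else 0


print(gate_exit_code(["medium", "high"], fail_on="high"))  # 1: job fails
print(gate_exit_code(["low", "medium"], fail_on="high"))   # 0: job passes
```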

Report Shape

Every scan produces:

  • schema_version: stable JSON schema identifier
  • scan_metadata: timestamp, duration, scanner count, profile
  • executive_verdict: health, primary failure mode, urgent fix
  • scope: entry points, channels, model stack, audited layers
  • maturity_score: architecture-era score, formula, positive signal ledger, penalty ledger, score caps, and share line
  • evidence_pack: compact evidence references
  • findings: severity-ranked issues with fixes
  • conflict_map: target-agent self-review of conflicting, duplicated, or contradictory architecture links
  • ordered_fix_plan: practical next steps
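
A minimal consumer of this report shape might look like the following. The top-level keys follow the list above; the nested structure (`health` under `executive_verdict`, `severity` per finding) is an assumption for illustration.

```python
# Minimal sketch of consuming the report fields listed above. Top-level keys
# follow this README; the nested structure shown here is an assumption.
import json


def summarize(report):
    verdict = report["executive_verdict"]
    high = [f for f in report["findings"] if f["severity"] == "HIGH"]
    return f"{verdict['health']}: {len(high)} HIGH finding(s)"


# In practice: report = json.load(open("audit_results.json"))
report = {
    "schema_version": "1",
    "executive_verdict": {"health": "unstable", "urgent_fix": "converge the startup chain"},
    "findings": [
        {"severity": "HIGH", "title": "Internal orchestration sprawl detected"},
        {"severity": "MEDIUM", "title": "Startup surface sprawl detected"},
    ],
}
print(summarize(report))  # unstable: 1 HIGH finding(s)
```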

Generate Markdown from a previous JSON report:

hermescheck report audit_results.json -o audit_report.md

Render a clean PNG report card for README, release, or social sharing:

pip install "hermescheck[card]"
hermescheck card audit_results.json -o audit_card.png

Validate a report:

hermescheck validate audit_results.json

Use With Hermes PRs

For contributors preparing a Hermes Agent PR:

hermescheck ./hermes-agent --profile personal -o audit_results.json -r audit_report.md

Then use the report to answer:

  • Did the change touch the agent loop, command registry, gateway, skills, cron, or SessionDB?
  • Did any interface work in CLI but not gateway, or vice versa?
  • Did a new tool path get a capability boundary and observable failure mode?
  • Did a memory or skill change preserve recall, search, and closure behavior?
  • Can an interrupted run verify environment state before repeating tool work?
  • Can the PR description cite a concrete validation command?

The goal is not to block Hermes experimentation. The goal is to make drift
visible early so community tools, forks, and upstream contributions stay easy to
review.

Development

git clone https://github.com/huangrichao2020/hermescheck.git
cd hermescheck
python -m pip install -e ".[dev]"
pytest -q
ruff check hermescheck tests
ruff format --check hermescheck tests

The CI pipeline runs lint, repository hygiene checks, tests across supported
Python versions, a self-scan, and package build validation.

Contributing

Useful contributions include:

  • sharper Hermes-specific contract checks
  • false-positive reductions from real Hermes forks
  • report examples from public-safe scans
  • SARIF or CI integration improvements
  • docs that make Hermes review workflows easier to repeat

Contributors

Thanks to these people for code, docs, ideas, tests, reviews, examples, and
real-world self-scan lessons.

Huang richao

Code Docs Ideas Maintenance

License

MIT. See LICENSE.
