ironbee-cli
IronBee CLI — Verification and Intelligence Layer for Agentic Development

IronBee CLI
The CLI for IronBee — Verification and Intelligence Layer for Agentic Development
IronBee ensures that AI agents verify their code changes before completing a task. When an agent edits code, it cannot finish until it navigates to the affected pages, functionally tests the changes, and submits a passing verdict.
No more "it should work" — every change is tested.
IronBee also tracks every verification cycle (coding time, fix time, pass/fail rates, problematic files) and provides session-level and project-level analytics that can feed LLM-powered semantic insights.
Powered by browser-devtools-mcp — the agent navigates pages, clicks buttons, fills forms, takes screenshots, checks console errors, and writes a structured verdict.
Demo
https://github.com/user-attachments/assets/9d4e602b-6c05-4b48-89a8-3df429d10e00
Supported Clients
| Client | Status |
|---|---|
| Claude Code | Supported |
| Cursor | Supported |
| Codex | Planned |
| OpenCode | Planned |
Quick Start
Install IronBee globally
npm install -g @ironbee-ai/cli
Set up a project
cd your-project
ironbee install
This auto-detects your AI client and writes:
- Hook configuration (so the client calls IronBee automatically)
- Verification skill/rules (so the agent knows the workflow)
- MCP server config (so the agent has browser access)
- Browser-devtools permissions
Cursor: additional setup
Cursor requires manual activation of the MCP server after install:
- Restart Cursor to load the new hooks and MCP config
- Go to Settings → Tools & MCP and verify browser-devtools is enabled
- If the server shows as enabled but tools are unavailable, toggle it off and on
Note: This is a known Cursor limitation. MCP servers added via mcp.json may need manual activation.
That's it
The next time your AI agent edits code, IronBee will require browser verification before the task can complete.
Commands
ironbee install [project-dir] [--client <name>]     Set up hooks and config
ironbee uninstall [project-dir] [--client <name>]   Remove hooks and config
ironbee status [project-dir]                        Show verdict status for active sessions
ironbee verify [session-id]                         Dry-run verdict validation
ironbee analyze [session-id]                        Analyze session metrics (or all sessions)
Agent Commands (slash commands)
IronBee installs slash commands that the agent can use inside Claude Code or Cursor:
| Command | Description |
|---|---|
| /ironbee-verify | Verify changes, focused on affected areas (default) |
| /ironbee-verify full | Full verification: complete visual + functional + accessibility checklists |
| /ironbee-verify visual | Visual-only: contrast, layout, spacing, fonts, images, theming |
| /ironbee-verify functional | Functional-only: clicks, forms, navigation, data flow, error handling |
| /ironbee-analyze | Run session analytics and provide LLM-powered semantic insights |
/ironbee-verify guides the agent through a systematic verification process. The default mode focuses on what changed, while full runs every checklist item. Use visual or functional to narrow the scope when you know what type of testing is needed.
Configuration
IronBee loads config from two locations (project overrides global):
- Global: ~/.ironbee/config.json
- Project: <project>/.ironbee/config.json
{
  "verifyPatterns": ["*.ts", "*.tsx", "*.css"],
  "additionalVerifyPatterns": ["*.mdx"],
  "ignoredVerifyPatterns": ["*.test.ts", "*.spec.ts"],
  "maxRetries": 5
}
| Key | Description | Default |
|---|---|---|
| verifyPatterns | Glob patterns for files that require verification (replaces defaults) | 40+ code extensions |
| additionalVerifyPatterns | Extra patterns added on top of defaults | [] |
| ignoredVerifyPatterns | Patterns to exclude from verification (checked first) | [] |
| maxRetries | Max retry attempts before allowing completion | 3 |
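To illustrate the precedence described above, here is a minimal TypeScript sketch of how a global and a project config might be combined. The helper name mergeConfig and the shallow, key-by-key merge are assumptions made for illustration; IronBee's actual loader may behave differently.

```typescript
// Shape of the documented config keys; all are optional in a config file.
interface IronBeeConfig {
  verifyPatterns?: string[];
  additionalVerifyPatterns?: string[];
  ignoredVerifyPatterns?: string[];
  maxRetries?: number;
}

// Documented defaults. verifyPatterns is left out so that "unset"
// can stand for the built-in list of code extensions.
const CONFIG_DEFAULTS: IronBeeConfig = {
  additionalVerifyPatterns: [],
  ignoredVerifyPatterns: [],
  maxRetries: 3,
};

// Hypothetical helper: later sources win key by key (assumed shallow merge).
function mergeConfig(globalCfg: IronBeeConfig, projectCfg: IronBeeConfig): IronBeeConfig {
  return { ...CONFIG_DEFAULTS, ...globalCfg, ...projectCfg };
}

const effective = mergeConfig(
  { maxRetries: 5, ignoredVerifyPatterns: ["*.test.ts"] }, // e.g. ~/.ironbee/config.json
  { maxRetries: 2 },                                       // e.g. <project>/.ironbee/config.json
);
console.log(effective.maxRetries); // 2, because the project value overrides the global one
```

Keys that the project file does not set (ignoredVerifyPatterns here) fall through from the global file under this assumed merge.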
Default verify patterns
By default, IronBee requires verification for common code file extensions: .ts, .tsx, .js, .jsx, .css, .scss, .html, .py, .go, .rs, .java, .vue, .svelte, and many more.
Non-code files like README.md, package.json, or .gitignore do not trigger verification.
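The pattern rules can be sketched as a small matcher. This is an illustration under stated assumptions, not IronBee's implementation: the default list below is only the subset of extensions named in the text, globs are limited to the "*" wildcard, patterns are matched against the file name rather than the full path, and additionalVerifyPatterns is applied on top of whichever base list is active.

```typescript
interface VerifyPatternConfig {
  verifyPatterns?: string[];
  additionalVerifyPatterns?: string[];
  ignoredVerifyPatterns?: string[];
}

// Illustrative subset of the defaults, taken from the extensions listed above.
const DEFAULT_VERIFY_PATTERNS: string[] = [
  "*.ts", "*.tsx", "*.js", "*.jsx", "*.css", "*.scss", "*.html",
  "*.py", "*.go", "*.rs", "*.java", "*.vue", "*.svelte",
];

// Turn a simple glob (only "*" is supported in this sketch) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Assumption: patterns are matched against the file name, not the full path.
function fileName(filePath: string): string {
  const cut = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  return filePath.slice(cut + 1);
}

function requiresVerification(filePath: string, config: VerifyPatternConfig = {}): boolean {
  const name = fileName(filePath);
  const matchesAny = (patterns: string[]): boolean =>
    patterns.some((pattern) => globToRegExp(pattern).test(name));

  // Ignored patterns are checked first, as the config table states.
  if (matchesAny(config.ignoredVerifyPatterns ?? [])) return false;

  // verifyPatterns replaces the defaults; additionalVerifyPatterns extends the base list.
  const base = config.verifyPatterns ?? DEFAULT_VERIFY_PATTERNS;
  return matchesAny([...base, ...(config.additionalVerifyPatterns ?? [])]);
}
```

With this sketch, requiresVerification("src/app.tsx") is true, while requiresVerification("README.md") is false because no pattern matches it.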
Browser DevTools MCP config
By default, IronBee configures browser-devtools-mcp via npx. To customize the MCP server (e.g., use a local server or HTTP transport), add a browserDevTools key to your config:
{
  "browserDevTools": {
    "mcp": {
      "url": "http://localhost:4000/mcp"
    }
  }
}
Or with a custom stdio command:
{
  "browserDevTools": {
    "mcp": {
      "command": "node",
      "args": ["./my-server.js"],
      "env": { "MY_VAR": "value" }
    }
  }
}
You can also pass extra env vars to the default npx server without replacing it:
{
  "browserDevTools": {
    "env": { "BROWSER_HEADLESS_ENABLE": "true", "OTEL_ENABLE": "true" }
  }
}
| Key | Description |
|---|---|
| browserDevTools.mcp | Full MCP server config, used as-is when provided. Supports command+args (stdio) or url (HTTP) |
| browserDevTools.env | Extra env vars merged into the default config. Only used when mcp is not provided |
Note: IronBee always sets TOOL_NAME_PREFIX=bdt_ and TOOL_INPUT_METADATA_ENABLE=true. These cannot be overridden.
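The resolution rules for browserDevTools can be sketched as follows. The function name resolveMcpServer and the npx argument list are hypothetical placeholders; the source only says the default server runs via npx. The sketch also assumes the forced variables apply to the default server, since a custom mcp block is documented as being used as-is.

```typescript
type McpServerConfig =
  | { command: string; args?: string[]; env?: Record<string, string> } // stdio
  | { url: string };                                                    // HTTP

interface BrowserDevToolsConfig {
  mcp?: McpServerConfig;
  env?: Record<string, string>;
}

// Values the note above says IronBee always sets.
const FORCED_ENV: Record<string, string> = {
  TOOL_NAME_PREFIX: "bdt_",
  TOOL_INPUT_METADATA_ENABLE: "true",
};

function resolveMcpServer(config: BrowserDevToolsConfig = {}): McpServerConfig {
  // A full mcp block is used as-is; browserDevTools.env is ignored in that case.
  if (config.mcp !== undefined) return config.mcp;

  return {
    command: "npx",
    args: ["browser-devtools-mcp"], // placeholder; the real default arguments are not documented here
    // Forced values are spread last so user-supplied env cannot override them.
    env: { ...(config.env ?? {}), ...FORCED_ENV },
  };
}
```

Spreading the forced values last is one simple way to guarantee that a user-supplied TOOL_NAME_PREFIX never wins.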
Verification Flow
When the agent tries to complete a task, IronBee runs these checks:
- Were code files edited? If no matching files were changed, the agent completes normally.
- Were browser tools used? The agent must have called: navigate, screenshot, accessibility snapshot, and console check.
- Does a verdict exist? The agent must submit a verdict via ironbee hook submit-verdict after testing.
- Is the verdict valid? It must include session_id, status, pages_tested, checks, console_errors, and network_failures.
- Pass or fail? Pass allows completion. Fail blocks the agent and asks it to fix the issues and re-verify.
- Retry limit: after maxRetries failed attempts (default 3), the agent is allowed to complete but must report unresolved issues.
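The checks above can be read as a decision function evaluated in order. The sketch below is a hedged illustration: the GateInput shape, the decision labels, and the choice to count the current failure in failedAttempts are all assumptions, not IronBee's internal types.

```typescript
type GateDecision = "allow" | "block" | "allow_with_unresolved_issues";

// Hypothetical snapshot of what the gate knows when the agent tries to finish.
interface GateInput {
  codeFilesEdited: boolean;
  browserTools: { navigate: boolean; screenshot: boolean; accessibilitySnapshot: boolean; consoleCheck: boolean };
  verdict: { valid: boolean; status: "pass" | "fail" } | null;
  failedAttempts: number; // assumed to include the current failed verdict
  maxRetries: number;
}

function evaluateGate(input: GateInput): { decision: GateDecision; reason: string } {
  if (!input.codeFilesEdited) {
    return { decision: "allow", reason: "no matching files were changed" };
  }
  const tools = input.browserTools;
  if (!(tools.navigate && tools.screenshot && tools.accessibilitySnapshot && tools.consoleCheck)) {
    return { decision: "block", reason: "required browser tools were not all used" };
  }
  if (input.verdict === null) {
    return { decision: "block", reason: "no verdict submitted" };
  }
  if (!input.verdict.valid) {
    return { decision: "block", reason: "verdict is missing required fields" };
  }
  if (input.verdict.status === "pass") {
    return { decision: "allow", reason: "verdict passed" };
  }
  if (input.failedAttempts >= input.maxRetries) {
    return { decision: "allow_with_unresolved_issues", reason: "retry limit reached" };
  }
  return { decision: "block", reason: "verdict failed; fix the issues and re-verify" };
}
```

Each early return corresponds to one bullet, so the first unmet condition determines the outcome.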
Verdict format
Verdicts are submitted via echo '<json>' | ironbee hook submit-verdict:
{
  "session_id": "<your-session-id>",
  "status": "pass",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form submits successfully", "new item appears in list"],
  "console_errors": 0,
  "network_failures": 0
}
On failure, include an issues array describing what went wrong:
{
  "session_id": "<your-session-id>",
  "status": "fail",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form renders", "submit button unresponsive"],
  "console_errors": 2,
  "network_failures": 0,
  "issues": ["button click handler not firing", "TypeError in console"]
}
On pass after a previous fail, include a fixes array describing what was fixed:
{
  "session_id": "<your-session-id>",
  "status": "pass",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form submits successfully", "new item appears in list"],
  "console_errors": 0,
  "network_failures": 0,
  "fixes": ["reattached click handler to submit button", "fixed TypeError in event handler"]
}
The agent must submit a verdict after every verification attempt — both pass and fail. File edits are blocked until a verdict is submitted after using browser tools.
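A verdict can be checked locally before it is piped to the CLI. The validator below is a sketch: the required field names come from the format above, while the value types (strings, string arrays, non-negative integers) are inferred from the examples and may not match every rule IronBee enforces.

```typescript
function isStringArray(value: unknown): boolean {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function isCount(value: unknown): boolean {
  return typeof value === "number" && isFinite(value) && value >= 0 && Math.floor(value) === value;
}

// Required fields, a check inferred from the examples, and a description for error messages.
const REQUIRED_FIELDS: Array<[string, (value: unknown) => boolean, string]> = [
  ["session_id", (v) => typeof v === "string" && v.length > 0, "a non-empty string"],
  ["status", (v) => v === "pass" || v === "fail", '"pass" or "fail"'],
  ["pages_tested", isStringArray, "an array of strings"],
  ["checks", isStringArray, "an array of strings"],
  ["console_errors", isCount, "a non-negative integer"],
  ["network_failures", isCount, "a non-negative integer"],
];

// Returns a list of problems; an empty list means the verdict looks well-formed.
function validateVerdict(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["verdict must be a JSON object"];
  }
  const verdict = raw as Record<string, unknown>;
  const problems: string[] = [];
  for (const [field, isValid, expected] of REQUIRED_FIELDS) {
    if (!(field in verdict)) problems.push("missing field: " + field);
    else if (!isValid(verdict[field])) problems.push(field + " must be " + expected);
  }
  return problems;
}
```

For a quick local check, pass the parsed JSON to validateVerdict and inspect the returned list; ironbee verify remains the authoritative dry-run.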
Session Isolation
Each AI session gets its own directory under .ironbee/sessions/<session-id>/:
.ironbee/sessions/<session-id>/
  actions.jsonl    # Event log (file edits, tool calls, verification markers)
  verdict.json     # Current verdict (cleared on code edit)
  state.json       # Session state (retries, active verification, trace ID, active fix, phase)
  session.log      # Debug log
This means parallel sessions (e.g., multiple Claude Code instances) don't interfere with each other.
Analytics
ironbee analyze provides metrics about verification sessions — how time is spent, how effective verifications are, and how confident we can be in the agent's code.
Usage
ironbee analyze <session-id> # single session analysis
ironbee analyze # all sessions (project-level)
ironbee analyze --json # JSON output
ironbee analyze --detailed # include verdict details (checks, issues, fixes)
ironbee analyze --json --detailed # JSON with verdict text for LLM semantic analysis
ironbee analyze <session-id> --json --detailed # single session JSON with verdict details
The --detailed flag includes raw verdict text (checks, issues, fixes) in the output. This is designed for LLM-powered semantic analysis — use /ironbee-analyze in Claude Code or Cursor to have the agent interpret these details automatically.
Session Analysis
Phase Distribution
Each session is divided into three phases:
| Phase | What it measures |
|---|---|
| Coding | Time from session start to first verification, and between fix end and next verification start |
| Verification | Time between verification_start and verification_end — browser testing |
| Fixing | Time between fix_start and fix_end — fixing failed verifications |
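The phase definitions in the table can be turned into a small accumulator over the marker events. The event names come from the table; the PhaseEvent shape (a type plus a millisecond timestamp) and the function name are assumptions. Time between a verification_end and the next fix_start or verification_start is left unattributed here, because the table does not assign it to a phase.

```typescript
type Phase = "coding" | "verification" | "fixing";

// Hypothetical event shape; the real actions.jsonl entries may look different.
interface PhaseEvent {
  type: "verification_start" | "verification_end" | "fix_start" | "fix_end";
  ts: number; // milliseconds
}

function phaseDurations(sessionStartMs: number, events: PhaseEvent[]): Record<Phase, number> {
  const totals: Record<Phase, number> = { coding: 0, verification: 0, fixing: 0 };
  let current: Phase | null = "coding"; // a session starts in the coding phase
  let since = sessionStartMs;

  for (const event of events) {
    // Credit the elapsed time to whichever phase was active until this event.
    if (current !== null) totals[current] += event.ts - since;
    since = event.ts;

    switch (event.type) {
      case "verification_start": current = "verification"; break;
      case "verification_end": current = null; break; // unattributed until the next marker
      case "fix_start": current = "fixing"; break;
      case "fix_end": current = "coding"; break;
    }
  }
  return totals;
}
```

The function expects events sorted by timestamp and returns the total milliseconds spent in each phase.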
Cycles
| Metric | Meaning |
|---|---|
| Verifications | Number of verification cycles in the session |
| Fixes | Number of fix cycles (each fail verdict starts a fix) |
| Avg verify | Average duration of a verification cycle |
| Avg fix | Average duration of a fix cycle |
| First verify | Time from session start to first verification |
Verification Quality
| Metric | Meaning |
|---|---|
| First-pass rate | Percentage of verification chains where the first verdict was pass |
| Verdicts | Total verdict count (pass + fail) |
| Avg retries | Average number of fail verdicts before pass per chain |
| Avg console errs | Average console_errors across all verdicts |
| Avg network fails | Average network_failures across all verdicts |
| Avg pages tested | Average number of pages tested per verdict |
| Avg checks | Average number of checks performed per verdict |
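First-pass rate and average retries both depend on how verdicts are grouped into chains. The sketch below assumes a chain is a run of verdicts that ends with a pass (a trailing run of fails counts as an unfinished chain); that grouping rule and the function names are assumptions, since the table does not define a chain precisely.

```typescript
type VerdictStatus = "pass" | "fail";

// Assumption: a chain ends at the first pass; leftover fails form an unfinished chain.
function splitChains(statuses: VerdictStatus[]): VerdictStatus[][] {
  const chains: VerdictStatus[][] = [];
  let current: VerdictStatus[] = [];
  for (const status of statuses) {
    current.push(status);
    if (status === "pass") {
      chains.push(current);
      current = [];
    }
  }
  if (current.length > 0) chains.push(current);
  return chains;
}

function verificationQuality(statuses: VerdictStatus[]): { firstPassRate: number; avgRetries: number } {
  const chains = splitChains(statuses);
  if (chains.length === 0) return { firstPassRate: 0, avgRetries: 0 };

  const firstPassChains = chains.filter((chain) => chain[0] === "pass").length;
  const failCount = chains.reduce(
    (total, chain) => total + chain.filter((status) => status === "fail").length,
    0,
  );
  return {
    firstPassRate: (firstPassChains * 100) / chains.length, // percent
    avgRetries: failCount / chains.length,
  };
}
```

For the verdict sequence pass, fail, fail, pass, fail, pass, pass this yields four chains, a first-pass rate of 50, and 0.75 average retries.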
Code Changes
| Metric | Meaning |
|---|---|
| Total edits | Total file edit operations in the session |
| Unique files | Number of distinct files edited |
| Avg per verify | Average file edits before each verification |
| Avg per fix | Average file edits during each fix cycle |
| Hot Files | Top 5 most frequently edited files |
| Problematic Files | Top 5 files with most edits during fix cycles |
| Edit Churn | Files edited in 2+ separate fix cycles (root cause may not be resolved) |
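The file-level metrics reduce to counting edits per file. In this sketch the FileEdit shape is hypothetical: fixCycle is assumed to be the index of the fix cycle an edit belongs to, or null when the edit happened outside any fix cycle.

```typescript
// Hypothetical edit record; fixCycle is null for edits outside a fix cycle.
interface FileEdit {
  file: string;
  fixCycle: number | null;
}

// Files sorted by edit count, highest first, limited to the top N (5 in the table).
function topFiles(edits: FileEdit[], limit: number = 5): string[] {
  const counts: Record<string, number> = Object.create(null);
  for (const edit of edits) counts[edit.file] = (counts[edit.file] ?? 0) + 1;
  return Object.keys(counts)
    .sort((a, b) => (counts[b] ?? 0) - (counts[a] ?? 0))
    .slice(0, limit);
}

// Hot files: all edits. Problematic files: only edits made during fix cycles.
function hotFiles(edits: FileEdit[]): string[] {
  return topFiles(edits);
}

function problematicFiles(edits: FileEdit[]): string[] {
  return topFiles(edits.filter((edit) => edit.fixCycle !== null));
}

// Edit churn: files touched in two or more distinct fix cycles.
function editChurn(edits: FileEdit[]): string[] {
  const cyclesByFile: Record<string, number[]> = Object.create(null);
  for (const edit of edits) {
    if (edit.fixCycle === null) continue;
    const cycles = cyclesByFile[edit.file] ?? [];
    if (cycles.indexOf(edit.fixCycle) === -1) cycles.push(edit.fixCycle);
    cyclesByFile[edit.file] = cycles;
  }
  return Object.keys(cyclesByFile).filter((file) => (cyclesByFile[file] ?? []).length >= 2);
}
```

A file that appears in editChurn was revisited across separate fix cycles, which matches the table's hint that its root cause may not be resolved.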
Fix Effectiveness
| Metric | Meaning |
|---|---|
| Success rate | Percentage of fixes followed by a pass verdict |
| Re-fail rate | Percentage of fixes followed by another fail verdict |
| Fix/verify | Ratio of fix cycles to verification cycles (0 = no fixes needed) |
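As a small worked example of these three metrics, the sketch below takes the verdict that followed each fix plus the number of verification cycles. The function name and input shape are assumptions for illustration.

```typescript
// outcomes[i] is the verdict that followed fix i (assumed input shape).
function fixEffectiveness(
  outcomes: Array<"pass" | "fail">,
  verificationCycles: number,
): { successRate: number; reFailRate: number; fixPerVerify: number } {
  const fixes = outcomes.length;
  const passes = outcomes.filter((outcome) => outcome === "pass").length;
  const percent = (count: number): number => (fixes === 0 ? 0 : (count * 100) / fixes);

  return {
    successRate: percent(passes),
    reFailRate: percent(fixes - passes),
    // 0 means no fixes were needed.
    fixPerVerify: verificationCycles === 0 ? 0 : fixes / verificationCycles,
  };
}
```

With four fixes, three of them followed by a pass, and five verification cycles, the success rate is 75, the re-fail rate is 25, and the fix/verify ratio is 0.8.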
Scoring
Three scores summarize the session:
| Score | Formula | What it measures |
|---|---|---|
| Efficiency | coding_time / (coding_time + fix_time) × 100 | How much productive time vs fix overhead. High = minimal wasted time on fixes |
| Quality | (pass_pct + pages_pct + checks_pct + clean_pct) / 4 | How thorough and clean the verification was. Components: pass rate, page coverage (3+ = 100%), check depth (5+ = 100%), error cleanliness (0 errors = 100%) |
| Confidence | pass_count / total_verdicts × 100 | How likely the agent's code works. Based on verdict pass rate |
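The three formulas can be sketched directly. The table fixes only the anchor points for the Quality components (3+ pages, 5+ checks, 0 errors = 100%), so this sketch makes two labeled assumptions: pages and checks scale linearly below their thresholds using per-verdict averages, and clean_pct is the share of verdicts with zero errors. The VerdictSummary shape is also hypothetical.

```typescript
// Hypothetical per-verdict summary used as input.
interface VerdictSummary {
  status: "pass" | "fail";
  pages: number;  // pages tested
  checks: number; // checks performed
  errors: number; // console errors + network failures
}

function sessionScores(codingMs: number, fixMs: number, verdicts: VerdictSummary[]) {
  const total = verdicts.length;
  const percent = (part: number, whole: number): number => (whole === 0 ? 0 : (part * 100) / whole);
  const average = (pick: (v: VerdictSummary) => number): number =>
    total === 0 ? 0 : verdicts.reduce((sum, v) => sum + pick(v), 0) / total;

  const passPct = percent(verdicts.filter((v) => v.status === "pass").length, total);
  // Assumption: linear scaling up to the documented thresholds (3 pages, 5 checks).
  const pagesPct = Math.min(average((v) => v.pages) / 3, 1) * 100;
  const checksPct = Math.min(average((v) => v.checks) / 5, 1) * 100;
  // Assumption: cleanliness is the share of verdicts with no errors at all.
  const cleanPct = percent(verdicts.filter((v) => v.errors === 0).length, total);

  return {
    efficiency: percent(codingMs, codingMs + fixMs),
    quality: (passPct + pagesPct + checksPct + cleanPct) / 4,
    confidence: passPct,
  };
}
```

For example, 900 ms of coding against 100 ms of fixing gives an efficiency of 90, and a session with one fail and one pass verdict has a confidence of 50.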
Project Analysis
When run without a session ID, ironbee analyze aggregates metrics across all sessions:
| Metric | Meaning |
|---|---|
| Session History | Each session's summary — duration, cycles, outcome, score |
| Avg duration | Average session duration across all sessions |
| Avg verifies | Average verification cycles per session |
| Avg fixes | Average fix cycles per session |
| First-pass rate | Percentage of sessions where the first verdict was pass |
| Fix success rate | Percentage of all fixes (across sessions) that succeeded |
| Abandon rate | Percentage of sessions with interrupted verification/fix cycles |
| Avg efficiency | Average efficiency score across all sessions |
| Avg confidence | Average confidence score across all sessions |
| Problematic Files | Top 5 files with most fix edits across all sessions |
Telemetry
IronBee collects anonymous usage data to help improve the product. No source code, file contents, or personally identifiable information is ever sent.
Events collected: install/uninstall, session start, verdict submissions (pass/fail status only), and verification gate decisions.
To opt out, set the environment variable:
export IRONBEE_TELEMETRY=false
Or set telemetryEnabled: false in ~/.ironbee/telemetry.json.
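The two opt-out switches can be sketched as a single check. The function name is hypothetical, and the sketch assumes that either switch alone disables telemetry and that only the exact value false is recognized; the source does not state the precedence between the two.

```typescript
// env: environment variables; fileConfig: parsed ~/.ironbee/telemetry.json, or null if absent.
function isTelemetryEnabled(
  env: Record<string, string | undefined>,
  fileConfig: { telemetryEnabled?: boolean } | null,
): boolean {
  // Assumption: the environment variable opts out when set to exactly "false".
  if (env["IRONBEE_TELEMETRY"] === "false") return false;
  // Assumption: the file setting opts out independently of the environment.
  if (fileConfig !== null && fileConfig.telemetryEnabled === false) return false;
  return true;
}
```

In a Node.js process, the first argument would typically be process.env.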
Development
npm install
npm run build # Compile TypeScript
npm run lint # ESLint
npm run test # Jest (unit + integration + client tests)
npm run dev # Run via ts-node
License
MIT