Audit — Dual Parallel Agent Code Audit Skill for Claude Code

A Claude Code skill that runs two independent deep agent audits in parallel on the same change: an adversarial Opus review and an independent Codex review (codex exec). It merges and dedupes their findings, fixes them, and re-audits the corrected state, iterating for up to 3 rounds until a round comes back clean.

Designed as a pre-commit safety net for vibe-coded features, multi-file bug fixes, and any change where "looks right" isn't good enough. Different models catch different things, so the union of two independent passes is the value — and running them on separate backends (Anthropic + OpenAI) means zero mutual contention.

Why parallel, not Opus-then-Codex

Different models catch different classes of bug, and the union is the win (e.g. Opus reconciles an aggregation to the dollar; Codex catches a non-executable wrapper script). The two agents run on separate backends (Anthropic + OpenAI), so there's zero mutual contention and both are non-blocking. Duplicate findings are cheap: the merge dedupes them, and a finding flagged by both is just higher confidence. The one thing parallel loses (Codex never seeing the fixes within a single round) is recovered by re-auditing the corrected state every round.

What's in this repo

SKILL.md — the skill definition: frontmatter, the parallel dual-agent loop, the mandatory read-only block, the shared audit checklist, and the finding format
codex-audit.sh — the bundled Codex wrapper (the accessory script the skill launches for the Codex pass). Hardened for anti-hang, parallel-safety, and read-only enforcement
README.md — this file

Prerequisites

Claude Code — this is a Claude Code skill, not a standalone system prompt.
Opus access — the skill targets model: opus (family alias — always resolves to the current Opus release) with effort: xhigh. Both are fixed by design; see Customization.
The Codex CLI, installed and logged in — the Codex pass runs codex exec via the bundled codex-audit.sh. Install it and authenticate once:
```
npm install -g @openai/codex
codex login
```
Installing via npm requires Node.js 16+ (the @openai/codex package's declared floor — the CLI itself is a native binary). To authenticate you need a ChatGPT Plus / Pro / Business / Enterprise plan, or an OpenAI API key; the Free tier's rate limits are too tight for routine audit runs. See Codex pricing for tier details.

Note: this skill calls codex exec directly through the bundled wrapper — it does not depend on the codex-plugin-cc Claude Code plugin or the /codex:review slash command.
Optional but recommended: a project-level CLAUDE.md documenting invariants — the audit checklist cross-references it.

Installation

Copy both files into your Claude Code skills directory — the skill and its accessory script must live side by side:

```
mkdir -p ~/.claude/skills/audit
cp SKILL.md        ~/.claude/skills/audit/SKILL.md
cp codex-audit.sh  ~/.claude/skills/audit/codex-audit.sh
chmod +x ~/.claude/skills/audit/codex-audit.sh
```

The skill launches the wrapper via bash "$HOME/.claude/skills/audit/codex-audit.sh", so it works even without the executable bit — but setting it is good practice.

Verify

Start a new Claude Code session in any project and run /audit. Claude will auto-scope to recent changes (uncommitted edits or unpushed commits) or report there's nothing to audit if the repo is clean. Either response confirms the install.

Usage

With arguments — scope the audit to specific files or paths:

/audit src/routes/admin.ts src/lib/auth.ts

Without arguments — audits the most recent feature or fix from the current session:

/audit

The skill identifies the change automatically — git diff and git status for uncommitted edits, falling back to git log origin/HEAD..HEAD for unpushed commits if the tree is clean. If multiple unrelated changes are in flight, it asks which one to audit.
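The fallback order described above can be sketched in shell. This is an illustration only, not the skill's own logic (the skill has Claude run the git commands itself); the function name detect_audit_scope and the "scope:" labels are invented for the example.

```shell
# Hypothetical helper: reports what an argument-less /audit would look at.
detect_audit_scope() {
  local dirty unpushed
  # Uncommitted edits come first: staged, unstaged, or untracked files.
  dirty="$(git status --porcelain)"
  if [ -n "$dirty" ]; then
    echo "scope: uncommitted"
    echo "$dirty"
    return 0
  fi
  # Clean tree: fall back to commits that are not on the remote yet.
  # If origin/HEAD does not exist, git fails and the result stays empty.
  unpushed="$(git log --oneline origin/HEAD..HEAD 2>/dev/null)"
  if [ -n "$unpushed" ]; then
    echo "scope: unpushed"
    echo "$unpushed"
    return 0
  fi
  echo "scope: none"
}
```

Running detect_audit_scope inside a repository prints the chosen scope on the first line, followed by the matching files or commits.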

Heads-up: live verification needs a running dev server. The audit checklist's "Live verification" step calls your endpoints on http://localhost:*. Start your dev server before running /audit if you want those checks to actually run — otherwise the live tests fail and only the static portion of the audit produces findings.

Autonomous triggering (recommended)

/audit only runs when invoked manually unless you tell Claude otherwise. To make Claude self-trigger the audit on qualifying changes — without you having to remember — paste the snippet below into your project's CLAUDE.md (or your team's Definition of Done / pre-commit checklist).

The MANDATORY framing and blocker language are intentional. Softer phrasing ("consider running /audit") tends to get skipped under task pressure; the strong wording is what makes the rule actually fire.

**Audit Skill (MANDATORY)** — after completing any of the following, **you MUST run
`/audit`** before committing:
   - A new feature that adds a public surface (new endpoint, command, exported function, or UI flow)
   - A fix that touches 3+ source files (excluding tests and config), or rewrites more than ~30 lines in a single function
   - Any change to authentication, authorization, or access control
   - Any change to database schema, migrations, or data persistence
   - Refactors that move or rename code across multiple modules
   - Public API contract changes (request/response shapes, status codes, error formats)
   - Pagination, filtering, sorting, or aggregation logic
   - Code that combines results from multiple upstream sources (different services, tables, or APIs)
   - Any change to security-sensitive flows (credentials, tokens, webhooks, payments)

**Audit is NOT required for:** docs-only changes, comment-only changes, formatting / linting passes, dependency bumps that don't include accompanying code changes, or revert commits.

The `/audit` skill runs a dual parallel agent audit (adversarial Opus + independent Codex), merges the findings, and iterates up to 3 rounds until a round is clean. Do NOT skip it — treat audit findings as blockers that must be fixed before the commit. If round 3 still has open findings, escalate to the user — do not loop past 3 rounds or commit with open CRITICAL/HIGH findings.

Adapt the trigger list to your project's risk surface — add domain-specific patterns (e.g. "any change to billing logic", "any change to RLS policies", "any change to webhook signature verification") or remove items that don't apply.

How it works

Each round is a parallel dual-agent audit on the current state of the change. The skill repeats rounds until one returns zero confirmed findings, or 3 rounds elapse (then it escalates to you).

Both agents fire in parallel, non-blocking, on the same state:

Opus agent. A general-purpose subagent on model: opus, launched in the background. It reads the change, runs verification commands (live endpoint calls, type checks, grep sweeps), and returns a structured findings report. It covers:
- Type safety (nullability, as casts, any, ! assertions)
- API contracts (request validation, response shape, status codes, auth)
- Data consistency (pagination, filter sync, label maps)
- State and side effects (DB constraints, idempotency, error paths)
- Migrations (fresh-DB order, idempotency, backfill, reversibility)
- Auth and access control (role coverage, public endpoints, identity fields)
- Dead code and cleanup (orphaned imports, CSS, types)
- Edge cases (empty state, null propagation, boundaries, concurrency)
- Live verification (endpoint calls across auth tiers, staleness-aware reconciliation)
- Open investigation (anything off-checklist that smells wrong)

Codex agent. Runs codex exec as a full agent via the bundled codex-audit.sh, also in the background, on the same shared CHANGE_CONTEXT and the same checklist. It can run git / psql / curl to verify findings live.

Neither agent edits code. A mandatory read-only block is prepended verbatim to both prompts. As an enforcement backstop, the wrapper snapshots the worktree: if Codex edits a tree that was clean before, it auto-reverts and prints CODEX_AUDIT_REVERTED; if the tree was already dirty, it can't safely revert and prints CODEX_AUDIT_WARN_EDITED with the paths.
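A minimal sketch of that snapshot-and-revert backstop, assuming the snapshot is simply the output of git status --porcelain. The function name guard_worktree is hypothetical and the bundled wrapper may work differently. One known gap in this simplified version: on an already dirty tree, an edit to a file that was already modified does not change the status output and goes unnoticed.

```shell
# Hypothetical sketch: run a supposedly read-only command and undo its edits
# when that is safe. Not the actual codex-audit.sh implementation.
guard_worktree() {
  local before after
  before="$(git status --porcelain)"
  "$@" || true                      # the audit command; its exit code is ignored here
  after="$(git status --porcelain)"
  if [ "$before" = "$after" ]; then
    return 0                        # worktree status unchanged
  fi
  if [ -z "$before" ]; then
    # The tree was clean, so every change came from the command: discard them.
    git reset -q --hard && git clean -fdq
    echo "CODEX_AUDIT_REVERTED"
  else
    # The tree was already dirty: reverting could destroy the user's own edits.
    echo "CODEX_AUDIT_WARN_EDITED"
    echo "$after"
  fi
}
```

Usage would look like guard_worktree some-audit-command arg1 arg2, where the command is whatever launches the Codex pass.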

Merge + dedup. When both agents return, their findings are normalized and merged. Two findings at the same location describing the same logical problem collapse into one tagged source: both (highest confidence); unique findings stay source: opus or source: codex. The result is presented as a single consolidated findings table every round:

#	Severity	Source	Location	Issue	Recommendation
F-001	CRITICAL	both	`file:line`	what's wrong	the fix
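The merge step can be approximated with a short awk pass. This sketch assumes findings arrive as tab-separated lines (source, severity, location, issue) and treats two findings as duplicates when their location matches exactly. That is a simplification: the skill also requires that both describe the same logical problem, which takes judgment a script does not have. The function name merge_findings and the input format are assumptions made for the example.

```shell
# Hypothetical sketch: merge findings from both agents, keyed on location only.
# stdin: source<TAB>severity<TAB>location<TAB>issue, one finding per line.
merge_findings() {
  awk -F'\t' '
    BEGIN { rank["CRITICAL"] = 3; rank["HIGH"] = 2; rank["MEDIUM"] = 1 }
    {
      key = $3
      if (!(key in src)) {
        # First time this location is seen: remember it in arrival order.
        order[++n] = key; src[key] = $1; sev[key] = $2; issue[key] = $4
        next
      }
      # Seen before: tag as "both" if the other agent reported it,
      # and keep the more severe of the two ratings.
      if (src[key] != $1) src[key] = "both"
      if (rank[$2] > rank[sev[key]]) sev[key] = $2
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        printf "F-%03d\t%s\t%s\t%s\t%s\n", i, sev[k], src[k], k, issue[k]
      }
    }'
}
```

Piping the combined findings of both agents through merge_findings yields rows in the same column order as the table above, minus the Recommendation column.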

Iterate or escalate.

Clean round (0 findings) → the change is clean; commit per the project's Definition of Done. Stop.
Findings remain → fix every CRITICAL/HIGH and confirmed MEDIUM, then re-run on the corrected state (round r+1). Round 1 audits the full change; rounds 2–3 narrow to the files the change and its fixes touched — catching fix-induced regressions without rescanning unrelated code.
Hard cap: 3 rounds. If round 3 still has findings, the skill stops and escalates to you with a summary — one more targeted pass, commit as-is, or hand off specific findings. It never loops past 3 rounds and never commits with open CRITICAL/HIGH without your say-so.
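The control flow of the rounds can be sketched as a small loop. Here run_round and apply_fixes are hypothetical stand-ins (stubbed so the example runs by itself): the first prints the number of confirmed findings for a round, the second represents fixing them. In the real skill both steps are carried out by Claude, not by a script.

```shell
# Stubs for illustration only: simulate 2 findings, then 1, then a clean round.
run_round() { case "$1" in 1) echo 2 ;; 2) echo 1 ;; *) echo 0 ;; esac; }
apply_fixes() { echo "fixing findings from round $1"; }

audit_loop() {
  local max_rounds=3 round=1 findings
  while [ "$round" -le "$max_rounds" ]; do
    findings="$(run_round "$round")"
    if [ "$findings" -eq 0 ]; then
      echo "clean after round $round"
      return 0
    fi
    # Fix and re-audit only while another round is still allowed.
    if [ "$round" -lt "$max_rounds" ]; then
      apply_fixes "$round"
    fi
    round=$((round + 1))
  done
  echo "escalate: findings remain after $max_rounds rounds"
  return 1
}
```

With the stubs above, calling audit_loop fixes after rounds 1 and 2 and reports a clean round 3. A run_round that never reaches zero ends in the escalation branch with a non-zero exit status.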

Codex never blocks the audit. The wrapper guards against hangs (< /dev/null + timeout + 1 retry + a non-empty-output gate). If it still can't produce output, the skill proceeds with the Opus findings only and says so explicitly.
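The four guards combine roughly as follows. This is a sketch under assumptions, not the contents of codex-audit.sh: the function name run_with_hang_guard, the AUDIT_TIMEOUT variable, and the 600-second default are all invented for the example, and it relies on the coreutils timeout command being installed.

```shell
# Hypothetical sketch of the anti-hang guards.
# Usage: run_with_hang_guard OUTPUT_FILE COMMAND [ARGS...]
run_with_hang_guard() {
  local out_file="$1" attempt
  shift
  for attempt in 1 2; do            # first try plus 1 retry
    # stdin comes from /dev/null so the child can never block on input;
    # timeout kills it if it runs past the limit.
    timeout "${AUDIT_TIMEOUT:-600}" "$@" < /dev/null > "$out_file" 2>/dev/null || true
    # Non-empty-output gate: only actual output counts as success.
    if [ -s "$out_file" ]; then
      return 0
    fi
  done
  return 1                          # caller carries on with the Opus findings only
}
```

A caller would pass the Codex invocation as the command and treat a non-zero return as the signal to continue with the Opus findings alone.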

Parallel-safe across projects. Each run uses an isolated per-run CODEX_HOME (with its own log DB, so there is no contention on the shared ~/.codex DB) and namespaced temp files, so concurrent audits in different projects don't interfere. Audit quality doesn't depend on what else is running either: every run reads its own files and runs its own verification.
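One way to get that isolation is a fresh directory per run, sketched below. The function name make_run_env and the directory layout are assumptions; the sketch also leaves out how credentials reach the isolated home, which the real wrapper has to handle.

```shell
# Hypothetical sketch: create a private directory tree for a single audit run.
make_run_env() {
  local run_dir
  # mktemp gives every run a unique name, so concurrent runs never collide.
  run_dir="$(mktemp -d "${TMPDIR:-/tmp}/codex-audit.XXXXXX")"
  mkdir -p "$run_dir/home"          # intended to be used as this run's CODEX_HOME
  echo "$run_dir"
}
```

A run would then export CODEX_HOME="$run_dir/home" before launching Codex and keep its temp files under "$run_dir".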

Security escalation. If the change touches an auth/authz boundary, touches cryptographic primitives, or you flag it as security-critical, the skill adds a third adversarial security lens: a second Codex pass, an extra Opus agent, or both, each with an explicitly security-framed prompt. This is reserved for those three narrow triggers and is not used by default.

Customization

The skill ships configured for a Bun + TypeScript project. Adapt for your stack by editing the allowed-tools line in SKILL.md:

Stack	Replace Bun flags with
npm	`Bash(npm test *) Bash(npm run *) Bash(npx tsc *)`
pnpm	`Bash(pnpm test *) Bash(pnpm run *) Bash(pnpm tsc *)`
Yarn	`Bash(yarn test *) Bash(yarn run *) Bash(yarn tsc *)`
Python	`Bash(pytest *) Bash(python -m *) Bash(uv run *)`
Go	`Bash(go test *) Bash(go vet *) Bash(go build *)`
Rust	`Bash(cargo test *) Bash(cargo check *) Bash(cargo clippy *)`

The audit checklist itself is also stack-opinionated. It assumes TypeScript types, HTTP endpoints, and a frontend↔backend split — sections like Type Safety (as casts, ! assertions) and Live Verification (curl to localhost) won't map cleanly to Python data pipelines, Rust CLIs, Go services, or libraries without a web surface. For those stacks, edit the shared Audit checklist in SKILL.md — drop or rewrite the sections that don't fit, add stack-specific checks (e.g. "Pydantic model validation", "trait bounds and lifetimes", "FFI safety", "context cancellation"), or swap in your team's severity definitions. The checklist is shared verbatim by both agents, so one edit updates both passes.

Don't tune model: or effort:. This skill is intentionally pinned to the latest Opus on xhigh effort. Audit quality drops sharply on smaller models or lower effort, and the opus alias already auto-tracks new releases, so there's nothing to maintain.

The one frontmatter dial worth tweaking is description: — it controls when Claude auto-suggests the skill. Tighten or loosen the trigger phrasing if it fires too often or not often enough.

License

Free to use, modify, and share.
