adversarial-review

Security Audit: Warn

Health — Warn
  • License — Apache-2.0
  • Description — Repository has a description
  • Active repo — Last push today
  • Low visibility — Only 5 GitHub stars

Code — Warn
  • Code scan incomplete — No supported source files were scanned during the light audit

Permissions — Pass
  • Permissions — No dangerous permissions requested
Purpose
This tool is a Claude Code skill that performs adversarial AI code and plan reviews. It routes your code or architecture plans to a secondary AI (OpenAI Codex) tasked with finding flaws, iterating until the reviewer approves.

Security Assessment
Overall risk: Medium. The tool bridges two AI systems: your code and surrounding context are sent to both Anthropic and OpenAI, which means transmitting codebase diffs over the network to external APIs. Installation links a local repository folder into your Claude environment. No hardcoded secrets were found, but the tool requires you to supply an OpenAI API key via an environment variable. The light code scan could not complete because the repository contains no supported source files, so the automated audit cannot rule out hidden vulnerabilities; deeper vetting requires manual review. No dangerous local permissions are requested.

Quality Assessment
The repository is brand new and has very low visibility, indicated by only 5 GitHub stars. However, it is fully active, with the most recent push happening today. It benefits from a standard Apache-2.0 license, making it safe for commercial and private use. The project is lightweight by design, acting as a set of Markdown instructions rather than a complex application, which minimizes the attack surface.

Verdict
Use with caution — while the project is active and safely licensed, the low community trust, lack of automated code scanning, and requirement to transmit local code to external AI APIs warrant a closer look before integrating into sensitive workflows.
SUMMARY

Claude Code skill for adversarial AI code & plan review. One AI writes, another tears it apart. Iterative fix loop until approved.

README.md

Adversarial Review

Claude Code skill for adversarial AI code and plan review.

One AI writes the code. Another tears it apart. Iterate until approved.

What is this

Most AI code review tools validate your changes — "looks good, maybe add tests."
Adversarial review does the opposite: the reviewer defaults to skepticism
and tries to break confidence in the change. It looks for what will fail
in production, not what might be nice to improve.

This is a Claude Code skill
— a SKILL.md file plus a small references/runner.md that together
teach Claude how to run adversarial reviews through an external AI model
(currently OpenAI Codex).

Key features

  • Plan review — review the plan BEFORE writing code. Catch architecture
    mistakes, missing steps, and risks early
  • Code review — review the implementation. Bugs, security, data loss
  • Code-vs-plan — verify the implementation matches the plan
  • Iterative — Claude fixes issues based on reviewer feedback and resubmits
    for re-review. Up to 5 rounds until approved
  • Lightweight — SKILL.md plus one references/runner.md (thin runner
    subagent spec); no server, no broker, no external runtime deps. Compare
    with codex-plugin-cc:
    ~15 JS modules, App Server, JSON-RPC broker, lifecycle hooks

How it works

┌──────────┐     ┌───────────┐     ┌──────────┐
│  Claude  │────>│ Reviewer  │────>│  Claude  │
│  (code)  │     │  (Codex)  │     │  (fix)   │
└──────────┘     └───────────┘     └──────────┘
     ^                                 │
     │         ┌───────────┐           │
     └─────────│ Reviewer  │<──────────┘
               │(re-review)│
               └───────────┘
                     │
              VERDICT: APPROVED

Three modes

Mode          What it reviews                           When to use
plan          Implementation plan                       Before writing code
code          Git diff (unstaged, staged, or branch)    After writing code
code-vs-plan  Code changes against the plan             Verify implementation matches the plan

Mode is auto-detected from context, or you can force it with an argument.

Quick start

1. Prerequisites

Claude Code and
OpenAI Codex CLI must be installed.

Verify both are available:

claude --version   # Claude Code CLI
codex --version    # OpenAI Codex CLI (>= 0.115.0)

If Codex is missing: npm install -g @openai/codex

Authentication. Codex needs an OpenAI account. Either:

  • Sign in interactively: codex (opens browser)
  • Or set CODEX_API_KEY env var for non-interactive use
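
For scripted setups, a quick pre-flight check can fail fast when neither auth path is configured. A minimal sketch (the ~/.codex/auth.json location is an assumption about where interactive login stores credentials, and the function name is illustrative):

```shell
# Pre-flight: report whether Codex credentials are available.
# The auth.json path is an assumption; adjust to your Codex CLI version.
check_codex_auth() {
  if [ -n "${CODEX_API_KEY:-}" ] || [ -f "$HOME/.codex/auth.json" ]; then
    echo ok
  else
    echo missing
  fi
}
```

Run it before a headless review and abort on "missing" rather than letting codex fail mid-run.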

2. Install the skill

git clone https://github.com/dementev-dev/adversarial-review.git
mkdir -p ~/.claude/skills
ln -sfn "$(pwd)/adversarial-review" ~/.claude/skills/adversarial-review

Verify both the skill entry-point AND the runner subagent spec are in place:

ls -la ~/.claude/skills/adversarial-review/SKILL.md
ls -la ~/.claude/skills/adversarial-review/references/runner.md

Migrating from a previous install at ~/.agents/skills/: delete
the old symlink (rm ~/.agents/skills/adversarial-review) and
install at the new path above. Claude Code ≥ 2.x uses ~/.claude/skills/.

3. Add permissions

The skill runs git, codex exec, and writes temp files to /tmp.
Without pre-approved permissions, Claude Code will prompt for each action.

Where to add. Since the skill is installed globally
(~/.claude/skills/), permissions should go into the global config
so they work in any project:

Install scope          Config file
Global (recommended)   ~/.claude/settings.json
Single project         <project>/.claude/settings.local.json

Merge the following rules into the permissions.allow array of the
chosen config file:

// --- adversarial-review permissions ---
// Git: diff, status, branch detection, repo root, submodule check
"Bash(git diff*)",
"Bash(git status*)",
"Bash(git symbolic-ref*)",
"Bash(git rev-parse*)",
// Codex: initial launch (uses -C; prompt fed via cat | pipe for env portability)
"Bash(cat /tmp/codex-prompt-* | timeout 600 codex exec *)",
// Codex: resume (cd prefix because resume has no -C flag; prompt via cat | pipe)
"Bash(cd * && cat /tmp/codex-resume-prompt-* | timeout 600 codex exec resume *)",
// Session-id filesystem fallback (POSIX: find -newer + grep -l for content-match)
"Bash(find ~/.codex/sessions*)",
// Diagnostic aid when filesystem fallback finds nothing
"Bash(ls -t ~/.codex/sessions*)",
// Temp files: prompts (initial + resume), plans, review output, JSONL stdout, stderr
"Write(/tmp/codex-plan-*)",
"Write(/tmp/codex-prompt-*)",
"Write(/tmp/codex-resume-prompt-*)",
"Read(/tmp/codex-review-*)",
"Read(/tmp/codex-stdout-*)",
"Read(/tmp/codex-stderr-*)",
// Archive failed-resume diagnostics before fresh exec overwrites them
"Bash(mv /tmp/codex-stdout-* /tmp/codex-stdout-*-failed-resume.jsonl)",
"Bash(mv /tmp/codex-stderr-* /tmp/codex-stderr-*-failed-resume.txt)",
// Cleanup
"Bash(rm -f /tmp/codex-*)",
// Main thread: write prompt body for the runner subagent (NEW in refactor)
"Write(/tmp/codex-body-*)",
// Main thread: read the structured JSON result returned by the runner (NEW)
"Read(/tmp/codex-runner-result-*)",
// Runner subagent (inherited): read the prompt body main wrote (NEW)
"Read(/tmp/codex-body-*)",
// Runner subagent (inherited): write the result JSON main reads (NEW)
"Write(/tmp/codex-runner-result-*)",
// Runner-spec discovery: tier 1 (user-scoped install)
"Bash(ls ~/.claude/skills/adversarial-review/references/runner.md*)",
// Runner-spec discovery: tier 2 (plugin-marketplace cache)
"Bash(ls ~/.claude/plugins/cache/*/*/*/skills/adversarial-review/references/runner.md*)"
Full example (if the config file is empty or does not exist)
{
  "permissions": {
    "allow": [
      // adversarial-review
      "Bash(git diff*)",
      "Bash(git status*)",
      "Bash(git symbolic-ref*)",
      "Bash(git rev-parse*)",
      "Bash(cat /tmp/codex-prompt-* | timeout 600 codex exec *)",
      "Bash(cd * && cat /tmp/codex-resume-prompt-* | timeout 600 codex exec resume *)",
      "Bash(find ~/.codex/sessions*)",
      "Bash(ls -t ~/.codex/sessions*)",
      "Write(/tmp/codex-plan-*)",
      "Write(/tmp/codex-prompt-*)",
      "Write(/tmp/codex-resume-prompt-*)",
      "Read(/tmp/codex-review-*)",
      "Read(/tmp/codex-stdout-*)",
      "Read(/tmp/codex-stderr-*)",
      "Bash(mv /tmp/codex-stdout-* /tmp/codex-stdout-*-failed-resume.jsonl)",
      "Bash(mv /tmp/codex-stderr-* /tmp/codex-stderr-*-failed-resume.txt)",
      "Bash(rm -f /tmp/codex-*)",
      "Write(/tmp/codex-body-*)",
      "Read(/tmp/codex-runner-result-*)",
      "Read(/tmp/codex-body-*)",
      "Write(/tmp/codex-runner-result-*)",
      "Bash(ls ~/.claude/skills/adversarial-review/references/runner.md*)",
      "Bash(ls ~/.claude/plugins/cache/*/*/*/skills/adversarial-review/references/runner.md*)"
    ]
  }
}

Security note: The codex exec rule allows any codex exec invocation
wrapped in timeout 600. The skill only uses read-only mode (-s read-only),
but Claude Code's permission patterns are prefix-based and cannot enforce flag
constraints. If you prefer tighter control, omit the codex exec rule and
approve each invocation manually.
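
If you prefer to script the merge instead of hand-editing JSON, here is a sketch using jq (assumes jq is installed; the function name is illustrative, not part of the skill):

```shell
# Append a rule to permissions.allow, creating the array if absent and
# de-duplicating. Prints the merged JSON to stdout; redirect to apply.
merge_rule() {
  config=$1 rule=$2
  jq --arg rule "$rule" \
     '.permissions.allow = ((.permissions.allow // []) + [$rule] | unique)' \
     "$config"
}

# Example: merge_rule ~/.claude/settings.json 'Bash(git diff*)' > merged.json
```

Note that JSON does not allow comments, so strip the `// ...` comment lines from the rule list above before merging into a real settings file.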

4. Use

/adversarial-review                       # auto-detect mode
/adversarial-review plan                  # force plan review
/adversarial-review code                  # force code review
/adversarial-review path/to/f             # review a specific file
/adversarial-review xhigh                 # higher reasoning effort
/adversarial-review model:gpt-5.3-codex   # use a different model

Prompt architecture

The skill uses XML-structured prompts with adversarial stance:

  • <role> — adversarial reviewer, defaults to skepticism
  • <operating_stance> — break confidence, not validate
  • <attack_surface> — concrete checklist: auth, data integrity,
    race conditions, rollback safety, schema drift, error handling, observability
  • <finding_bar> — every finding must answer 4 questions:
    what can go wrong, why vulnerable, impact, recommendation
  • <scope_exclusions> — no style, naming, or speculative comments
  • <calibration> — one strong finding > five weak ones
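
Put together, the sections above imply a prompt skeleton along these lines (an illustrative sketch only; the skill's actual prompt text lives in SKILL.md):

```xml
<role>Adversarial reviewer. Default to skepticism; do not validate.</role>
<operating_stance>Try to break confidence in the change.</operating_stance>
<attack_surface>auth, data integrity, race conditions, rollback safety,
schema drift, error handling, observability</attack_surface>
<finding_bar>Each finding states: what can go wrong, why this code is
vulnerable, the impact, and a recommendation.</finding_bar>
<scope_exclusions>No style, naming, or speculative comments.</scope_exclusions>
<calibration>One strong finding beats five weak ones.</calibration>
```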

Example output

See examples/review-output.md for a sample review.

Troubleshooting

codex exec exits with model error.
Some models are unavailable with ChatGPT accounts (e.g. o3-mini).
The default gpt-5.4 works with both ChatGPT and API key auth.
Override with /adversarial-review model:<name>.

Permission prompts on every action.
Add the permissions from the setup section. Check that
the file is valid JSON and in the right location (project .claude/settings.local.json
or global ~/.claude/settings.json).

Codex hangs / timeout (exit code 124).
All codex exec calls are wrapped in timeout 600 (10 minutes). If you see
exit code 124, the reviewer did not respond in time. Retry — this is usually
transient.
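
If timeouts recur, a thin retry wrapper can absorb one transient failure. A sketch (the skill itself does not ship this helper):

```shell
# Re-run a command once if it exits 124 (the exit code `timeout` reports).
run_with_retry() {
  "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out; retrying once" >&2
    "$@"
    status=$?
  fi
  return "$status"
}

# Example: run_with_retry timeout 600 codex exec ...
```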

Resume fails with session error.
The skill uses codex exec resume <session-id> for rounds 2+. On failure
(non-zero exit, thread/resume failed in stderr, or a malformed review),
the skill does NOT silently fall back. In an interactive session it asks
whether to run a fresh codex exec (higher token cost) or conclude the
review as NOT VERIFIED. In headless runs it decides based on the maximum
severity of the last successful round's findings: critical/high → fresh
exec; medium-only → conclude as NOT VERIFIED.

--json stdout is empty in my Claude Code session (session ID capture noise).
In some Claude Code sandbox configurations codex's --json event stream is
suppressed when stdout is redirected to a file — the /tmp/codex-stdout-*.jsonl
ends up 0 bytes even though the review itself (-o /tmp/codex-review-*.md)
completes correctly. The skill handles this automatically via a filesystem
fallback: every prompt includes a per-launch session marker
(<!-- ADVERSARIAL-REVIEW-SESSION: <REVIEW_ID>-<ATTEMPT_ID> -->) where
ATTEMPT_ID is a fresh random integer regenerated for the initial exec,
every retry, every resume, and every fresh-exec fallback. The marker is
written to the rollout JSONL on disk. When the JSONL stream is empty, the
skill runs find ~/.codex/sessions -name 'rollout-*.jsonl' -newer <prompt-file> -exec grep -l <REVIEW_ID>-<ATTEMPT_ID> {} + to positively
identify this specific launch's rollout (not by newest-mtime, which would
be unsafe against parallel codex; not by review-stable ID alone, which
would match stale retry rollouts) and extracts the UUID from the filename.
Zero or multiple matches → the skill fails closed with a diagnostic rather
than silently picking. Resume continues to work normally. All commands are
POSIX (find -newer, -exec grep -l) and work identically on Linux and
macOS.
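
The fallback described above can be sketched as two small helpers (names are illustrative; the skill embeds the equivalent commands inline):

```shell
# List rollout files newer than the prompt file that contain the marker.
find_rollout() {
  sessions_dir=$1 prompt_file=$2 marker=$3
  find "$sessions_dir" -name 'rollout-*.jsonl' \
       -newer "$prompt_file" -exec grep -l "$marker" {} + 2>/dev/null
}

# Fail closed unless exactly one rollout matched.
require_single_match() {
  count=$(printf '%s\n' "$1" | grep -c .)
  [ "$count" -eq 1 ] || { echo "expected 1 rollout, found $count" >&2; return 1; }
}
```

Both helpers use only POSIX flags, matching the portability claim above.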

"NOT VERIFIED" result.
The skill applied fixes but the reviewer did not re-verify them (resume
failed or the operator chose to conclude). This is not an approval —
manually review the applied fixes before merging.

Running inside a git submodule.
git rev-parse --show-toplevel returns the submodule path, not the parent
repo. The skill warns you and scopes the review to the submodule. If you
meant to review the parent, invoke the skill from the parent working tree.
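
You can check this condition yourself before invoking the skill: git rev-parse --show-superproject-working-tree prints the parent repo path when run inside a submodule and nothing otherwise. A sketch (helper name is illustrative):

```shell
# Succeeds (and prints the parent repo path) only inside a submodule.
in_submodule() {
  sp=$(git -C "${1:-.}" rev-parse --show-superproject-working-tree 2>/dev/null) || return 1
  [ -n "$sp" ] && printf '%s\n' "$sp"
}
```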

Bare repository or not inside a work tree.
The skill aborts at Step 2 with a clear message. Run it from inside a
git working tree.

Plan Mode exits when writing temp files.
In Claude Code Plan Mode, writing to /tmp may trigger a permission prompt
or exit Plan Mode. This is a known Claude Code limitation. It does not affect
review correctness.

Known limitations

  • Plan Mode and /tmp writes. Writing review prompts to /tmp may trigger
    a permission prompt or cause Plan Mode to exit. Does not affect review correctness.
  • resume inherits sandbox. codex exec resume does not accept -s;
    the sandbox is inherited from the original session (always read-only).
  • resume has no -C flag. The skill captures REPO_ROOT via
    git rev-parse --show-toplevel at Step 2 and prefixes every resume with
    cd '<REPO_ROOT>' && .... This requires paths without single quotes;
    pathological paths (containing ', ", $, backtick, newline) cause
    the skill to abort at Step 2.
  • Submodule scoping. When invoked inside a submodule, the review is
    scoped to the submodule — git rev-parse --show-toplevel does not walk
    up to the parent. A warning is printed; invoke from the parent repo if
    you want parent scope.
  • macOS end-to-end not tested. The secondary session-id capture
    uses only POSIX flags (find -newer FILE, -exec CMD {} +, grep -l),
    so it should work identically on macOS as on Linux, but the skill has
    not been end-to-end tested on macOS.
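
The pathological-path restriction noted above can be checked up front. A sketch of the quote-safety test (function name is illustrative; newline detection is omitted for brevity):

```shell
# Reject paths containing characters that would break cd '<path>' quoting.
is_safe_path() {
  case $1 in
    *"'"* | *'"'* | *'$'* | *'`'*) return 1 ;;
    *) return 0 ;;
  esac
}
```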

Roadmap

  • Gemini as alternative reviewer backend
  • Local model support (Ollama, llama.cpp)
  • CI integration (GitHub Actions)
  • Multi-reviewer mode (parallel review by multiple models)

Inspiration

Adversarial prompt structure developed after studying
openai/codex-plugin-cc (Apache-2.0).

Borrowed ideas: XML-structured prompts, adversarial stance, attack surface
checklist, finding bar, calibration rules.

What we did differently:

  • Iterative loop — Claude fixes issues and resubmits (not "stop and ask user")
  • Plan review — reviews plans before code, not just code
  • Minimal install — SKILL.md plus one references/runner.md; no
    server, no broker, vs 15+ JS modules
  • Verbatim output — reviewer findings shown as-is, not rephrased

License

Apache-2.0 — see LICENSE.
