ultracodex
Health Pass
- License — License: Apache-2.0
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 10 GitHub stars
Code Pass
- Code scan — Scanned 2 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
No AI report is available for this listing yet.
Blend Codex headless into Claude Code's Workflow (ultracode) orchestration — cross-model adversarial verification, judge panels with a Codex juror, and second opinions.
ultracodex
Claude orchestrates. Codex cross-checks.
Blend Codex headless (codex exec) nodes into Claude Code's own Workflow
(ultracode) orchestration — so a genuinely different model family verifies,
judges, and second-guesses at the points where same-model agreement is least
trustworthy.
ultracodex is an open-source Claude Code plugin — a single skill,codex-workflow — that teaches Claude how to mix
Codex headless nodes into a Workflow:
Claude Code's agent() / pipeline() / parallel() orchestration tool, a.k.a.
ultracode. The orchestration logic stays in Claude's JavaScript; only who does
the work changes at chosen nodes. This is "Pattern A" — and it is additive,
not a model-switcher. Both families run in one Workflow.
Workflow (Claude JS orchestration)
├─ find / generate / synthesize .. Claude agent() ← broad, fast, cache-warm
└─ verify / judge / 2nd-opinion ... codex node ← GPT, independent failure modes
Requires ultracode mode. Because the skill authors and runs Workflows, the
Workflow orchestration tool must be available — which in Claude Code means
ultracode mode (enable it with/effort→ ultracode; it turns on dynamic
workflow orchestration). Without it the plugin still loads and the skill is found,
but there is no Workflow tool to drive. See Prerequisites.
Why it exists — the philosophy
More verifiers from the same model tend to share that model's blind spots: they
re-confirm the same correlated false positives. A different model family fails
on different inputs, so it can refute what same-model reviewers would rubber-stamp.
That is the whole idea:
- Diversity, not replacement. The win comes from disagreement between
independent models, so the goal is to keep both families in one run — never to
swap Claude out. "Add a second opinion" is right; "use GPT instead" misses the
point. - Independent failure modes beat more of the same. Routing a verify/judge node
to Codex reduces correlated error — the kind N more Claude verifiers cannot
catch because they fail in the same places. - The orchestration stays in Claude. Claude holds the context, schemas,
fan-out, and synthesis. A codex node is just a normalagent()whose subagent
relays acodex execrun. No Workflow primitive changes — a low-surface-area way
to get cross-model value. - A second model is not a source of truth. It is another failure-mode
distribution, best combined with tests, evidence, strict schemas, and a final
Claude synthesis. "Independent" is also relative: a shared prompt, shared
evidence, or a loose schema can still correlate errors — so use Codex
deliberately, where there is a verifiable artifact to check.
What you gain
- Catch false positives that survive same-model review — a different model
family can refute some correlated Claude errors before you act on a finding or
claim (a reduction in correlated error, not a guarantee). - Higher-trust verdicts on risky conclusions — cross-check a headline result
with a model that didn't produce it, before you report or ship. - Genuinely diverse judge panels — score candidates with a jury where no model
only judges its own work, cutting self-preference bias in LLM-as-judge setups. - Real solution diversity in generation — a Codex attempt in an N-way panel is
a different family's answer, not a reworded Claude one. - Large-context, multi-file verification with no new machinery — point Codex at
a tree (cwd/-C; read-only blocks writes, not reads) so it reads big diffs
and files itself instead of you pasting them into a prompt. - Copy-paste ready — four complete Workflow templates (cross-model review,
mixed judge panel, single-conclusion cross-check, loop-until-dry) plus the
canonicalcodexNodehelper. You start from working scripts, not a blank file. - Minimal surface change — keep every Workflow primitive exactly as-is; only
the chosen nodes change hands.
The highest-ROI use: adversarial verification
Claude finds broadly; Codex tries to refute each finding (defaulting to
"refuted" when uncertain). Survivors are the findings a different model family
could not knock down — a pipeline, no barrier:
const results = await pipeline(
DIMENSIONS,
d => agent(d.findPrompt, { phase: 'Find', schema: FINDINGS }), // Claude finds
review => parallel((review?.findings ?? []).map(f => () =>
codexNode(`Adversarially verify, defaulting to refuted=true if uncertain:\n${f.title}\n${f.detail}`,
{ schema: VERDICT, phase: 'Verify', label: `codex:${f.id}` }) // Codex refutes
.then(v => ({ ...f, verdict: v })))),
)
const confirmed = results.flat().filter(Boolean)
.filter(f => f.verdict && !f.verdict._codex_error && f.verdict.refuted === false)
Mixed judge panels and single-conclusion second opinions are the next
tiers — full scripts inworkflow-templates.md.
When to use it — and when not to
Cross-model is signal, not free signal. Reach for a codex node when a second,
independent model materially de-risks the result — verifying findings, judging
candidates, an independent attempt in a panel, sanity-checking a risky conclusion.
Skip the blend when:
- it's bulk throughput work (Claude subagents are cheaper, faster, cache-warm);
- it's a correlated check — Codex would only re-derive from the same evidence,
an echo rather than a second opinion; - it's a blind check — the node can't give Codex what it needs to verify
independently; - there's no verifiable/judgeable artifact (open-ended ideation) — diversity adds
noise, not signal.
Prerequisites
- Ultracode mode — the Workflow tool. The skill authors and runs Workflows,
so the Workflow tool (agent()/pipeline()/parallel()) must be available.
In Claude Code that means ultracode mode: enable it with/effort→ ultracode
(xhigh + dynamic workflow orchestration). The skill only provides the know-how for
authoring Workflows; it does not add the tool itself, so without ultracode
mode it has nothing to drive. - Codex CLI, installed and authenticated. Verify, then log in (interactive —
it cannot be done headlessly):command -v codex && codex --version codex login - A POSIX shell (
bash/zsh) for the relay nodes.
The skill hard-codes no Codex version or default model — environments differ.
It ships a cheap preflight (CLI present, auth live, structured-output path)
to run once before trusting any node.
Installation
ultracodex is its own single-plugin marketplace. Add it, install the plugin,
reload:
/plugin marketplace add KingGyuSuh/ultracodex
/plugin install ultracodex@ultracodex
/reload-plugins
Non-interactive (CLI) equivalent:
claude plugin marketplace add KingGyuSuh/ultracodex
claude plugin install ultracodex@ultracodex
# then restart Claude Code, or run /reload-plugins in an existing session
The
@ultracodexsuffix on install is the marketplace name (top-levelname
in.claude-plugin/marketplace.json), independent of the repo name — they happen
to match here.
Try it without installing
claude --plugin-dir ./plugins/ultracodex
Usage
The skill triggers automatically when you ask Claude to run a task as a custom
Workflow with a second model in the loop. Phrasings that activate it include:
- "Run this as a custom workflow that mixes in codex."
- "Have codex adversarially verify the findings."
- "Use codex as a juror / second opinion in the workflow."
- "Cross-model verify this while you orchestrate."
Claude then authors a Workflow whose verify/judge nodes shell out to codex exec
while find/generate/synthesize stay on Claude — starting from a template.
How it works — the codex node
A Workflow script's JS body has no filesystem access, so a codex node embeds
its JSON Schema as a string and lets the Bash-capable wrapper subagent write it to
a temp file. One schema object is the single source of truth: it feeds Codex's--output-schema and, when revalidate is true, the agent() re-validation. With
the default revalidate: true the node returns a parsed object; withrevalidate: false it relays raw JSON text for the caller to JSON.parse. The
contract:
codexNode(taskText, { schema, sandbox='read-only', model, cwd, effort, revalidate=true, phase, label })
→ Promise<parsedObject> // revalidate:true (default)
→ Promise<string> // revalidate:false — you JSON.parse it
→ { _codex_error: true } // on failure
Three rules are load-bearing:
- Relay, not solver. The wrapping Claude subagent is smart and will be tempted
to just answer the question — silently collapsing the cross-model node back into
a Claude node. The prompt forbids this; keep it at full strength. - Extract from the
-ofile, not stdout. stdout carries session chrome;-ois the one clean-JSON path. - Pass the prompt via stdin (
- < "$TASK") so quotes,$, and backticks
can't break or expand.
The full copy-paste helper lives canonically inworkflow-templates.md;
the mechanics, flags, sandbox tiers, troubleshooting, and escalation patterns are
in codex-headless.md.
Routing: Codex node vs Claude node
| Node's job | Run it on | Why |
|---|---|---|
| Find / generate / explore breadth | Claude | fast, cache-warm, cheap fan-out |
| Adversarially verify a finding | Codex | different failure modes can reduce correlated false positives |
| Judge / score candidates | Codex (or mixed panel) | a juror that didn't write the candidate |
| One attempt in a diverse panel | mix | genuine solution diversity, not reworded Claude |
| Synthesize / decide / write-up | Claude | holds the orchestration context |
Cost, concurrency & escalation
- Two billing surfaces. Every codex node pays the Anthropic wrapper turn plus
the Codex/OpenAI run. The wrapper only relays, but it still has a subagent's fixed
overhead — don't fan out hundreds casually; batch small items instead. - Keep codex nodes short. Each holds a Workflow concurrency slot
(min(16, cores−2)) for Codex's entire runtime while the wrapper idles on a
blocking Bash call. Read-only verify/judge/small-gen only. - Escalate long work. Minutes-to-hours, parallel, or resumable Codex jobs are
not Pattern A — use a background worker pool or a single-shot handoff
(described in the skill), never a long node inside a Workflow.
What's in the box
A single skill, codex-workflow:
SKILL.md— the mental model, thecodexNodecontract, the load-bearing
rules, routing, cost/concurrency, and the preflight.references/codex-headless.md—codex execflags, sandbox tiers, output
extraction, gotchas, troubleshooting, the_codex_errordiscipline, and
escalation patterns.references/workflow-templates.md— the canonicalcodexNodehelper and four
complete Workflow scripts, plus batch-node and large-payload variants.
ultracodex/ # repo root = marketplace root
├── .claude-plugin/
│ └── marketplace.json # single-plugin marketplace catalog
├── plugins/
│ └── ultracodex/ # the plugin
│ ├── .claude-plugin/
│ │ └── plugin.json # plugin manifest
│ ├── README.md
│ └── skills/
│ └── codex-workflow/
│ ├── SKILL.md
│ └── references/
│ ├── codex-headless.md
│ └── workflow-templates.md
├── docs/ # README translations (ko / ja / zh-CN)
├── LICENSE # Apache-2.0
├── NOTICE
└── README.md
Scope is deliberately skill-only: no commands, agents, hooks, or MCP servers.
Development
claude plugin validate ./plugins/ultracodex # plugin manifest + skill frontmatter
claude plugin validate . # marketplace manifest
claude plugin tag ./plugins/ultracodex # cut a release tag (manifests must agree)
Acknowledgements
The codex-workflow skill is the open-source generalization of a private in-repo
skill of the same name. This release removes environment-specific facts (it
preflights instead) and private path references, and was itself developed and
cross-checked using the plugin's own blended Claude+Codex workflows — including this
README, drafted and proofread cross-model.
License
Apache-2.0 © 2026 KingGyuSuh
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found