ProductionOS
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Fail
- eval() — Dynamic code execution via eval() in .github/workflows/ci.yml
- child_process — Shell command execution capability in bin/install.cjs
- fs.rmSync — Destructive file system operation in bin/install.cjs
- os.homedir — User home directory access in bin/install.cjs
- process.env — Environment variable access in bin/install.cjs
- fs module — File system access in bin/install.cjs
- eval() — Dynamic code execution via eval() in hooks/eval-gate.sh
Permissions Pass
- Permissions — No dangerous permissions requested
This tool is an AI agent plugin for Claude Code and Codex CLI that deploys multiple automated agents to review, score, and iteratively improve a user's codebase.
Security Assessment
The overall risk is High. The installation script (`bin/install.cjs`) exhibits highly invasive behaviors: it accesses the user's home directory, reads environment variables, performs destructive file system operations (`fs.rmSync`), and can execute arbitrary shell commands (`child_process`). Furthermore, the tool relies on dynamic code execution via `eval()` in both its installation workflows (`ci.yml`) and shell hooks (`eval-gate.sh`). These combinations of capabilities present a significant attack surface, as a compromised update or malicious payload could easily lead to local privilege escalation, data theft, or system destruction. No hardcoded secrets were detected.
Quality Assessment
The project is very new and suffers from extremely low community visibility, having only accrued 5 GitHub stars. While it is actively maintained (last push was today) and responsibly uses a standard MIT license, the codebase's quality is questionable due to its reliance on unsafe coding practices like `eval()`. Community trust cannot be reliably established at this early stage.
Verdict
Not recommended.
ProductionOS v1.0 — Claude Code plugin with 76 agents, 39 commands, and 12 hooks. Deploys specialized agents that review, score, and improve your entire codebase. Smart routing, recursive convergence, self-evaluation.
ProductionOS
One command. Your entire codebase reviewed, scored, and improved.
ProductionOS is a dual-target AI engineering OS for Claude Code and Codex with 78 agents, 41 commands, and 17 hooks. It deploys specialized agents that review your code, find issues, fix them, and keep improving until every quality dimension hits the target. Smart routing dispatches the right workflow for your goal automatically.
Quick Start
# Step 1: Add the marketplace (one time)
claude plugin marketplace add ShaheerKhawaja/ProductionOS
# Step 2: Install the plugin
claude plugin install productionos
# Step 3: Restart Claude Code, then run on any codebase
/production-upgrade
Alternative: Manual install (if marketplace commands fail)
# Clone directly into the plugins directory
git clone https://github.com/ShaheerKhawaja/ProductionOS.git \
~/.claude/plugins/marketplaces/productionos
# Restart Claude Code — hooks, commands, and agents load automatically
Verify installation:
# Check the plugin is recognized
claude plugin list
# Validate the plugin schema
claude plugin validate ~/.claude/plugins/marketplaces/productionos/.claude-plugin/marketplace.json
That's it. ProductionOS discovers your stack, deploys 7 review agents in parallel, scores your code across 10 dimensions, and generates a fix plan. Run it again — the score goes up.
Codex CLI Skill Install
npx productionos@latest --codex
This installs:
~/.codex/skills/productionos~/.codex/plugins/productionos~/.codex/skills/productionos-<workflow>aliases for every ProductionOS workflow
Restart Codex to pick up the new skill and plugin.
Install For Claude + Codex Together
npx productionos@latest --all-targets
Codex App / Plugin Surface
ProductionOS now ships a native Codex plugin manifest at .codex-plugin/plugin.json plus a plugin skill at skills/productionos/SKILL.md.
Recommended Codex usage:
$productionosfor the umbrella workflow router$productionos-review,$productionos-plan-eng-review,$productionos-production-upgrade, etc. for direct workflow entrypoints
What It Does
The Core Loop
DISCOVER → REVIEW → PLAN → FIX → VALIDATE → repeat until 10/10
/production-upgrade— Audit + fix any codebase. Deploys 7 parallel agents (CEO strategic, engineering, code quality, security, UX, backend, database). Shows BEFORE → AFTER scores./omni-plan-nth— The recursive orchestrator. Chains every available skill, loops until every quality dimension scores 10/10. No shortcuts, no "good enough."/auto-swarm-nth— Throw 7 parallel agents at any task. Each wave covers gaps the last missed. Supports worktree isolation for zero-conflict parallel execution./designer-upgrade— Full UI/UX redesign: audit → design system → interactive HTML mockups → browser annotation → implementation plan.
Safety & Guardrails
Every action runs through these checks:
- Self-Eval Protocol — 7 questions after every agent action (quality, necessity, correctness, dependencies, completeness, learning, honesty). Score gating: pass, self-heal, or block.
- Repo Boundary Guard — Prevents agents from editing files outside your project. Hard-blocks git operations in wrong repos.
- Secret Detection — gitleaks + regex fallback blocks commits containing API keys, tokens, passwords.
- Protected Files —
.env, credentials, keys, certs blocked from writes. - Approval Gate — High-stakes actions (deploy, auth changes, push) require explicit approval.
For Different Users
Solo Founders / Small Teams
You need the output of a 10-person team but have 1-2 people. ProductionOS fills the roles:
| Role | ProductionOS Equivalent |
|---|---|
| Code Reviewer | code-reviewer agent (2-pass: critical + informational) |
| QA Engineer | /qa command + ux-auditor agent |
| Security Auditor | security-hardener + semgrep-scanner + gitleaks hook |
| Solutions Architect | architecture-designer + /plan-eng-review |
| CTO / Strategic | /plan-ceo-review (rethink the problem, find the 10-star product) |
| Designer | /designer-upgrade (audit → design system → mockups) |
| Release Manager | /ship (merge → test → version → push → PR) |
Start with: /production-upgrade to see where you stand, then /omni-plan-nth to fix everything.
Open Source Maintainers
You need consistent code quality across contributors. ProductionOS provides:
- Quality Gates — Configurable thresholds in
quality-gates.yml(test ratio, complexity, security) - CI Integration — GitHub Actions workflow validates agents, runs convergence checks
/review— Pre-landing code review that catches SQL safety, LLM trust boundaries, enum completeness/retro— Weekly retrospective with commit analysis, test health, shipping velocity
Start with: /review on incoming PRs, /retro weekly.
Agencies / Consultants
You work across many codebases and need fast, repeatable quality audits:
/production-upgrade— Drop into any codebase, get a scored report in 10 minutes/deep-research— 8-phase autonomous research on any technology or pattern/auto-swarm-nth— Throw up to 77 parallel agents at any task (research, build, audit, fix)/max-research— Nuclear option: 500+ agents for exhaustive topic coverage
Start with: /production-upgrade for client audits, /deep-research for technology evaluation.
AI Engineers
You're building AI-powered products and need agent orchestration patterns:
- 78 agent definitions with YAML frontmatter (model routing, tool constraints, stakes classification)
- 10-layer prompt composition (Emotion → Meta → Context → CoT → ToT → GoT → CoD → Generated Knowledge → Distractor-Augmented)
- Tri-tiered judging (3 independent judges with debate on disagreement)
- Convergence engine (recursive improvement with regression detection)
- Worktree isolation (parallel agents in isolated git worktrees)
Start with: Read ARCHITECTURE.md for the agent orchestration patterns.
Commands (41)
Start Here
/build-productionos Smart router — routes any intent to the right pipeline
/production-upgrade Audit + fix any codebase (the entry point)
/productionos-help Usage guide
Recursive (loops until target met)
/omni-plan-nth Recursive until 10/10 across ALL dimensions
/auto-swarm-nth "task" Recursive swarm until 100% coverage
Pipeline (single-pass with convergence)
/omni-plan 13-step pipeline with tri-tiered judging
/auto-swarm "task" parallel agent waves
/frontend-upgrade CEO vision + parallel swarm transformation
/designer-upgrade UI/UX redesign — audit → design system → mockups
/ux-genie User stories + journey maps + friction analysis
Quality
/self-eval Self-evaluate recent work (default-on in all flows)
/security-audit 7-domain OWASP/MITRE/NIST sweep
/retro Engineering retrospective with trend tracking
Research
/deep-research [topic] 8-phase autonomous research
/max-research [topic] 500+ agents (nuclear option)
Development
/brainstorming Idea exploration before building
/writing-plans Step-by-step implementation plans
/tdd Test-driven development
/debug Systematic debugging with hypothesis tracking
/plan-ceo-review CEO/founder strategic review
/plan-eng-review Engineering architecture review
/review Pre-landing code review
/ship Merge → test → version → push → PR
/qa QA testing with health scoring
/browse Headless browser for inspection
Lifecycle
/productionos-pause Save pipeline state
/productionos-resume Resume from checkpoint
/productionos-update Self-update from GitHub
/auto-mode [idea] Idea-to-running-code pipeline
/logic-mode [idea] Business idea validation
/learn-mode [topic] Interactive code tutor
Architecture
78 agents (declarative YAML frontmatter, 3-tier model routing)
41 commands (orchestrate agents, loop until convergence)
17 hook files (SessionStart, PreToolUse security, PostToolUse telemetry, Stop handoff)
11 templates (PREAMBLE, SELF-EVAL, INVOCATION, PROMPT-COMPOSITION, MODEL-ROUTING, etc.)
7 CLI tools (pos-init, pos-config, pos-analytics, pos-review-log, etc.)
47 skills (Claude auto-activation + Codex-native workflow wrappers)
Agent Model
Every agent has:
- Model tier — opus (planning), sonnet (execution), haiku (validation)
- Tool constraints — exactly which tools each agent can use
- Stakes classification — LOW (auto), MEDIUM (warn), HIGH (block until approved)
- Red flags — behavioral guardrails specific to each agent's role
Execution Model
Command invoked
→ Preamble runs (context load, security scan, session tracking)
→ Agents dispatched (parallel where possible, sequential where needed)
→ Self-eval runs on each output (7-question protocol)
→ Score gating (pass / self-heal / block)
→ Convergence check (continue / pivot / deliver)
→ Postamble (instinct extraction, telemetry, handoff)
Hooks
SESSION START → init state, detect project, show banner
EVERY EDIT → repo boundary guard → protected file guard → security scan
AFTER EDIT → self-learn → telemetry → review hint (after 10+ edits)
GIT COMMIT → gitleaks secret scan
SESSION END → mini-retro → handoff doc → instinct extraction
Estimated Costs
| Command | Agents | Cost |
|---|---|---|
| /production-upgrade | 7-55 | $1-5 |
| /omni-plan-nth | 14-147 | $3-15 |
| /auto-swarm-nth | 7-77 | $1-8 |
| /designer-upgrade | 5-20 | $2-8 |
| /max-research | 500-1000 | $15-75 |
Use --profile budget for ~40% savings.
Installation
Recommended: via Claude Code marketplace
# 1. Add the marketplace
claude plugin marketplace add ShaheerKhawaja/ProductionOS
# 2. Install the plugin from the marketplace
claude plugin install productionos
# 3. Restart Claude Code
Alternative: manual git clone
git clone https://github.com/ShaheerKhawaja/ProductionOS.git \
~/.claude/plugins/marketplaces/productionos
Verify
# Confirm plugin is loaded
claude plugin list
# Validate schema (should show 0 errors)
claude plugin validate ~/.claude/plugins/marketplaces/productionos/.claude-plugin/marketplace.json
Restart Claude Code after install. Hooks, commands, and agents load on session start.
Update
claude plugin update productionos
Uninstall
claude plugin uninstall productionos
Validation
cd ~/.claude/plugins/marketplaces/productionos
bun install && bun test # 932 pass, 1 skip, 0 unexpected failures
bun run validate # 78/78 agents valid
bun run skill:check # Health dashboard (100%)
What Makes This Different
Recursive, not single-pass. Other tools review once and stop. ProductionOS loops: review → plan → fix → validate → re-score. It doesn't stop until the target is hit or proves it's unreachable.
Self-evaluating by default. Every action is questioned: Was this necessary? Is this correct? Did I miss dependencies? Scores below threshold trigger automatic self-heal loops.
Separated concerns. Judges cannot modify code. Fixers cannot evaluate their own work. Three independent judges must reach consensus — if they disagree, they debate.
Observable. Cost estimation before every run. Convergence dashboard showing real-time progress. Run history tracking across sessions. You see the machine working.
Guarded. Repo boundary detection, secret scanning, protected file blocking, stakes-based approval gates. Guardrails you can't forget because they're hooks, not habits.
Agent Families
These aren't chat personas. They're specialized workflows with defined inputs, outputs, tool restrictions, and quality criteria. All agents have YAML frontmatter with model, tools, subagent_type, stakes, and behavioral Red Flags.
ProductionOS currently ships 78 agent definitions spanning:
- Review and judging
- Planning and orchestration
- Refactoring and execution
- Security and compliance
- Design and UX
- Research and context retrieval
- Worktree and release operations
Representative agents:
- Review:
llm-judge,deep-researcher,code-reviewer,ux-auditor,dynamic-planner,business-logic-validator,dependency-scanner - Architecture:
api-contract-validator,naming-enforcer,refactoring-agent,database-auditor,test-architect,performance-profiler,migration-planner - Orchestration:
self-healer,convergence-monitor,decision-loop,metaclaw-learner,research-pipeline,security-hardener,gitops - Design:
frontend-designer,comparative-analyzer,reverse-engineer,design-system-architect,designer-upgrade,ux-genie,user-story-mapper - Coordination:
quality-loop-controller,session-context-manager,browser-controller,version-control,e2e-architect,document-parser,worktree-orchestrator - Quality:
quality-gate-enforcer,semgrep-scanner,regression-detector,documentation-auditor,rag-expert,db-creator,aiml-engineer
The exact current roster lives in agents/ and the generated handoff in docs/CODEX-PARITY-HANDOFF.md.
Agent Stakes Classification (HumanLayer Pattern)
| Stakes | Approval | Example Agents |
|---|---|---|
| HIGH | Block until explicit approval | llm-judge, adversarial-reviewer, deep-researcher, guardrails-controller, vulnerability-explorer, migration-planner, gitops, worktree-orchestrator |
| MEDIUM | Warn with context, proceed | code-reviewer, database-auditor, dynamic-planner, architecture-designer, ux-auditor, refactoring-agent |
| LOW | Auto-approve | naming-enforcer, density-summarizer, context-retriever, comms-assistant, dependency-scanner |
CLI Tools
pos-init # Initialize ~/.productionos/ state directory
pos-config list # Show all settings
pos-config get key # Get a setting (proactive, telemetry, auto_review, self_eval)
pos-config set k v # Set a setting
pos-analytics # Usage dashboard (events, top skills, sessions)
pos-update-check # Version check with snooze
pos-review-log # Append review results to log
pos-telemetry # Log skill usage events
Persistent State (~/.productionos/)
~/.productionos/
├── config/settings.json # User settings
├── analytics/
│ ├── skill-usage.jsonl # Append-only event log
│ └── review-log.jsonl # Review results history
├── sessions/
│ ├── active-project # Current git repo root (repo boundary guard)
│ └── handoff-{date}.md # Auto-generated session summaries
├── instincts/
│ ├── project/{hash}/ # Project-scoped learned patterns
│ └── global/ # Cross-project patterns (confidence > 0.8)
├── retro/ # Retrospective JSON snapshots + session mini-retros
├── worktrees.json # Active worktree registry
├── review-log/ # Detailed review artifacts
└── cache/ # Version check snooze, temp data
Tech
- 78 agent definitions with YAML frontmatter (model routing, tool constraints, stakes classification)
- 41 commands (including recursive orchestrators and imported workflow surfaces)
- 10-layer prompt architecture (Emotion → Meta → Scratchpad → Context → CoT → ToT → GoT → CoD → Generated Knowledge → Distractor-Augmented)
- Default-on self-evaluation protocol (7-question quality gate on all outputs)
- 17 hook files (SessionStart, PreToolUse security + boundary + gitleaks, PostToolUse telemetry/review, Stop handoff)
- 6 CLI tools for config, analytics, telemetry, version management
- 4 auto-activating skills with file pattern matching
- Executable convergence engine (TypeScript, Algorithm 1 + Algorithm 6)
- Configurable quality gates (quality-gates.yml with per-project overrides)
- Worktree isolation for parallel agent execution (m13v production patterns)
- Persistent state at ~/.productionos/ (config, analytics, instincts, sessions, retro)
- HumanLayer-inspired approval gate for HIGH-stakes operations
- CI/CD pipeline (GitHub Actions: validate + lint + convergence check)
- Secret detection via gitleaks + regex fallback
- Zero runtime dependencies beyond Claude Code or Codex + Bun
Built On
ProductionOS stands on the shoulders of these open-source projects:
- gstack — hooks, CLI tools, preamble pattern, micro-state files
- superpowers — behavioral gates, red flags, evidence-first
- everything-claude-code — continuous learning, instinct system
- 12-factor-agents — small focused agents, unified state
- HumanLayer — stakes classification, approval as tool call
- get-shit-done — wave-based execution, project management
- Fabric — AI prompt patterns, YouTube extraction
- tmux-background-agents — worktree isolation, crash recovery patterns
Research foundations: Self-Refine, Reflexion, Graph of Thought, EmotionPrompt, Chain of Density, Constitutional AI, LLM-as-Judge, DSPy.
Version
1.0.0-beta.1
License
MIT
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found