swarm-orchestrator (agent)

Security Audit: Warn

Health — Pass
  • License — ISC
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 70 GitHub stars
Code — Warn
  • process.env — Environment variable access in .github/plugin/hooks/hooks.json
  • fs module — File system access in .github/plugin/hooks/hooks.json
  • process.env — Environment variable access in .github/plugin/hooks/post-tool-use.json
  • fs module — File system access in .github/plugin/hooks/post-tool-use.json
  • process.env — Environment variable access in .github/plugin/hooks/pre-tool-use.json
  • network request — Outbound network request in config/default-agents.yaml
Permissions — Pass
  • Permissions — No dangerous permissions requested
Purpose

This tool acts as a verification and governance layer for AI coding agents (like Copilot, Claude Code, and Codex). Instead of generating code autonomously, it orchestrates external AI agents across isolated git branches and uses evidence-based quality gates—such as build checks, tests, and git diffs—to ensure code safety before merging.

Security Assessment

The tool accesses the file system and reads environment variables via hook configurations, which is expected behavior since it needs to manage git branches and authenticate with external AI APIs (via `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`). It also makes outbound network requests, primarily defined in its default agent configurations, to communicate with these third-party AI services. No hardcoded secrets were detected, and it does not request inherently dangerous OS permissions. Because it orchestrates external AI tools that write and modify code, running it inherently means shell commands execute on your machine.

Overall Risk Rating: Medium. The tool itself is designed around safety and verification, but the underlying architecture requires handing over local file system access and code execution to third-party AI models.

Quality Assessment

The project is actively maintained (last push today) and has gained solid early traction with 70 GitHub stars. It is covered by a CI pipeline and includes 1,159 passing tests, indicating a high standard of engineering rigor. It uses the permissive ISC license, making it safe for commercial and open-source use.

Verdict

Use with caution — the orchestrator is well-engineered and secure, but you must trust the external AI providers and CLI tools it connects to, as they will actively execute and write code on your machine.
SUMMARY

Verification and governance layer for AI coding agents. Parallel orchestration with evidence-based quality gates for Copilot, Claude Code, and Codex.

README.md

Swarm Orchestrator

Verification and governance layer for AI coding agents. Parallel execution with evidence-based quality gates, not autonomous code generation.

This is not an autonomous system builder. It orchestrates external AI agents (Copilot, Claude Code, Codex) across isolated branches, verifies every step with outcome-based checks (git diff, build, test), and only merges work that proves itself. The value is trust in the output, not speed of generation.


License: ISC · CI · Tests: 1159 passing · Node.js 20+ · TypeScript 5.x


Quick Start · What Is This · Quality Benchmarks · Usage · GitHub Action · Recipes · Architecture · Contributing


[Screenshot: Swarm Orchestrator TUI dashboard showing parallel agent execution across waves]


Quick Start

# Install globally
npm install -g swarm-orchestrator
# Or clone and build from source
git clone https://github.com/moonrunnerkc/swarm-orchestrator.git
cd swarm-orchestrator
npm install && npm run build && npm link
# Run against your project with any supported agent
swarm bootstrap ./your-repo "Add JWT auth and role-based access control"

# Use Claude Code instead of Copilot
swarm bootstrap ./your-repo "Add JWT auth" --tool claude-code

# Use Codex
swarm bootstrap ./your-repo "Add JWT auth" --tool codex

See it work before pointing it at your code:

swarm demo demo-fast    # two parallel agents, ~1 min

Requires Node.js 20+, Git, and at least one supported agent CLI installed.

| Agent | Install | Auth |
| --- | --- | --- |
| GitHub Copilot CLI | `npm install -g @github/copilot` | Launch `copilot` and run `/login` (requires Node.js 22+) |
| Claude Code | `npm install -g @anthropic-ai/claude-code` | `ANTHROPIC_API_KEY` |
| Codex | `npm install -g @openai/codex` | `OPENAI_API_KEY` |



What Is This

AI coding agents generate code fast, but without verification, you're merging untested assumptions into your codebase. This orchestrator provides the evidence layer: it runs agents in parallel, checks whether the generated code actually works, and blocks anything that can't prove itself.

What it does: You define a goal. The orchestrator builds a dependency graph, launches steps as dependencies resolve, and manages the full lifecycle: branch creation, agent execution, outcome verification, failure repair, and merge. Every agent runs on its own isolated git branch. Every step is verified by what actually happened: did files change, does the build pass, do tests pass. Steps that can't prove their work don't merge.

What it does not do: This tool does not generate code. It delegates code generation to external agent CLIs (Copilot, Claude Code, Codex) and focuses entirely on orchestration, verification, and quality governance. It is not a replacement for autonomous coding tools; it is a trust layer that wraps them.

Works with Copilot CLI, Claude Code, Codex, or any CLI agent via the adapter interface. Select your tool with --tool globally or per-step in your plan. The orchestrator doesn't care which agent writes the code; it cares whether the code works.

Verification is outcome-based. The engine runs git diff against the branch baseline, executes the project's build and test commands in the worktree, and checks for expected output files. Transcript analysis (parsing what the agent claimed) runs as a supplementary signal, not the primary gate. When a step fails, the RepairAgent receives structured failure context (which checks failed and why, ordered by actionability) instead of blindly retrying the same prompt.
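The core `git_diff` check can be reproduced by hand. The shell sketch below runs it in a disposable repository; it is illustrative only, not the orchestrator's implementation, and the `build_exec`/`test_exec` checks apply the same idea with `npm run build` and `npm test`:

```shell
# Sketch of the required git_diff outcome check in a throwaway repo.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"
git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)              # recorded base SHA
echo "export const x = 1;" > util.ts    # simulated agent output
git add util.ts
git commit -q -m "step 1"
# Required check: the step must have produced file changes vs the base SHA.
changed=$(git diff --name-only "$base" HEAD)
[ -n "$changed" ] && echo "git_diff: pass"
```

If `changed` is empty, the step produced no evidence of work and fails regardless of what the transcript claims.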

Also available as a GitHub Action for CI/CD integration and with built-in recipes for common tasks.




Quality Benchmarks

The orchestrator's prompt injection and quality gates front-load requirements that developers normally discover through iterative reprompting. The same goal run through the orchestrator produces output that would take 17-25 follow-up prompts to reach with a standalone agent.

The following comparison used the same goal run through Claude Code unassisted and through the orchestrator. Both outputs were evaluated by an independent reviewer against identical criteria.

Goal: "Create a simple browser-based tic-tac-toe game with HTML, CSS, and vanilla JavaScript. Include a 3x3 grid, alternating X and O turns, win detection, and a reset button."

Results: Orchestrator vs Claude Code (unassisted)

| Category | Claude Code | Orchestrator |
| --- | --- | --- |
| Architecture | A (factory pattern, logic/DOM separation) | A+ (pure ES module + DOM controller, new-array-per-move state) |
| Tests | A- (11 tests, custom harness, storage mock required) | A+ (19 tests, zero dependencies, edge case + error coverage) |
| Accessibility | F (no ARIA, no focus management, no keyboard support) | A+ (skip link, aria-live, positional labels, focus-visible) |
| Responsive design | F (fixed 100px cells, no handling) | A (clamp on all sizes, dvh, edge padding) |
| CSS architecture | C (hardcoded colors, no variables, no media queries) | A+ (20+ custom properties, dark mode, reduced-motion) |
| HTML semantics | C+ (buttons, no landmarks, no meta tags) | A+ (meta description, dual theme-color, SVG favicon, landmarks) |
| Project scaffolding | F (no package.json, no README) | A (zero-dep test runner, structured README) |
| Audio feedback | None | A (Web Audio API, lazy init, per-event frequencies) |

What the orchestrator included that Claude Code did not

17 specific quality attributes were present in orchestrator output and absent from Claude Code output: skip link, aria-live region, positional aria-labels (row/column), focus-visible styles, responsive clamp sizing, CSS custom properties (50+ variable references), prefers-reduced-motion media query, prefers-color-scheme dark mode with full variable overrides, <meta name="description">, dual <meta name="theme-color"> (light and dark), inline SVG favicon, pure logic module separation, copy-on-move game state, audio feedback via Web Audio, separate DOM controller, zero-dependency Node test runner, and structured README with file table.

Each attribute requires at least one follow-up prompt to add when using a standalone agent. Several (full dark mode variable overrides, responsive clamp system, module extraction) require 2-3 rounds. Conservative total: 17-25 prompts eliminated per project.

Note: These results are from a representative run. The underlying agent is non-deterministic, so exact grades and counts may vary between runs. The quality attributes are enforced by prompt injection and gate verification, so they are reliably present, but the specific implementation details (e.g., test count, number of CSS variables) can differ.

How it works

The orchestrator injects quality requirements into every agent prompt before execution begins: accessibility standards (ARIA labels, keyboard navigation, focus-visible, skip links), CSS requirements (custom properties, reduced-motion, color-scheme), HTML metadata (description, theme-color, viewport), and code structure rules (pure logic separation, DOM controller pattern, semantic HTML). Quality gates then verify the output and reject work that doesn't meet the bar, triggering targeted repair with specific failure context.

Standalone agents optimize for "correct and working." The orchestrator adds "accessible, responsive, themed, and structured" before the agent writes a single line. The quality bar comes from the system, not from the user's prompt.
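The injection step described above can be sketched in a few lines. The function and variable names below are illustrative guesses, not the orchestrator's actual API:

```typescript
// Hypothetical sketch of prompt injection: prepend the system-level quality
// bar (from the requirements listed above) to the user's task prompt.
const qualityRequirements: string[] = [
  "Accessibility: ARIA labels, keyboard navigation, focus-visible, skip link",
  "CSS: custom properties, prefers-reduced-motion, prefers-color-scheme dark mode",
  "HTML metadata: meta description, theme-color, viewport",
  "Structure: pure logic module, separate DOM controller, semantic HTML",
];

function injectRequirements(taskPrompt: string): string {
  const bar = qualityRequirements.map((r) => `- ${r}`).join("\n");
  return `${taskPrompt}\n\nQuality requirements (verified by gates):\n${bar}`;
}

console.log(injectRequirements("Build a tic-tac-toe game"));
```

Because the bar is appended by the system, the user's prompt can stay focused on functionality.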

Note: This benchmark covers frontend web projects using Claude Code as the baseline. Copilot CLI and Codex comparisons are in progress and will be added here. Backend, API, and CLI project benchmarks are planned.




Usage

Commands

| Command | Description |
| --- | --- |
| `swarm bootstrap ./repo "goal"` | Analyze repo and generate a plan |
| `swarm run --goal "goal"` | Generate plan and execute in one step |
| `swarm swarm plan.json` | Execute a plan with parallel agents |
| `swarm quick "task"` | Single-agent quick task |
| `swarm use <recipe>` | Run a built-in recipe against current project |
| `swarm recipes` | List available recipes |
| `swarm recipe-info <n>` | Show recipe details and parameters |
| `swarm gates [path]` | Run quality gates on a project |

Key Flags

| Flag | Effect |
| --- | --- |
| `--tool <n>` | Agent to use: `copilot` (default), `claude-code`, `codex` |
| `--governance` | Enable Critic review wave with scoring and auto-pause |
| `--lean` | Enable Delta Context Engine (KB-backed prompt references) |
| `--cost-estimate-only` | Print cost estimate and exit without running |
| `--max-premium-requests <n>` | Abort if estimated premium requests exceed budget |
| `--wrap-fleet` | Use Copilot CLI's native `/fleet` for parallel subagent dispatch |
| `--strict-isolation` | Restrict cross-step context to verified entries only |
| `--pm` | Enable PM Agent plan review before execution |
| `--param key=value` | Set recipe parameters (with `use` command) |
| `--pr auto\|review\|none` | PR behavior after execution |

Examples

# Full-featured run with Claude Code
swarm swarm plan.json --tool claude-code --governance --lean

# Recipe: add tests with vitest targeting 90% coverage
swarm use add-tests --tool codex --param framework=vitest --param coverage-target=90

# Preview cost before committing
swarm swarm plan.json --cost-estimate-only

# Per-step agent selection in plan.json
# { "steps": [
#   { "id": 1, "task": "...", "agentName": "BackendMaster", "cliAgent": "claude-code" },
#   { "id": 2, "task": "...", "agentName": "TesterElite", "cliAgent": "codex" }
# ]}



GitHub Action

Run the orchestrator in CI without installing anything. Outcome-based verification provides the trust layer for unattended execution.

Security note: always pass credentials via the `env:` block, never via `with:` inputs, since GitHub Actions may expose input values in workflow logs. Set a minimal `permissions:` block to limit `GITHUB_TOKEN` scope. See SECURITY.md for full credential handling guidance.

name: AI Swarm - Add Tests
on:
  workflow_dispatch:
    inputs:
      goal:
        description: 'What should the swarm do?'
        default: 'Add comprehensive unit tests for all untested modules'

permissions:
  contents: write
  pull-requests: write

jobs:
  swarm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: moonrunnerkc/swarm-orchestrator@main
        id: swarm
        with:
          goal: ${{ github.event.inputs.goal }}
          tool: claude-code
          pr: review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # Add other adapter keys as needed:
          # OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - name: Check Results
        if: always()
        run: echo "${{ steps.swarm.outputs.result }}"
| Input | Default | Description |
| --- | --- | --- |
| `goal` | (required) | What the swarm should accomplish |
| `tool` | `copilot` | Agent CLI: `copilot`, `claude-code`, `codex` |
| `recipe` | | Run a built-in recipe instead of a goal |
| `plan` | | Path to a pre-built plan JSON |
| `pr` | `review` | PR behavior: `auto`, `review` (draft), `none` |
| `max-retries` | `3` | Max retry attempts per step |
| `model` | | Model to pass to the agent CLI |

The Action outputs result (JSON with per-step verification status), plan-path, and pr-url. Session artifacts are automatically redacted for known secret values (API keys, tokens) at the end of every run. The agent CLI must be available in the runner; the Action does not install it. See docs/github-action.md for setup instructions.




Recipes

Reusable, parameterized plans for common tasks. Recipes modify existing projects (unlike templates, which create new ones).

swarm recipes                           # list all
swarm recipe-info add-tests             # show details
swarm use add-tests                     # run with defaults
swarm use add-auth --param strategy=session --tool claude-code
| Recipe | Steps | Description | Key Parameters |
| --- | --- | --- | --- |
| `add-tests` | 3 | Add unit tests for untested modules | `framework` (jest/vitest/mocha), `coverage-target` |
| `add-auth` | 4 | Add authentication | `strategy` (jwt/session) |
| `add-ci` | 3 | Add GitHub Actions CI pipeline | |
| `migrate-to-ts` | 4 | Migrate JavaScript to TypeScript | `strict` (true/false) |
| `add-api-docs` | 3 | Generate OpenAPI spec and docs | `format` (openapi/markdown) |
| `security-audit` | 3 | Run security audit and fix findings | |
| `refactor-modularize` | 4 | Break monolithic code into modules | |

Create custom recipes by adding JSON files to templates/recipes/. See docs/recipes.md for the schema and examples.
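docs/recipes.md defines the actual schema; the sketch below is only a guess at the shape implied by the recipe table above, and every field name in it is hypothetical:

```json
{
  "name": "add-tests",
  "description": "Add unit tests for untested modules",
  "parameters": {
    "framework": { "default": "jest", "options": ["jest", "vitest", "mocha"] },
    "coverage-target": { "default": "80" }
  },
  "steps": [
    { "id": 1, "task": "Identify untested modules and plan coverage" },
    { "id": 2, "task": "Write {{framework}} tests targeting {{coverage-target}}% coverage" },
    { "id": 3, "task": "Run the suite and fix failing tests" }
  ]
}
```

Consult the documented schema before relying on any of these field names.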




Architecture

Goal ──> Plan ──> Waves ──> Branches ──> Agents ──> Verify ──> Repair? ──> Merge
  1. Plan generation. A goal becomes numbered steps with declared dependencies, each assigned to a specialized agent. Plans can be generated from a goal, loaded from a template, run from a recipe, or bootstrapped from repo analysis.

  2. Greedy scheduling. Steps launch the moment their dependencies are satisfied. Adaptive concurrency adjusts based on success rates.

  3. Branch isolation. Each step runs on its own git worktree and branch. With --strict-isolation, cross-step context is restricted to verified entries only.

  4. Agent execution. The orchestrator spawns the selected agent CLI (--tool) as a subprocess, injecting the prompt plus dependency context. Transcripts are captured for supplementary analysis.

  5. Outcome verification. The engine checks what actually happened: git diff against the recorded base SHA, build execution, test execution, and expected file existence. Transcript parsing runs as a secondary signal. Steps must prove their work with outcomes, not claims.

  6. Failure repair. Failed steps are classified (build failure, test failure, missing files, no changes) and retried up to three times. Each retry receives structured failure context: which checks failed, the relevant build/test output, and what to fix. The RepairAgent uses outcome-based root causes, not guesswork.

  7. Merge. Verified branches merge to main. Quality gates check the result for scaffold leftovers, duplicate blocks, hardcoded config, README accuracy, test isolation, runtime correctness, accessibility, and test coverage.
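The greedy scheduling rule in step 2 can be sketched compactly. The types and field names below are illustrative, not the orchestrator's real internals:

```typescript
// Greedy scheduling sketch: a step launches the moment it is not done,
// not already running, and every declared dependency has completed.
interface PlanStep {
  id: number;
  dependsOn: number[];
}

function readySteps(
  steps: PlanStep[],
  done: Set<number>,
  running: Set<number>
): PlanStep[] {
  return steps.filter(
    (s) =>
      !done.has(s.id) &&
      !running.has(s.id) &&
      s.dependsOn.every((d) => done.has(d))
  );
}
```

Calling this after every step completion (rather than waiting for a whole wave to finish) is what makes the scheduling greedy.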

Verification checks
| Check | Type | Required | What It Verifies |
| --- | --- | --- | --- |
| Git diff | `git_diff` | Yes | Agent produced file changes vs base SHA |
| File existence | `file_existence` | If specified | Expected output files exist in worktree |
| Build execution | `build_exec` | If script exists | `npm run build` (or detected equivalent) passes |
| Test execution | `test_exec` | If script exists | `npm test` (or detected equivalent) passes |
| Transcript evidence | `transcript` | No | Agent claimed completion (supplementary) |

When outcome checks are present, transcript-based checks are demoted to non-required. A step passes when all required checks pass.
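That pass rule can be stated as code. The types below are assumptions for illustration, not the `verifier-engine.ts` implementation:

```typescript
// Sketch of the pass rule: a step passes when all required checks pass,
// and transcript checks are demoted to non-required whenever at least one
// outcome-based check is present.
interface CheckResult {
  type: "git_diff" | "file_existence" | "build_exec" | "test_exec" | "transcript";
  required: boolean;
  passed: boolean;
}

function stepPasses(checks: CheckResult[]): boolean {
  const hasOutcome = checks.some((c) => c.type !== "transcript");
  return checks.every((c) => {
    const required = c.required && !(hasOutcome && c.type === "transcript");
    return !required || c.passed;
  });
}
```

The demotion means an agent's claim of success can never rescue a step whose build or tests failed, and a failed claim cannot block a step whose outcomes all passed.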

Key modules
| Module | Responsibility |
| --- | --- |
| `swarm-orchestrator.ts` | Greedy scheduler, dependency resolution, merge delegation, cost tracking |
| `worktree-manager.ts` | Git worktree lifecycle: creation, removal, branch operations |
| `branch-merger.ts` | Branch merge strategies: rebase-and-merge, conflict resolution, wave merges |
| `verifier-engine.ts` | Outcome-based verification (git diff, build, test, file existence) + transcript analysis |
| `session-executor.ts` | Agent adapter integration, AgentResult-to-SessionResult mapping |
| `adapters/` | Pluggable agent adapters (copilot, claude-code, codex) |
| `recipe-loader.ts` | Recipe loading, parameterization, listing |
| `repair-agent.ts` | Failure classification, targeted retry with outcome context |
| `plan-generator.ts` | Plan creation, dependency validation, recipe-to-plan conversion |
| `cost-estimator.ts` | Pre-execution cost prediction with model multipliers |
| `knowledge-base.ts` | Cross-run pattern storage, recipe run tracking, cost history |
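The pluggable adapter contract in `adapters/` might look roughly like the sketch below. `AgentResult` is named in the module table above; everything else here is an illustrative guess, not the real interface:

```typescript
// Hypothetical shape of a pluggable agent adapter.
interface AgentResult {
  exitCode: number;
  transcriptPath: string;
}

interface AgentAdapter {
  name: string; // e.g. "copilot", "claude-code", "codex", or a custom CLI
  // Spawn the agent CLI in the step's worktree with the injected prompt.
  run(prompt: string, worktreeDir: string): Promise<AgentResult>;
}

// A trivial stand-in adapter, useful only to show the contract.
const echoAdapter: AgentAdapter = {
  name: "echo",
  async run(_prompt: string, worktreeDir: string): Promise<AgentResult> {
    return { exitCode: 0, transcriptPath: `${worktreeDir}/share.md` };
  },
};
```

Because verification is outcome-based, an adapter only needs to run the CLI and report where the transcript landed; it never has to judge the code itself.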
Output artifacts
runs/<execution-id>/
  session-state.json          # full execution state (resumable)
  metrics.json                # timing, commit count, verification stats
  cost-attribution.json       # per-step estimated vs actual premium requests
  steps/
    step-N/share.md           # raw agent transcript
  verification/
    step-N-verification.md    # outcome-based pass/fail report



Demos

Six built-in scenarios for verifying your setup or seeing the pipeline end-to-end.

Cost note: Demos run real agent sessions against real APIs. Each step consumes at least one premium request (or API call for Claude Code / Codex). Larger demos with expensive models can use significant budget. For example, saas-mvp with o3 (20x multiplier) could consume 160+ premium requests. Use --cost-estimate-only to preview costs before committing.

swarm demo list
swarm demo demo-fast     # ~1 min, two parallel agents
swarm demo <n>        # any scenario

# Preview cost before running
swarm demo api-server --cost-estimate-only
| Scenario | Agents | What Gets Built | Time | Est. Requests (1x model) |
| --- | --- | --- | --- | --- |
| `demo-fast` | 2 | Two independent utility modules | ~1 min | 2 |
| `dashboard-showcase` | 4 | React + Chart.js dashboard, Express API | ~8 min | 4-5 |
| `todo-app` | 4 | React todo with Express backend | ~15 min | 4-5 |
| `api-server` | 6 | REST API with JWT, PostgreSQL, Docker | ~25 min | 6-8 |
| `full-stack-app` | 7 | Full-stack with auth, E2E tests, CI/CD | ~30 min | 7-10 |
| `saas-mvp` | 8 | SaaS MVP with Stripe, analytics, security | ~40 min | 8-12 |



Contributing

npm install && npm run build && npm test

Before submitting a PR: run npm test, run swarm gates ., and keep commits descriptive. TypeScript strict mode, ES2020 target.




License

ISC

Built by Bradley R. Kinnard.
