launch-swarm

agent
Security Audit: Warn
Health: Warn
  • License: MIT
  • Description: Repository has a description
  • Active repo: Last push 0 days ago
  • Low visibility: Only 6 GitHub stars
Code: Pass
  • Code scan: Scanned 1 file during light audit, no dangerous patterns found
Permissions: Pass
  • Permissions: No dangerous permissions requested
Purpose
This is a collection of pre-configured prompt instructions and agent rules designed for Claude Code. It sets up a six-agent AI workflow to automate software development tasks, including sprint planning, writing code, reviewing pull requests, and auto-merging.

Security Assessment
Overall Risk: Low. The automated code scan checked one file and found no dangerous patterns, no hardcoded secrets, and no requests for dangerous system permissions. However, because the primary function of this tool is to autonomously write, execute, and merge code, the inherent risk depends heavily on your local environment. While the ruleset includes safety guardrails (such as banning certain agents from specific actions), auto-merging pull requests after an AI review bypasses standard human verification. You should still carefully inspect the configuration files before running them to ensure the boundaries fit your project's security needs.

Quality Assessment
The project is highly active, with its most recent update pushed today. It is properly open-source under the standard MIT license. However, community trust and visibility are currently very low. With only six GitHub stars, the wider developer community has not yet extensively reviewed, tested, or vetted the tool. It operates as a personal startup workflow shared publicly, rather than a rigorously tested enterprise framework.

Verdict
Safe to use, but you should read the configuration files thoroughly to understand the automated actions before letting the agents run in your environment.
SUMMARY

I run a 4-person startup and code part-time. This .claude folder is how I turn 4 hours into 40. 6 AI agents that write code, review PRs, plan sprints, auto-merge, take screenshots, and watch logs. Each one has rules about what it's banned from doing.

README.md

launch-swarm

License: MIT
Claude Code

4 hours of my time. 40 hours of engineering output. This is the .claude folder.

This is the exact .claude configuration I use daily to run a 6-agent AI engineering swarm from Claude Code. It plans sprints, writes code, reviews PRs, generates documentation with annotated screenshots, auto-merges after dual-gate approval, and hands off to manual QA. Fully autonomous.

I'm the CEO of AgentWeb, a 4-person team building AI marketing agents. This is the production workflow behind a real startup shipping real code every week.

I'm a part-time engineer. I code about 4 hours a day. The rest is sales, marketing, ops. This system turns those 4 hours into 40 hours of engineering output. The reason it works is the same reason real engineering teams work: roles, reviews, and rules about who can touch what. AI without that structure is just faster chaos.



System Architecture

The swarm follows a linear pipeline with two quality gates before anything merges:

                        YOU
                         |
                    "prep sprint"
                         |
                  ┌──────▼──────┐
                  │  Curate 3-5  │
                  │  sprint tasks │
                  └──────┬──────┘
                         |
                   "launch swarm"
                         |
          ┌──────────────▼──────────────┐
          │     SWARM ACTIVE            │
          │                             │
          │  ┌─────────────────────┐    │
          │  │   Task Planner      │    │
          │  │   (architects)      │    │
          │  └────────┬────────────┘    │
          │           │                 │
          │  ┌────────▼────────────┐    │
          │  │   Sprint Worker     │    │
          │  │   (codes + tests)   │    │
          │  └────────┬────────────┘    │
          │           │                 │
          │       Opens PR              │
          │           │                 │
          │  ┌────────▼────────────┐    │
          │  │  GATE 1:            │    │
          │  │  Senior Reviewer    │    │
          │  │  (code review)      │──┐ │
          │  └─────────────────────┘  │ │
          │                           │ │
          │  ┌────────────────────┐   │ │
          │  │  GATE 2:           │   │ │
          │  │  Docs Agent        │◄──┘ │
          │  │  (screenshots +    │     │
          │  │   blog drafts)     │     │
          │  └────────┬───────────┘     │
          │           │                 │
          │  ┌────────▼────────────┐    │
          │  │  PR Watchdog        │    │
          │  │  (auto-merge after  │    │
          │  │   both gates pass)  │    │
          │  └────────┬────────────┘    │
          │           │                 │
          │  ┌────────▼────────────┐    │
          │  │  Log Watcher        │    │
          │  │  (monitors errors)  │    │
          │  └─────────────────────┘    │
          │                             │
          └──────────────┬──────────────┘
                         │
                  Swarm goes idle
                         │
              ┌──────────▼──────────┐
              │  QA Handoff         │
              │  (structured test   │
              │   plan generated)   │
              └──────────┬──────────┘
                         │
                    You test manually
                         │
              ┌──────────▼──────────┐
              │  "prep mega pr"     │
              │  dev → main         │
              │  (CTO reviews)      │
              └─────────────────────┘

The Dual-Gate Rule

Nothing merges until two independent gates pass:

| Gate | Agent | What it checks |
|------|-------|----------------|
| Gate 1 | Senior Reviewer | Code correctness, architecture, security, reliability |
| Gate 2 | Docs Agent | Screenshots captured, blog draft written (for features), UI not broken |

The PR Watchdog only auto-merges after both gates approve. This catches an entire class of problems that tests alone miss: undocumented features, UI regressions that "work" but look wrong, missing screenshots for stakeholder review.
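
As a rough illustration of the rule above, the merge decision can be written as a single predicate. This is a minimal sketch, not the repository's implementation: the `pr` object and its field names (`baseBranch`, `reviewerVerdict`, `docsStatus`, `ciGreen`, `unresolvedBugs`) are hypothetical, and the conditions are taken from the Watchdog's stated criteria (Senior Reviewer Approve, Docs Documented, CI green, no unresolved bugs, dev branch only).

```javascript
// Hypothetical PR shape; the field names are illustrative, not from the repo.
function canAutoMerge(pr) {
  const gate1 = pr.reviewerVerdict === "Approve";   // Gate 1: Senior Reviewer
  const gate2 = pr.docsStatus === "Documented";     // Gate 2: Docs Agent
  const targetsDev = pr.baseBranch === "dev";       // PRs to main need a human
  return targetsDev && gate1 && gate2 && pr.ciGreen === true && pr.unresolvedBugs === 0;
}
```

Keeping the check as one pure function makes it easy to see that a single failing condition is enough to hold the merge.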


The 6 Agents

1. Sprint Worker

Role: The coder. Pulls tasks from your sprint board, writes failing tests first, implements the fix, and opens a PR.

Trigger: "sprint worker" or "go ham".

Runs up to 3 concurrent instances. You can parallelize across multiple tasks.

What it does every 30 minutes:

  1. Queries your sprint board for To-do tasks in the current sprint
  2. Skips tasks owned by others or marked as blocked
  3. Classifies risk level (Low / Medium / High)
  4. Writes failing tests (minimum 3 failing + 1 passing), then implements until all pass
  5. Opens a PR targeting your dev branch using the Reviewer-First format
  6. Moves the task to In Review and picks up the next one
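
The task-selection part of this loop (steps 1 and 2, plus the three-instance cap) might look like the sketch below. The task fields (`inCurrentSprint`, `status`, `blocked`, `owner`) are hypothetical, and treating unassigned tasks as eligible is an assumption; the README only says tasks owned by others are skipped.

```javascript
// Hypothetical task shape; adjust field names to your tracker's schema.
function pickSprintTasks(tasks, me, maxWorkers = 3) {
  return tasks
    // Only curated sprint work in To-do, never the backlog
    .filter((t) => t.inCurrentSprint && t.status === "To-do")
    // Skip blocked tasks and tasks owned by someone else
    // (assumption: unassigned tasks, owner === null, are fair game)
    .filter((t) => !t.blocked && (t.owner === me || t.owner === null))
    // At most one task per concurrent Sprint Worker instance
    .slice(0, maxWorkers);
}
```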

What it does NOT do:

  • Never takes screenshots (that's the Docs Agent's job)
  • Never self-assigns work from the backlog — only works on tasks you curated into the current sprint
  • Never auto-converts research tasks into code changes

Risk classification drives rigor:

| Risk | When | What's required |
|------|------|-----------------|
| Low | UI tweaks, CSS, copy changes | Unit tests |
| Medium | New pages, frontend logic | Unit tests |
| High | External APIs, payments, DB mutations | Unit + integration tests + manual QA steps + rollback plan |

Domain gating — certain areas require human approval before merge:

  • Payments/billing: Ships with robust test plan, team lead owns
  • Auth (login, tokens, sessions): PR opens but holds for CTO review
  • Core AI/agent infrastructure: PR opens but holds for CTO review
  • Everything else: Ships automatically through the pipeline
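
One way to picture domain gating is a lookup from domain to required approver. This is only a sketch: the domain keys (`payments`, `auth`, `core-ai`) are hypothetical labels, and modeling payments as needing team-lead sign-off is my reading of "team lead owns", not something the README spells out.

```javascript
// Hypothetical domain labels mapped to the human who must approve.
const HUMAN_GATED = new Map([
  ["payments", "team lead"], // assumption: "team lead owns" means sign-off
  ["auth", "CTO"],           // PR opens but holds for CTO review
  ["core-ai", "CTO"],        // PR opens but holds for CTO review
]);

function mergePolicy(domain) {
  const approver = HUMAN_GATED.get(domain);
  return approver
    ? { autoMerge: false, approver }
    : { autoMerge: true, approver: null }; // everything else ships automatically
}
```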

Special mode — "Investigate:" tasks:
Tasks prefixed with Investigate: trigger research-only mode. The agent reads code, checks logs, and writes findings — but never writes code or opens PRs. This creates a deliberate pause point where you decide whether to proceed with a fix.

Standalone use: Great on its own if you just want an autonomous coding agent. Pair it with the PR Watchdog for the minimum viable pipeline.


2. Task Planner

Role: The architect. Reads code deeply, generates implementation plans with multiple options, but never writes a single line of code.

What it does every 15 minutes:

  1. Finds To-do tasks in the current sprint that don't have plans yet
  2. Deep-reads the codebase: finds relevant files, checks architecture, maps dependencies
  3. Generates 3-4 implementation options, each with files to change, pros, cons, and risk assessment
  4. Posts the plan as a comment on the task with a recommended approach
  5. Marks the task as Planned so the Sprint Worker knows to follow the plan

What it does NOT do:

  • Never writes code, opens PRs, or modifies files
  • Never plans tasks owned by other team members
  • Never works on Epics (13+ story points) — flags them for breakdown instead

Why separating planning from coding matters:
When the same agent plans and codes, it locks onto the first approach it thinks of. Split the roles and the Planner actually considers multiple options. It hasn't written any code yet, so there's nothing to defend. The Sprint Worker then follows a vetted plan instead of improvising.

Plan format:

## Implementation Plan
### Risk: Low / Medium / High
### Options
**A: [Name]** — files, approach, pros, cons
**B: [Name]** — files, approach, pros, cons
### Recommended: [X]
**Why**: [2-3 sentences]
### Checklist
1. [ ] Failing tests: [specific test cases]
2. [ ] Files to modify: [list]
3. [ ] Verify: [what to test after]
### Dependencies
Backend PR: [yes/no] | Blocked by: [list]

Standalone use: Excellent on its own as a "senior architect on demand." Use it before any complex task to get a second opinion on approach.


3. Senior Reviewer

Role: The code reviewer. Reads the full diff, traces imports two levels deep, checks usage sites, and posts categorized comments.

What it does every 20 minutes:

  1. Finds open PRs that haven't been reviewed yet
  2. Prioritizes by risk level (high-risk PRs reviewed first)
  3. Performs a deep read — minimum 5 minutes per PR:
    • Full diff + entire changed files
    • Imports traced 2 levels deep
    • Usage sites checked for breaking changes
    • Test files reviewed for coverage gaps
  4. Evaluates against five criteria: Architecture, Reliability, Elegance, Security, Compatibility
  5. Posts comments with severity tags and a verdict

What it does NOT do:

  • Never fixes the code — only identifies issues (the Watchdog handles fixes)
  • Never merges PRs
  • Never takes screenshots

Comment severity system:

| Tag | Meaning | Action required |
|-----|---------|-----------------|
| Bug | Will break in production | Must fix before merge |
| Nit | Style/convention issue | Should fix, won't block |
| Pre-existing | Problem existed before this PR | Informational, no action |
| Suggestion | Improvement idea | Optional |

Verdict options: Approve / Needs changes / Rethink approach

An Approve verdict passes Gate 1 and hands the PR to the Docs Agent for Gate 2.

Standalone use: The highest-value standalone agent. Just install the Senior Reviewer + PR templates and your code review quality goes way up, no full swarm needed.


4. PR Watchdog

Role: The merge bot and janitor. Fixes review comments, syncs branches, resolves conflicts, and auto-merges after both gates pass.

What it does every 10 minutes:

Core loop:

  • Syncs your dev branch with main to prevent drift
  • Resolves merge conflicts. Checks out in a worktree, merges, runs tests, pushes
  • Fixes CI failures. Diagnoses and pushes a fix
  • Fixes review comments. Reads unresolved Bug and Nit comments, makes minimal fixes, runs tests, commits, replies "Fixed"
  • Auto-merges when both gates pass (Senior Reviewer Approve + Docs Documented + CI green + no unresolved bugs)
  • Redeploys test env after each merge. Rsync repos, sync containers, verify freshness

Mega PR ops:

  • Handles CTO feedback on mega PRs. Investigates comments, branches from dev, fixes, opens PRs, replies with fix links
  • Triggers Feature Launch when a mega PR merges to main (fires the Docs Agent's announcement flow)
  • Auto-closes tasks on mega PR merge. Parses sub-PR table, finds linked tasks, marks Done
  • Flags mega PR health. Warns if dev branch has >20 commits or >7 days since last merge to main

Housekeeping:

  • Cross-repo PR linking. Matches PRs across backend and frontend repos, comments links
  • Prunes worktrees for merged branches to prevent disk bloat
  • Flags stale PRs older than 7 days
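
The two threshold checks in these lists (mega PR health and stale PRs) are simple enough to sketch. The input shapes are hypothetical, and reading "dev branch has >20 commits" as "commits on dev that are not yet on main" is an assumption.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical inputs: commitsAheadOfMain (number), lastMergeToMain (Date).
function megaPrWarnings({ commitsAheadOfMain, lastMergeToMain }, now = new Date()) {
  const warnings = [];
  if (commitsAheadOfMain > 20) {
    warnings.push(`dev is ${commitsAheadOfMain} commits ahead of main`);
  }
  const daysSinceMerge = Math.floor((now - lastMergeToMain) / DAY_MS);
  if (daysSinceMerge > 7) {
    warnings.push(`${daysSinceMerge} days since last merge to main`);
  }
  return warnings;
}

// A PR is stale once it has been open for more than 7 days.
function isStalePr(pr, now = new Date()) {
  return (now - pr.createdAt) / DAY_MS > 7;
}
```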

What it does NOT do:

  • Never merges PRs targeting main — only your dev branch. The mega PR to main requires human approval.
  • Never reviews code quality — that's the Senior Reviewer's job
  • Never takes screenshots

Fix priority: Bug comments before Nit comments. Smallest diff first. If a fix would touch more than 3 files, it skips and flags for human attention. If tests fail after a fix, it reverts.
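
The ordering rules above can be sketched as a small planner. The comment fields (`tag`, `resolved`, `filesTouched`, `diffLines`) are hypothetical estimates the Watchdog would need to produce before fixing; the revert-on-test-failure step is not modeled here.

```javascript
const MAX_FILES_PER_FIX = 3;
const TAG_RANK = { Bug: 0, Nit: 1 }; // only Bug and Nit comments get fixed

function planFixes(comments) {
  const actionable = comments.filter(
    (c) => !c.resolved && Object.hasOwn(TAG_RANK, c.tag)
  );
  // Fixes touching more than 3 files are skipped and flagged for a human
  const flagged = actionable.filter((c) => c.filesTouched > MAX_FILES_PER_FIX);
  const toFix = actionable
    .filter((c) => c.filesTouched <= MAX_FILES_PER_FIX)
    // Bugs before Nits, then smallest diff first
    .sort((a, b) => TAG_RANK[a.tag] - TAG_RANK[b.tag] || a.diffLines - b.diffLines);
  return { toFix, flagged };
}
```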

Standalone use: Useful on its own for any team that wants automated branch syncing, conflict resolution, and review comment fixes. Pairs naturally with the Senior Reviewer.


5. Docs Agent

Role: The documentarian and second quality gate. Takes before/after screenshots, annotates them with red arrows, generates blog drafts, and blocks merges if the UI is broken.

Trigger: "docs", "document prs", or "ship it".

What it does every 20 minutes:

  1. Finds PRs that have Senior Reviewer approval but no Docs approval yet
  2. Navigates affected pages via Playwright and captures "before" screenshots
  3. Checks out the PR branch and captures "after" screenshots
  4. Annotates "after" screenshots with red SVG arrows pointing at the changed elements
  5. Uploads screenshots to your storage bucket
  6. For feat: PRs, generates a blog draft using your content automation tool
  7. Posts before/after images on the PR
  8. Approves with Documented — or blocks with Documentation blocked if the UI is broken

What it does NOT do:

  • Never reviews code logic — that's the Senior Reviewer's job
  • Never writes code or fixes bugs
  • Doesn't generate docs for backend-only PRs (auto-approves those)

The screenshot annotation technique:

The Docs Agent programmatically injects red arrow SVG overlays pointing at changed elements. This makes it immediately obvious what changed without reading any text:

// Get the bounding box of the changed element (null if it is not visible)
const box = await page.locator('.changed-element').boundingBox();
if (!box) throw new Error('Changed element is not visible; cannot annotate');

// Inject a red arrow SVG overlay pointing at the element
await page.evaluate(({x, y, w, h}) => {
  // Full-viewport container that ignores pointer events and sits above the page
  const overlay = document.createElement('div');
  overlay.style.cssText = 'position:fixed;top:0;left:0;width:100%;height:100%;z-index:99999;pointer-events:none';
  // The arrow tip lands on the element's center; the tail starts up and to the left
  const tipX = x + w/2;
  const tipY = y + h/2;
  const startX = tipX - 80;
  const startY = tipY - 100;
  overlay.innerHTML = `<svg width="100%" height="100%" style="position:absolute;top:0;left:0">
    <defs><marker id="arrowhead" markerWidth="10" markerHeight="7"
      refX="10" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="red"/>
    </marker></defs>
    <line x1="${startX}" y1="${startY}" x2="${tipX}" y2="${tipY}"
      stroke="red" stroke-width="5" marker-end="url(#arrowhead)"/>
  </svg>`;
  document.body.appendChild(overlay);
}, {x: box.x, y: box.y, w: box.width, h: box.height});

// Screenshot now has the arrow baked in
await page.screenshot({ path: 'after-annotated.png' });

Why docs as a merge gate?
Tests tell you the code works. Reviews tell you the code is correct. Neither one tells you the feature is documented, visible, and makes sense to stakeholders. Make documentation a hard gate and you start catching: undocumented features, UI regressions that "pass" tests but look wrong, and missing context for anyone who wasn't in the PR discussion.

Standalone use: Very useful on its own for any team that wants automated screenshot documentation on PRs. Install the Docs Agent + Playwright MCP for instant visual PR reviews.


6. Log Watcher

Role: The monitor. Tails your test environment logs every minute and reports actionable errors. Never fixes anything.

What it does every minute:

  1. Grabs the last 200 lines of logs from each service
  2. Filters for actionable signals:
    • ERROR, unhandled exceptions, stack traces
    • HTTP 500/502/503 responses
    • Database errors, connection failures
    • Auth failures, token errors
    • Process crashes, OOM kills
  3. Skips noise: deprecation warnings, hot-reload messages, health checks, static asset 304s
  4. If errors are found, prints a clear summary with timestamp, service, error, and stack trace snippet
  5. Correlates errors with recent commits to suggest which change likely caused it
  6. Detects crash loops (same error 3+ ticks in a row) and flags as CRITICAL
  7. If logs are clean, stays silent
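
The filtering and crash-loop steps above could be approximated as follows. This is a sketch under loud assumptions: the regular expressions are illustrative guesses at what "actionable" and "noise" look like in your logs, and a crash loop is modeled as the same error signature appearing on the last three ticks.

```javascript
// Illustrative patterns only; tune them to your services' log formats.
const SIGNAL = /\bERROR\b|unhandled|Traceback|\b50[023]\b|connection (refused|failed)|OOM/i;
const NOISE = /deprecat|hot[- ]reload|health ?check|\b304\b/i;

// Keep the last 200 lines, then keep only actionable, non-noise lines.
function actionableLines(lines) {
  return lines.slice(-200).filter((l) => SIGNAL.test(l) && !NOISE.test(l));
}

// ticks: one error signature per tick, oldest first (null when logs were clean).
function isCrashLoop(ticks, threshold = 3) {
  if (ticks.length < threshold) return false;
  const recent = ticks.slice(-threshold);
  return recent[0] !== null && recent.every((sig) => sig === recent[0]);
}
```

When `actionableLines` returns an empty array, the watcher has nothing to print, which matches the "stays silent" rule.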

What it does NOT do:

  • Never fixes code — just reports
  • Never creates tasks — this is live feedback, not bug filing
  • Never restarts services
  • Never runs browser tests — this is server-side only

Standalone use: Perfect as a "test buddy" during manual testing sessions. Run it with /loop 1m while you're testing your app and get instant feedback on server errors you might miss.


Support Infrastructure (3 additional loops)

Beyond the 6 primary agents, three support loops run in the background:

| Loop | Interval | What it does |
|------|----------|--------------|
| Notion Sync | 5m | Bidirectional glue between GitHub and your task tracker. Open PRs → tasks "In Review". Merged PRs → tasks "Done". Flags stale "In Progress" tasks with no recent commits. Dedup sweep every 5th tick. |
| Rules Hygiene | 12h | Checks Claude Code version and changelog, reviews rules for staleness against the Prompt Health tags, proposes updates. |
| Swarm Digest | Daily (8am) | Morning briefing: what shipped, mega PR readiness, high-risk items, today's priorities. Delivered to your digest channel. |

These aren't agents with standalone use cases. They're infrastructure that keeps the swarm healthy. The Notion Sync is especially critical. Without it, your task tracker and GitHub drift apart within hours.
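
The core of the Notion Sync mapping is small. A minimal sketch, assuming a hypothetical `pr` object with `state` and `merged` fields; leaving the task untouched when a PR is closed without merging is my assumption, since the README does not cover that case.

```javascript
// Map a PR's state to the task status the tracker should show.
function taskStatusForPr(pr) {
  if (pr.merged) return "Done";
  if (pr.state === "open") return "In Review";
  return null; // assumption: closed-without-merge leaves the task as-is
}

// The dedup sweep runs on every 5th tick of the 5-minute loop.
function isDedupTick(tick) {
  return tick > 0 && tick % 5 === 0;
}
```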


Key Concepts

Dual-Gate Merge

Most teams gate merges on tests and code review. This system adds a second gate: documentation.

Why? Because in practice, the problems that slip through code review aren't logic bugs. They're:

  • Features that work but nobody documented
  • UI changes that "pass" tests but visually regressed
  • PRs that three people reviewed but nobody screenshotted

The dual-gate pattern catches these. Two agents verify independently from different angles: one reads code, one looks at the UI.

Prompt Health (Self-Learning Anti-Rot)

AI instructions degrade over time. Rules go stale, edge cases pile up, and contradictions sneak in. Most people don't notice until their agent starts acting weird.

This system includes a built-in health system:

Health tags — every rule is mentally tagged as:

  • ACTIVE: Used regularly, still relevant
  • DORMANT: Not triggered in 2+ weeks — candidate for removal
  • STALE: References outdated tools/IDs — needs update or deletion
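
The README describes these tags as applied mentally. If you wanted to mechanize them, a sketch might look like this; the `rule` fields (`lastTriggered` as a millisecond timestamp, `hasOutdatedRefs`) are hypothetical, and "2+ weeks" is read as 14 days or more.

```javascript
const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;

// Hypothetical rule record: { lastTriggered: <ms timestamp>, hasOutdatedRefs: <boolean> }
function healthTag(rule, now = Date.now()) {
  if (rule.hasOutdatedRefs) return "STALE";       // needs update or deletion
  if (now - rule.lastTriggered >= TWO_WEEKS_MS) {
    return "DORMANT";                             // candidate for removal
  }
  return "ACTIVE";
}
```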

Self-learning loop — after every mistake:

  1. Check if an existing rule should have prevented it → fix the rule
  2. If no rule exists → add a concise one to lessons.md
  3. If a rule caused the mistake → simplify or delete it

Anti-patterns the system watches for:

  • Same instruction in 2+ files (merge them)
  • "MUST" / "NEVER" / "CRITICAL" inflation — if everything is critical, nothing is
  • Rules written for a one-time incident (delete after resolving)
  • Rules with hardcoded IDs that could go stale (centralize in reference-ids.md)

Risk Classification

Not all code changes deserve the same level of rigor. A CSS tweak doesn't need an integration test. A payment system change needs a rollback plan.

| Risk | Criteria | What's required |
|------|----------|-----------------|
| Low | UI tweaks, copy changes, CSS | Unit tests |
| Medium | New pages, components, frontend logic | Unit tests |
| High | External APIs, payments, auth, DB mutations | Unit + integration + manual QA steps + rollback plan |

The Sprint Worker classifies risk automatically based on which files are being changed. High-risk PRs get held for human review regardless of whether both gates pass.
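
File-based classification could be sketched like this. The path patterns are hypothetical and depend entirely on your repository layout; the README does not list the actual patterns the Sprint Worker uses.

```javascript
// Hypothetical path conventions; replace with your own directory layout.
const HIGH_RISK = /(^|\/)(payments|billing|auth|migrations)\//i;
const LOW_RISK = /\.(css|scss)$|(^|\/)locales\//i;

function classifyRisk(files) {
  // Any high-risk file makes the whole PR high risk
  if (files.some((f) => HIGH_RISK.test(f))) return "High";
  // Low only if every changed file is styling or copy
  if (files.length > 0 && files.every((f) => LOW_RISK.test(f))) return "Low";
  return "Medium";
}

// What each level requires, taken from the table above.
const REQUIRED = {
  Low: ["unit tests"],
  Medium: ["unit tests"],
  High: ["unit tests", "integration tests", "manual QA steps", "rollback plan"],
};
```

Defaulting to Medium when nothing matches errs on the side of more scrutiny for unfamiliar paths.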

Dev Branch Pattern

Instead of merging feature branches directly to main, everything flows through a dev branch:

feature-1 ──┐
feature-2 ──┤──► dev branch ──► mega PR ──► main
feature-3 ──┘    (auto-merge)   (human review)   (production)

Why this matters:

  • Feature branches merge to dev automatically after gates pass — fast feedback loop
  • dev accumulates a batch of changes, then ships to main as one mega PR — single review point
  • The mega PR gives your CTO/team lead a structured view of everything that shipped, sorted by risk
  • Rollback is clean: revert the mega PR to undo everything, or revert individual feature merges on dev
  • dev becomes your "working production" — if a dependency is merged to dev, it's available for other features

Magic Keywords

Instead of remembering which agents to launch with which parameters, natural language keywords orchestrate complex workflows:

| Keyword | What happens |
|---------|--------------|
| prep sprint | Queries your task board, proposes 3-5 tasks ranked by impact, waits for your approval |
| launch swarm | Starts all 6 agents at their defined intervals (requires curated sprint) |
| work horses | Starts just Sprint Worker + PR Watchdog (the execution pair) |
| senior leadership | Starts just Task Planner + Senior Reviewer (the quality pair) |
| ship it | Starts just the Docs Agent |
| prep for deploy | Runs tests, opens a PR, reports status |
| prep mega pr | Syncs branches, runs full test suite, opens mega PR with sub-PR table |

These work because Claude Code's /loop feature runs agents on recurring intervals. Each keyword maps to a specific combination of /loop commands.

Separation of Concerns

Every agent has strict boundaries. This is intentional and critical:

| Agent | Does | Does NOT |
|-------|------|----------|
| Task Planner | Reads code, writes plans | Write code, open PRs |
| Sprint Worker | Writes code, opens PRs | Take screenshots, review code |
| Senior Reviewer | Reviews code, posts comments | Fix code, merge PRs |
| PR Watchdog | Fixes comments, merges PRs | Review code quality |
| Docs Agent | Screenshots, blog drafts | Fix bugs, review logic |
| Log Watcher | Reports errors | Fix code, create tasks |

Why this works better than a "do everything" agent:

  • No conflicts. Agents don't step on each other's work.
  • Clear accountability. When something goes wrong, you know which agent's rules to fix.
  • Composable. You can run any subset of agents independently.
  • Debuggable. Each agent's behavior is defined in one file.

Installation

Quick Start (5 minutes)

# Option A: Clone directly as your ~/.claude folder (recommended)
# Note: git clone refuses to write into an existing non-empty directory,
# so move any existing ~/.claude aside first.
git clone https://github.com/harshmoney123/launch-swarm.git ~/.claude

# Option B: Clone somewhere and copy the files
# (cp overwrites an existing ~/.claude/CLAUDE.md and settings.json)
git clone https://github.com/harshmoney123/launch-swarm.git
mkdir -p ~/.claude
cp -r launch-swarm/{CLAUDE.md,rules,settings.json} ~/.claude/

# Option C: Clone into a specific project's .claude
git clone https://github.com/harshmoney123/launch-swarm.git /path/to/your/project/.claude

Configure Your IDs

Open .claude/rules/reference-ids.md and replace the YOUR_* placeholders with your actual IDs:

## Your Task Tracker
- Sprint board: YOUR_COLLECTION_ID
- Current Sprint view: YOUR_CURRENT_SPRINT_VIEW_ID
- Master Table view: YOUR_MASTER_TABLE_VIEW_ID

## Your Communication
- Digest channel: YOUR_SLACK_CHANNEL_ID
- Feature announcements: YOUR_DISCORD_WEBHOOK_URL

## Your Storage
- Screenshot bucket: YOUR_SCREENSHOT_BUCKET

See .private.example/reference-ids.md for the full template with all fields explained.

Private Overrides

For IDs and secrets you don't want in version control:

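# Create a .private directory (gitignored)
mkdir -p ~/.claude/.private/

# Copy the example and fill in your real values
# (this path assumes a launch-swarm clone in the current directory, as in Option B;
# with Option A the example lives at ~/.claude/.private.example/)
cp launch-swarm/.private.example/reference-ids.md ~/.claude/.private/reference-ids.md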

Then update your CLAUDE.md to reference .private/reference-ids.md for actual values.

Verify It Works

# Open Claude Code in your project
claude

# Try a magic keyword
> prep sprint

If your task tracker is connected, Claude will query your sprint board and propose tasks. If not, it will tell you what's missing.


What You Need

The swarm requires a few infrastructure pieces. The specific tools don't matter. The concepts do. This section explains what each piece does and why it's needed, with examples of tools that work.

1. A Sprint Board / Task Tracker

Why: The swarm needs a source of truth for "what should I work on?" and "what's the status of each task?" Without this, the Sprint Worker has nothing to pull from and the Watchdog can't update task status on merge.

What it needs to support:

  • A "Current Sprint" view that filters to active tasks
  • Status fields (To-do, In Progress, In Review, Done)
  • Assignee/owner fields (so agents don't touch other people's tasks)
  • An API that Claude Code can access via MCP tools

Options:

| Tool | MCP Available | Notes |
|------|---------------|-------|
| Notion (what I use) | Yes (native) | Great for sprint boards with custom views |
| Linear | Yes | Clean API, built for sprint workflows |
| Jira | Yes (community) | Enterprise standard, heavier setup |
| GitHub Projects | Yes (via gh CLI) | Zero additional tools if you're already on GitHub |
| Trello | Yes (community) | Simple but works for small teams |

To swap: Update notion-tracking.md and reference-ids.md with your tool's API patterns. The status flow (To-do → In Progress → In Review → Done) and query patterns stay the same.

2. Screenshot / Asset Storage

Why: The Docs Agent needs somewhere to upload before/after screenshots so they can be linked in PR comments and blog posts. Without this, Gate 2 can't function.

What it needs to support:

  • Public-readable URLs for uploaded images
  • Programmatic upload (CLI or API)

Options:

| Tool | Notes |
|------|-------|
| S3 (what I use) | Standard, cheap, public-read ACLs |
| Cloudflare R2 | S3-compatible, no egress fees |
| GitHub PR attachments | Free, no setup, but less control |
| Imgur API | Quick and dirty for personal projects |

To swap: Update the upload paths in loop-docs.md and reference-ids.md.

3. A Communication Channel

Why: The daily digest, critical alerts, and feature launch announcements need somewhere to land. Without this, you're checking GitHub manually for pipeline status.

Options:

| Tool | Notes |
|------|-------|
| Slack (what I use) | Native MCP, channels + DMs |
| Discord | Webhook-based, great for small teams |
| Email | Works but slower feedback loop |
| GitHub Discussions | Zero additional tools |

To swap: Update channel IDs in reference-ids.md and webhook URLs in loop-docs.md.

4. A Content Automation Tool (for the Docs Agent)

Why: The Docs Agent generates blog drafts for feat: PRs. This requires a tool that can take screenshots and context and produce a user-facing writeup.

Options:

| Tool | Notes |
|------|-------|
| Emma (what I use) | AgentWeb's AI marketing agent. Takes screenshots + context, outputs blog drafts and user guides. Full disclosure: I built this. |
| Any LLM API | Send screenshots + PR context to Claude/GPT and get a draft back |
| Manual | Skip the auto-draft, just require screenshots as the gate |

To swap: Update the content tool references in auto-user-guide.md and loop-docs.md.

5. Browser Automation (Playwright MCP)

Why: The Docs Agent needs to navigate your app, capture screenshots, and inject SVG annotations. The Log Watcher can optionally use it for browser-side monitoring.

# Install the Playwright MCP server for Claude Code
# Add to your .claude/settings.json or install via Claude Code plugin marketplace

This is the one dependency that's hard to swap. The screenshot annotation technique is Playwright-specific. If you're not using the Docs Agent, you don't need this.

6. GitHub CLI (gh)

Why: The PR Watchdog, Sprint Worker, and Senior Reviewer all use gh to create PRs, list open PRs, merge PRs, and post comments. This is essential for any agent that interacts with GitHub.

# Install
brew install gh  # macOS
# or see https://cli.github.com/

# Authenticate
gh auth login

7. A Test Environment (Optional)

Why: The Log Watcher monitors a running instance of your app. The Docs Agent captures screenshots from a live environment. Without this, those two agents can't function, but the other four still work fine.

This can be anything: a local dev server, a Docker Compose stack, a staging environment on AWS/Vercel/Fly.io.


Mix and Match

You don't need the full swarm. Each agent is independent. Here are common configurations:

"I just want better code review"

Use: Senior Reviewer + PR templates (loop-pr-ops.md + loop-back-testing.md)

Install just these two files and you get deep automated code review with severity-tagged comments on every PR. No task tracker needed.

"I just want automated documentation"

Use: Docs Agent + Playwright MCP (loop-docs.md)

Install the Docs Agent and get before/after screenshots with annotated arrows on every PR. Great for teams where "nobody screenshots the PR" is a recurring problem.

"I just want an autonomous coder"

Use: Sprint Worker + Task Planner (loop-sprint-worker.md + loop-support.md)

The Planner architects the approach, the Worker codes it. You still review and merge manually. Requires a task tracker.

"I want the minimum viable pipeline"

Use: Sprint Worker + PR Watchdog (loop-sprint-worker.md + loop-pr-ops.md)

This is the "work horses" combo. The Worker codes and opens PRs, the Watchdog fixes review comments and merges. Add the Senior Reviewer if you want automated code review in the loop.

"I want the quality layer without the coding"

Use: Task Planner + Senior Reviewer (loop-support.md + loop-pr-ops.md)

This is the "senior leadership" combo. The Planner architects every task before anyone touches code. The Reviewer catches problems after. You or your team still write the code.

"I want the full swarm"

Use: All 6 agents (launch swarm)

The complete pipeline: plan → code → review → document → merge → monitor. Requires all infrastructure pieces.


Magic Keywords

These are natural language triggers you type into Claude Code to orchestrate the swarm:

prep sprint

What happens:

  1. Claude queries your full task board
  2. Filters out epics, blocked tasks, and tasks owned by others
  3. Proposes 3-5 tasks ranked by impact, with one-line rationales
  4. Waits for your approval — never auto-assigns sprint tasks
  5. Approved tasks get moved to Current Sprint

Why the human gate matters: The swarm should never decide what to work on. It proposes; you decide. This prevents the common failure mode of autonomous agents grinding through low-priority busywork while ignoring what actually matters.

launch swarm

What happens:

  1. Verifies Current Sprint is curated (won't launch on an empty sprint)
  2. Starts all 6 agents at their defined intervals:
    • Sprint Worker: /loop 30m
    • Task Planner: /loop 15m
    • Senior Reviewer: /loop 20m
    • PR Watchdog: /loop 10m
    • Docs Agent: /loop 20m
    • Log Watcher: /loop 1m
  3. Agents begin working through the sprint autonomously
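
Conceptually, each keyword expands to a set of agents and their intervals. The sketch below shows that mapping as data; the function name `loopsFor` and the returned shape are hypothetical, and the `/loop <interval>` strings simply mirror the list above rather than the full command each rule file issues.

```javascript
// Intervals as listed above.
const INTERVALS = {
  "Sprint Worker": "30m",
  "Task Planner": "15m",
  "Senior Reviewer": "20m",
  "PR Watchdog": "10m",
  "Docs Agent": "20m",
  "Log Watcher": "1m",
};

const KEYWORDS = new Map([
  ["launch swarm", Object.keys(INTERVALS)],
  ["work horses", ["Sprint Worker", "PR Watchdog"]],
  ["senior leadership", ["Task Planner", "Senior Reviewer"]],
  ["ship it", ["Docs Agent"]],
]);

function loopsFor(keyword, sprintTaskCount) {
  const agents = KEYWORDS.get(keyword);
  if (!agents) return [];
  // launch swarm refuses to start on an empty sprint
  if (keyword === "launch swarm" && sprintTaskCount === 0) {
    throw new Error("Current Sprint is empty; run prep sprint first");
  }
  return agents.map((agent) => ({ agent, command: `/loop ${INTERVALS[agent]}` }));
}
```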

work horses

Starts Sprint Worker + PR Watchdog. The execution pair. Use this when you want code written and merged without the overhead of the full pipeline.

senior leadership

Starts Task Planner + Senior Reviewer. The quality pair. Use this when you want every task planned and every PR deeply reviewed, but you're writing the code yourself.

prep mega pr

What happens:

  1. Syncs your dev branch with main
  2. Runs the full test suite
  3. Verifies all sub-PRs passed both gates
  4. Opens a mega PR from dev → main with a structured description:
    • Sub-PR table sorted by risk
    • High-risk items highlighted
    • Aggregate "What was NOT tested"
    • Rollback instructions

PR Templates

Reviewer-First Format

Every PR opened by the Sprint Worker follows this format:

## Risk: Low / Medium / High | Priority: High / Medium / Low

## What was wrong (before)
[1-2 sentences describing the problem]

## What this fixes (after)
[1-2 sentences describing the solution]

## Test Evidence
- Unit tests: X passed, 0 failed
- Tested as: [which accounts/roles]

## Files changed
[List of files, max 10. Grouped by area if more.]

## What was NOT tested
[Honest list of what wasn't covered]

## Related PRs
- Backend: [#NNN or "standalone"]
- Frontend: [#NNN or "standalone"]

## Task link
[Link to sprint board task]

Why each field exists:

  • Risk/Priority: Tells the reviewer where to focus attention. High-risk PRs get reviewed first.
  • Before/After: Forces the author to articulate the problem and solution in plain language. If you can't explain it in 2 sentences, the PR is too big.
  • Test Evidence: Proves tests actually ran. "Tests pass" means nothing. Which tests? How many?
  • What was NOT tested: This is the most important field. Every PR has blind spots. Making them explicit means the reviewer knows exactly where to look harder, and the QA plan can cover the gaps. Honesty here prevents production surprises.
  • Related PRs: Cross-repo changes (e.g., backend + frontend) need to ship together. This links them.
  • Task link: Bidirectional linking between code and task tracker. Anyone on the task can find the PR; anyone on the PR can find the task.
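If you want to enforce the template mechanically, a small check like the one below could run before a PR is opened. It is a sketch under my own assumptions: `REQUIRED_SECTIONS` mirrors the headings in the template above, and `missing_sections` is a hypothetical helper that is not part of the shipped rule files.

```python
# Headings copied from the Reviewer-First template above.
REQUIRED_SECTIONS = [
    "## Risk:",
    "## What was wrong (before)",
    "## What this fixes (after)",
    "## Test Evidence",
    "## Files changed",
    "## What was NOT tested",
    "## Related PRs",
    "## Task link",
]


def missing_sections(pr_body):
    """Return the template headings that do not start any line of the body."""
    lines = [line.strip() for line in pr_body.splitlines()]
    return [
        heading
        for heading in REQUIRED_SECTIONS
        if not any(line.startswith(heading) for line in lines)
    ]


draft = """## Risk: Low | Priority: High

## Test Evidence
- Unit tests: 12 passed, 0 failed
"""

# Lists the six headings this draft still needs.
print(missing_sections(draft))
```

This only checks that the headings exist. Whether "What was NOT tested" is honest still needs a reviewer.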

Mega PR Format

## Mega PR: dev → main

## Summary
[X tasks — Y features, Z fixes. All passed both gates.]

## Sub-PRs (sorted by risk, then priority)
| # | Risk | Priority | Type | Title | PR | Reviewer | Docs | Task |
|---|------|----------|------|-------|----|----------|------|------|
| 1 | High | High | feat | ...   | #N | Pass     | Pass | link |

## High-Risk Items — start here
- **#N — [title]**: [why risky]. Rollback: [how].

## Test Status
- Full suite on dev: X passed
- All sub-PRs: both gates passed

## What was NOT tested
[Aggregate across all sub-PRs]

## Rollback
- Single task: `git revert <merge-commit>` on dev
- Everything: `git revert -m 1 <mega-merge>` on main
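The "sorted by risk, then priority" table can be generated rather than hand-written. The sketch below assumes High sorts before Medium before Low on both axes, which matches the example row but is my reading of the template. `sub_pr_table` and the dict keys are hypothetical, the PR numbers are made up, and only a subset of the template's columns is rendered to keep it short.

```python
# Assumed ordering: High first on both axes.
RANK = {"High": 0, "Medium": 1, "Low": 2}


def sub_pr_table(sub_prs):
    """Render a markdown table of sub-PRs sorted by risk, then priority."""
    rows = sorted(
        sub_prs,
        key=lambda pr: (RANK[pr["risk"]], RANK[pr["priority"]]),
    )
    lines = [
        "| # | Risk | Priority | Type | Title | PR |",
        "|---|------|----------|------|-------|----|",
    ]
    for i, pr in enumerate(rows, start=1):
        lines.append(
            f"| {i} | {pr['risk']} | {pr['priority']} | {pr['type']} "
            f"| {pr['title']} | #{pr['number']} |"
        )
    return "\n".join(lines)


# Illustrative data only.
example = [
    {"number": 101, "risk": "Low", "priority": "High", "type": "fix", "title": "Copy update"},
    {"number": 102, "risk": "High", "priority": "Medium", "type": "feat", "title": "Schema migration"},
]
print(sub_pr_table(example))
```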

Multi-Account Test Matrix

Every PR should be tested from multiple perspectives:

| Feature type | Test WITH (should work) | Test WITHOUT (should be denied) |
|--------------|-------------------------|---------------------------------|
| Admin-only | Admin account | Basic user → verify 403 |
| Tier-gated | Pro/privileged user | Basic user → verify upgrade prompt |
| All-user | Basic user (lowest tier) | N/A |

This ensures you're testing both the happy path AND the access control. A feature that "works" but is visible to users who shouldn't see it is a security bug.
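One way to keep the matrix from being skipped is to encode it as data and derive the required checks per feature type. The account labels and the `required_checks` helper below are hypothetical; the rows mirror the matrix above.

```python
# Each row: which account must succeed, which must be denied, and how
# the denial should show up. Values mirror the matrix above.
TEST_MATRIX = {
    "admin-only": {"allow": "admin", "deny": "basic", "deny_signal": "403"},
    "tier-gated": {"allow": "pro", "deny": "basic", "deny_signal": "upgrade prompt"},
    "all-user": {"allow": "basic", "deny": None, "deny_signal": None},
}


def required_checks(feature_type):
    """Return (account, expected outcome) pairs a PR must be tested with."""
    row = TEST_MATRIX[feature_type]
    checks = [(row["allow"], "works")]
    if row["deny"] is not None:
        checks.append((row["deny"], row["deny_signal"]))
    return checks


for feature in TEST_MATRIX:
    print(feature, required_checks(feature))
```

The "Tested as" line in the PR template is the natural place to record which of these checks actually ran.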


Customization Guide

Swapping Your Task Tracker

The system references Notion throughout, but the pattern is tool-agnostic. To swap:

  1. Update notion-tracking.md — rename it and replace Notion-specific API calls with your tool's equivalents
  2. Update reference-ids.md — replace Notion view IDs with your tool's board/project IDs
  3. Keep the status flow — To-do → In Progress → In Review → Done works regardless of tool
  4. Keep bidirectional linking — every PR links to a task, every task links to its PR

Key behaviors to preserve:

  • The Sprint Worker queries for To-do tasks and claims them by setting In Progress
  • The Watchdog sets tasks to Done when their PR merges
  • The Planner only touches tasks that don't have plans yet (Planned !== true)
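When porting to another tracker, it can help to write down the small surface the agents actually depend on. The sketch below is one possible shape, not the interface the rule files use: `TaskTracker`, `InMemoryTracker`, and the helper names are all hypothetical. The status names and the three behaviors come from the lists above.

```python
from typing import Protocol

STATUS_FLOW = ["To-do", "In Progress", "In Review", "Done"]


class TaskTracker(Protocol):
    """Hypothetical minimum any tracker adapter would need to provide."""

    def query(self, status: str, owner: str) -> list: ...
    def set_status(self, task_id: str, status: str) -> None: ...


class InMemoryTracker:
    """Stand-in adapter for trying the flow without a real tracker."""

    def __init__(self, tasks):
        self.tasks = {t["id"]: t for t in tasks}

    def query(self, status, owner):
        return [
            t for t in self.tasks.values()
            if t["status"] == status and t["owner"] == owner
        ]

    def set_status(self, task_id, status):
        if status not in STATUS_FLOW:
            raise ValueError(f"unknown status: {status}")
        self.tasks[task_id]["status"] = status


def claim_next(tracker, owner):
    """Sprint Worker: take the first To-do task and mark it In Progress."""
    todo = tracker.query("To-do", owner)
    if not todo:
        return None
    tracker.set_status(todo[0]["id"], "In Progress")
    return todo[0]


def on_pr_merged(tracker, task_id):
    """Watchdog: a merged PR moves its task to Done."""
    tracker.set_status(task_id, "Done")


def needs_plan(task):
    """Planner: only touch tasks that are not already planned."""
    return task.get("planned") is not True
```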

Swapping Your Communication Channel

Replace Slack references in reference-ids.md with your channel's webhook URL or API endpoint. The digest format (TL;DR → What shipped → High-risk → Priorities) stays the same.

Adding Your Own Agents

Create a new file in .claude/rules/ following this pattern:

# Agent Name — "trigger keyword"

`/loop INTERVAL`. [One sentence describing the role.]

## Each tick

1. [What to check/query]
2. [What to do with the results]
3. [What to report/update]

## What NOT to do

- [Explicit boundaries]

Then add it to the agents table and combos in loop-modes.md.

Modifying the Gate System

The dual-gate system is defined in loop-modes.md and enforced by the PR Watchdog in loop-pr-ops.md. To change it:

  • Remove a gate: Delete the check from the Watchdog's auto-merge conditions
  • Add a gate: Add a new check (e.g., "Security Scan Passed") to the Watchdog's merge requirements and create the agent that provides it
  • Change gate order: The Docs Agent currently waits for Senior Reviewer approval before starting. To run them in parallel, remove the "skip PRs without Senior Reviewer Approve" rule in loop-docs.md
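Conceptually, the Watchdog's merge decision is a short list of checks, and adding or removing a gate means editing that list. The Python below is a mental model only; the real logic lives in natural-language rules in loop-pr-ops.md. The gate labels, dict keys, and `can_auto_merge` are hypothetical, while the dev-only target and the high-risk hold follow the protections listed in the FAQ.

```python
# Assumed gate labels; use whatever names your rule files record.
GATES = ["Senior Reviewer", "Docs Agent"]


def can_auto_merge(pr, gates=GATES):
    """Return (decision, reason) for a PR described as a plain dict."""
    if pr["base"] != "dev":
        return False, "only PRs targeting dev auto-merge"
    if pr["risk"] == "High":
        return False, "high-risk PRs are held for manual review"
    missing = [gate for gate in gates if gate not in pr["passed"]]
    if missing:
        return False, "waiting on: " + ", ".join(missing)
    return True, "all gates passed"


pr = {"base": "dev", "risk": "Low", "passed": {"Senior Reviewer", "Docs Agent"}}
print(can_auto_merge(pr))
# Adding a gate is one more entry in the list.
print(can_auto_merge(pr, GATES + ["Security Scan"]))
```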

FAQ

Q: Does this require Claude Code Pro/Max?
A: The swarm uses Claude Code's /loop feature for recurring agents. Check Claude Code's current pricing for loop/cron support. The individual rule files work with any Claude Code tier.

Q: How much does this cost to run?
A: It depends on your loop intervals and how many agents are active. The full swarm with default intervals uses significant context across 6 concurrent loops. Start with 1-2 agents and scale up. The "work horses" combo (Sprint Worker + Watchdog) is the most cost-effective starting point.

Q: Can I use this with Cursor / Codex / other AI coding tools?
A: The rule files are Claude Code-specific (they reference Claude Code features like /loop, EnterWorktree, and subagents). The concepts (dual-gate, risk classification, separation of concerns) apply to any tool. Community ports are welcome via PR.

Q: What if I don't use Notion?
A: See Customization Guide. The system is designed around concepts (sprint board, status flow, bidirectional linking), not specific tools. Swap the Notion references for your tool of choice.

Q: How do I prevent the agents from breaking my codebase?
A: Multiple layers of protection:

  • Sprint Worker always runs tests before opening a PR
  • Senior Reviewer does deep code review before approving
  • PR Watchdog only merges after both gates pass
  • Nothing auto-merges to main — only to your dev branch
  • Mega PRs require human approval
  • High-risk PRs are held for manual review regardless of gates

Q: Can I run this with a team, not just solo?
A: Yes. I run it on a 4-person team. The owner filter (agents only touch tasks assigned to you) prevents conflicts. Each team member can run their own swarm instance on their own tasks. The PR Watchdog handles cross-agent coordination (merge conflicts, branch syncing).

Q: My prompts keep going stale. How does prompt health actually work?
A: See Prompt Health. The short version: every rule gets tagged ACTIVE/DORMANT/STALE. After any mistake, you check whether a rule should have prevented it and fix or add one. The Rules Hygiene agent reviews all rules every 12 hours and flags drift. This turns prompt maintenance from "maybe I'll update it someday" into a systematic process.

Q: What's the "Investigate:" task pattern?
A: Prefix any task with "Investigate:" and the Sprint Worker switches to research-only mode — reads code, checks logs, writes findings, but never opens a PR. This creates a deliberate pause point where you review the findings before deciding to proceed. It prevents the common failure of agents charging ahead with a fix for a problem they don't fully understand.


Priority System

There are two different orderings for two different purposes:

Sprint Curation Order ("prep sprint")

When deciding what to work on, prioritize by business value with least friction:

High Priority + Low Risk   →  first  (ship value fast, low blast radius)
High Priority + Med Risk   →  second
Med Priority  + Low Risk   →  third
High Priority + High Risk  →  fourth
Med Priority  + Med Risk   →  fifth
...

Ship the most impactful work first, starting with the items least likely to blow up.
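The curation order is easiest to treat as a lookup table rather than a formula. The sketch below encodes only the five combinations listed above; anything the list leaves out sorts after them in its original order, since I don't want to guess the rest. `CURATION_RANK` and `curation_order` are hypothetical names.

```python
# Keys are (priority, risk). Only the combinations listed above are ranked.
CURATION_RANK = {
    ("High", "Low"): 1,
    ("High", "Med"): 2,
    ("Med", "Low"): 3,
    ("High", "High"): 4,
    ("Med", "Med"): 5,
}


def curation_order(tasks):
    """Sort tasks by the listed ranks; unlisted combinations go last."""
    unlisted = len(CURATION_RANK) + 1
    return sorted(
        tasks,
        key=lambda t: CURATION_RANK.get((t["priority"], t["risk"]), unlisted),
    )


backlog = [
    {"title": "Rewrite auth", "priority": "High", "risk": "High"},
    {"title": "Fix checkout bug", "priority": "High", "risk": "Low"},
    {"title": "Tidy settings page", "priority": "Med", "risk": "Low"},
]
for task in curation_order(backlog):
    print(task["title"])
```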

Execution/Review Triage Order (Sprint Worker, Senior Reviewer)

When deciding what to review or merge next from the active queue, prioritize by risk:

High Risk + Low Priority  →  first  (get scary stuff eyeballed early)
High Risk + Med Priority  →  second
Med Risk  + Low Priority  →  third
High Risk + High Priority →  fourth
Med Risk  + Med Priority  →  fifth
Med Risk  + High Priority →  sixth
Low Risk  + Low Priority  →  seventh
Low Risk  + Med Priority  →  eighth
Low Risk  + High Priority →  ninth

Why risk-first for execution: High-risk changes are the ones most likely to cause problems. Getting them reviewed early gives you more time to catch issues before they compound. A low-priority database migration deserves more scrutiny than a high-priority copy update.

Tiebreakers: within the same rank, PRs with green CI go first, followed by PRs with fewer unresolved review comments.
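The triage order plus tiebreakers maps onto a single sort key. In the sketch below the rank table is copied from the list above; applying the CI tiebreaker before the comment count is my reading of the tiebreaker line, and `triage_key` and the dict fields are hypothetical.

```python
# Keys are (risk, priority); ranks copied from the triage list above.
TRIAGE_RANK = {
    ("High", "Low"): 1,
    ("High", "Med"): 2,
    ("Med", "Low"): 3,
    ("High", "High"): 4,
    ("Med", "Med"): 5,
    ("Med", "High"): 6,
    ("Low", "Low"): 7,
    ("Low", "Med"): 8,
    ("Low", "High"): 9,
}


def triage_key(pr):
    """Sort key: rank, then green CI first, then fewer unresolved comments."""
    return (
        TRIAGE_RANK[(pr["risk"], pr["priority"])],
        0 if pr["ci_green"] else 1,
        pr["unresolved_comments"],
    )


# Illustrative queue; ids are made up.
queue = [
    {"id": 1, "risk": "Low", "priority": "High", "ci_green": True, "unresolved_comments": 0},
    {"id": 2, "risk": "High", "priority": "Low", "ci_green": False, "unresolved_comments": 0},
    {"id": 3, "risk": "High", "priority": "Low", "ci_green": True, "unresolved_comments": 3},
    {"id": 4, "risk": "High", "priority": "Low", "ci_green": True, "unresolved_comments": 1},
    {"id": 5, "risk": "Med", "priority": "Low", "ci_green": True, "unresolved_comments": 0},
]
print([pr["id"] for pr in sorted(queue, key=triage_key)])
```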


Sprint Lifecycle

The complete lifecycle from planning to production:

1. "prep sprint"     →  You curate 3-5 tasks
2. "launch swarm"    →  6 agents start working
3. Sprint Worker     →  Codes + tests + opens PRs
4. Senior Reviewer   →  Reviews PRs (Gate 1)
5. Docs Agent        →  Screenshots + blog drafts (Gate 2)
6. PR Watchdog       →  Auto-merges after both gates
7. Swarm goes idle   →  QA handoff generated automatically
8. You test manually →  Using the structured QA plan
9. "prep mega pr"    →  Opens dev → main PR
10. CTO/lead reviews →  Human approval on mega PR
11. Merge to main    →  Feature launch announced

The swarm knows when to stop. When all Current Sprint tasks are done, the Sprint Worker goes idle. The system detects this and automatically generates a structured QA handoff with step-by-step test plans for every task that shipped. It doesn't keep grinding or pull from the backlog.

Between sessions: Clear Done tasks from Current Sprint before curating new ones. This prevents the sprint view from accumulating stale completed items that confuse the agents.


File Reference

| File | Purpose |
|------|---------|
| .claude/CLAUDE.md | Core rules: git workflow, TDD, lifecycle, magic keywords, priority matrix |
| .claude/rules/loop-modes.md | Agent definitions, intervals, combos, gates, sprint complete flow |
| .claude/rules/loop-sprint-worker.md | Sprint Worker: coding agent with risk classification |
| .claude/rules/loop-pr-ops.md | PR Watchdog + Senior Reviewer + Daily Digest |
| .claude/rules/loop-docs.md | Docs Agent: screenshots, annotations, blog drafts, feature launches |
| .claude/rules/loop-log-watcher.md | Log Watcher: real-time error monitoring |
| .claude/rules/loop-support.md | Task Planner + Notion Sync + Rules Hygiene |
| .claude/rules/loop-back-testing.md | PR templates: Reviewer-First and Mega PR formats |
| .claude/rules/session-hygiene.md | Context management: compact, clear, commit, prune |
| .claude/rules/prompt-health.md | Self-learning anti-rot system |
| .claude/rules/notion-tracking.md | Task tracking discipline and querying patterns |
| .claude/rules/reference-ids.md | All hardcoded IDs in one place (template) |
| .claude/rules/auto-user-guide.md | Auto-generated user guides for feature PRs |
| .claude/rules/meeting-action-routing.md | Meeting action item extraction and routing |
| .claude/rules/loop-senior-leadership.md | Combo: Task Planner + Senior Reviewer |
| .claude/rules/loop-work-horses.md | Combo: Sprint Worker + PR Watchdog |
| .claude/settings.json | Claude Code settings (Playwright plugin) |
| .private.example/reference-ids.md | Example private config with placeholder values |

Contributing

This is a living system. I update it as I find better patterns.

Ways to contribute:

  • Bug fixes: If a rule has a logic error or contradicts another rule, open a PR
  • New agents: Built a useful agent? Add it to .claude/rules/ with proper boundaries
  • Tool adapters: Made this work with Linear/Jira/GitHub Projects? Share the modified files
  • Concept improvements: Better gate systems, new health check patterns, improved priority algorithms

Please don't:

  • Add rules that are specific to your project (keep it general)
  • Add emojis or formatting flourishes to the rule files (Claude reads these, not humans)
  • Combine agents that should stay separate (separation of concerns is core to the design)

Credits

Built by Harsha Vankayalapati, Co-Founder & CEO at AgentWeb. YC W23 founder. Part-time coder, full-time CEO.

This system was built because I needed to stay dangerous as an engineer while running sales, marketing, and ops for a 4-person startup. Every rule exists because something broke without it. Every boundary exists because an agent overstepped without it.

The Docs Agent in this system uses Emma (AgentWeb's AI marketing agent) for automated blog drafts and user guide generation. We built Emma, so I'm obviously biased, but it works well for dev teams that want content without context-switching.


License

MIT. Use it, fork it, adapt it, share it.

If this helped you, a star helps others find it.
