SWE-Squad

agent
Security Audit: Fail
Health: Pass
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 11 GitHub stars
Code: Fail
  • rm -rf — Recursive force deletion command in .claude/settings.json
  • process.env — Environment variable access in .pi/extensions/swe-cost.ts
Permissions: Pass
  • Permissions — No dangerous permissions requested
Purpose
This project provides an autonomous, always-on AI engineering manager that reads GitHub issues, triages them, writes and reviews code fixes, and merges pull requests. It acts as a persistent daemon connecting a TypeScript control plane with a Python agent library.

Security Assessment
Overall Risk: Medium. Because the tool is designed to autonomously read issues, write code, and merge changes, it naturally handles sensitive data (requiring extensive environment variable access for APIs and databases). The rule-based scan flagged a recursive force deletion command (`rm -rf`) inside the Claude settings. While potentially dangerous if misconfigured, this is likely used for automated workspace cleanup. The tool does not request inherently dangerous operating system permissions, and no hardcoded secrets were detected. However, its core function of executing autonomous code changes requires extensive network access and high-level repository privileges.

Quality Assessment
The project is active, with its last code push occurring today. It operates under the standard, permissive MIT license. Community trust is currently very low; the repository has only 11 GitHub stars, indicating minimal public scrutiny or widespread adoption. Additionally, while the README claims an impressive test suite of over 6,800 tests, the project remains too new and untested by the broader community to be considered highly reliable.

Verdict
Use with caution — the autonomous execution capabilities, recent creation, and low community adoption require strict sandboxing and careful human oversight before deploying in production environments.
SUMMARY

Autonomous Software Engineering Agents — self-healing, self-diagnosing development team powered by Claude Code and A2A protocol

README.md


SWE Squad

Autonomous Software Engineering Agents That Fix Bugs While You Sleep

An always-on AI engineering manager backed by a persistent LLM session with 16 custom tools.
Scans GitHub issues, investigates root causes, delegates fixes, reviews PRs, and enforces safety gates — autonomously.

Built on pi-agent SDK, Claude Code, Supabase, and the A2A Protocol


Overview

SWE Squad is an always-on AI engineering manager that runs as a persistent daemon. It:

  1. Imports GitHub issues as structured tickets into a Supabase store
  2. Triages by severity — the LLM decides priority, not hardcoded rules
  3. Investigates root causes by delegating to any configured coding engine
  4. Develops fixes on feature branches with automated test verification
  5. Reviews PRs with structured feedback (security, correctness, style)
  6. Merges approved changes and monitors for regressions
  7. Notifies via Telegram on critical events, PR creation, and failures

The system is built on two codebases:

| Layer | Language | Purpose |
| --- | --- | --- |
| Control Plane | TypeScript | Persistent pi-agent daemon with 16 custom tools: the decision-making brain |
| Agent Library | Python | Specialized agents (monitor, triage, investigate, develop), ticket store, embeddings |

Key Capabilities

  • 16 Custom Tools — ticket CRUD, GitHub import, investigation/development/review delegation, PR management, workspace provisioning, safety gates, health monitoring, notifications
  • Engine-Agnostic Delegation — swap coding engines (Claude CLI, Gemini CLI, Copilot, OpenCode) via config
  • Provider-Agnostic Architecture — every external service is a swappable plugin behind an interface
  • Persistent Sessions — JSONL-backed session state survives daemon restarts
  • Safety Gates — circuit breaker, stability gate, outcome tracker, budget enforcement
  • Semantic Memory — pgvector embeddings surface similar past fixes at investigation time
  • Multi-Team Support — multiple squads share Supabase without overlap
  • React WebUI — management dashboard with Kanban boards, pipeline editor, team controls

Architecture

The V2 architecture centers on a single persistent LLM session (via @mariozechner/pi-coding-agent) that decides what to do based on its persona and tool results. No hardcoded phases.

flowchart TD
    subgraph daemon [" SWE-Manager Daemon (TypeScript) "]
        Session["pi-agent Session\nPersistent LLM + 16 tools"]
        HB["Heartbeat Loop\n5-min interval"]
        HB -->|"prompt"| Session
    end

    subgraph tools [" Custom Tools "]
        direction LR
        TL["ticket_list\nticket_create\nticket_update"]
        GH["github_issues\ngithub_import"]
        DEL["delegate_investigation\ndelegate_development\ndelegate_review"]
        PR["run_tests\napprove_pr\nmerge_pr"]
        OPS["check_stability\ncheck_health\ncheck_metrics"]
        WS["manage_workspace\nsend_notification"]
    end

    subgraph engines [" Coding Engines (config-resolved) "]
        Claude["Claude Code CLI"]
        Gemini["Gemini CLI"]
        Copilot["GitHub Copilot"]
    end

    subgraph infra [" Infrastructure "]
        Supa[("Supabase\nTickets + pgvector")]
        GitHub["GitHub API\nIssues + PRs"]
        Telegram["Telegram\nNotifications"]
    end

    Session --> tools
    DEL -->|"spawn"| engines
    TL & GH --> Supa
    GH --> GitHub
    WS --> Telegram

    classDef daemonNode fill:#6366f1,stroke:#4338ca,color:#fff,stroke-width:2px,rx:12
    classDef toolNode fill:#3b82f6,stroke:#2563eb,color:#fff,stroke-width:1.5px
    classDef engineNode fill:#ef4444,stroke:#dc2626,color:#fff,stroke-width:1.5px
    classDef infraNode fill:#10b981,stroke:#059669,color:#fff,stroke-width:2px
    classDef subgraphBox fill:transparent,stroke:#e5e7eb,stroke-width:1px,color:#6b7280

    class Session,HB daemonNode
    class TL,GH,DEL,PR,OPS,WS toolNode
    class Claude,Gemini,Copilot engineNode
    class Supa,GitHub,Telegram infraNode
    class daemon,tools,engines,infra subgraphBox

Ticket Pipeline

The daemon works the pipeline right to left, so tickets closest to done are completed first:

open → investigating → investigation_complete → in_development → in_review → testing → resolved

Each heartbeat, the LLM picks the highest-priority ticket closest to completion and advances it one step.
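
The selection rule above can be approximated in a few lines. This is only a sketch: in SWE Squad the LLM makes the choice, whereas the code below is deterministic. The `Ticket` class, the `SEVERITY_RANK` scale, and the decision to rank by pipeline stage first and severity second are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Pipeline stages in order, as listed above.
PIPELINE = [
    "open",
    "investigating",
    "investigation_complete",
    "in_development",
    "in_review",
    "testing",
    "resolved",
]

# Assumed severity ranking (lower sorts first); the real scale may differ.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Ticket:
    ticket_id: str
    status: str
    severity: str


def pick_next(tickets):
    """Return the unresolved ticket furthest along the pipeline, breaking ties by severity."""
    active = [t for t in tickets if t.status != "resolved"]
    if not active:
        return None
    return max(
        active,
        key=lambda t: (PIPELINE.index(t.status), -SEVERITY_RANK[t.severity]),
    )


def advance(ticket):
    """Move a ticket exactly one stage to the right, stopping at 'resolved'."""
    next_index = min(PIPELINE.index(ticket.status) + 1, len(PIPELINE) - 1)
    ticket.status = PIPELINE[next_index]
    return ticket
```

Calling `advance(pick_next(tickets))` once per heartbeat mirrors the "one step per heartbeat" behavior described above.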


How the Fix Loop Works

flowchart TD
    Start(["New Ticket"]):::startNode --> Cache{"Trajectory\ncache hit?"}:::decisionNode

    Cache -->|"hit — free"| Replay["Replay cached fix\nzero cost"]:::cacheNode
    Replay --> Tests0{"Tests\npass?"}:::testNode
    Tests0 -->|"pass"| Keep0(["KEEP — commit"]):::successNode

    Cache -->|"miss"| A1

    subgraph attempts [" Escalating Fix Attempts "]
        A1["Attempt 1 — Sonnet\nRoutine fix"]:::sonnetNode
        A1 --> Tests1{"Tests\npass?"}:::testNode
        Tests1 -->|"pass"| Keep1(["KEEP"]):::successNode
        Tests1 -->|"fail"| A2["Attempt 2 — Sonnet\n+ error context"]:::sonnetNode
        A2 --> Tests2{"Tests\npass?"}:::testNode
        Tests2 -->|"pass"| Keep2(["KEEP"]):::successNode
        Tests2 -->|"fail"| A3["Attempt 3 — Opus\nOrchestrates sub-agents"]:::opusNode
        A3 --> Tests3{"Tests\npass?"}:::testNode
        Tests3 -->|"pass"| Keep3(["KEEP"]):::successNode
        Tests3 -->|"fail"| HITL
    end

    HITL(["HITL Escalation\nTelegram notification"]):::failNode

    Tests0 -->|"fail"| A1

    classDef startNode fill:#6366f1,stroke:#4338ca,color:#fff,stroke-width:2px
    classDef decisionNode fill:#f59e0b,stroke:#d97706,color:#fff,stroke-width:2px
    classDef cacheNode fill:#8b5cf6,stroke:#7c3aed,color:#fff,stroke-width:1.5px
    classDef testNode fill:#64748b,stroke:#475569,color:#fff,stroke-width:1.5px
    classDef sonnetNode fill:#3b82f6,stroke:#2563eb,color:#fff,stroke-width:1.5px
    classDef opusNode fill:#ef4444,stroke:#dc2626,color:#fff,stroke-width:2px
    classDef successNode fill:#10b981,stroke:#059669,color:#fff,stroke-width:2px
    classDef failNode fill:#ef4444,stroke:#dc2626,color:#fff,stroke-width:2px
    classDef subgraphBox fill:transparent,stroke:#e5e7eb,stroke-width:1px,color:#6b7280

    class attempts subgraphBox

Each attempt runs on its own git branch. If the tests pass, the change is committed and a PR is opened. If they fail, the branch is rolled back with git reset --hard, so failing code is never committed and cannot reach main.
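
As a rough illustration of the keep/discard loop, the sketch below walks the attempt ladder from the diagram (Sonnet, Sonnet with error context, then Opus) and escalates to a human when all three fail. The four callables are hypothetical hooks standing in for the real engine, test runner, git revert, and Telegram notifier; the trajectory-cache replay step is left out for brevity.

```python
from typing import Callable, Optional

# Attempt ladder taken from the diagram above: two Sonnet attempts, then Opus.
ATTEMPT_MODELS = ["sonnet", "sonnet", "opus"]


def run_fix_loop(
    apply_fix: Callable[[str, Optional[str]], None],
    run_tests: Callable[[], Optional[str]],
    revert: Callable[[], None],
    escalate: Callable[[str], None],
) -> Optional[str]:
    """Try each model in turn and keep the first fix whose tests pass.

    run_tests returns None on success or an error summary on failure.
    Returns the model that produced the kept fix, or None after escalation.
    """
    last_error: Optional[str] = None
    for model in ATTEMPT_MODELS:
        apply_fix(model, last_error)  # later attempts receive the prior error as context
        last_error = run_tests()
        if last_error is None:
            return model  # KEEP: the caller commits and opens a PR
        revert()  # DISCARD: for example, git reset --hard on the branch
    escalate(last_error or "unknown failure")  # human-in-the-loop hand-off
    return None
```

Passing the hooks in as arguments keeps the loop testable without touching git or an LLM.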


Quick Start

Prerequisites

  • Node.js 20+ and pnpm (for the TypeScript control plane)
  • Python 3.10+ (for the agent library and tests)
  • Claude Code CLI (coding engine)
  • GitHub CLI (gh) authenticated

1. Install

git clone https://github.com/ArtemisAI/SWE-Squad.git
cd SWE-Squad

# TypeScript control plane
cd control-plane && pnpm install && cd ..

# Python agent library
pip install python-dotenv pyyaml

2. Configure

cp .env.example .env
# Edit .env with your credentials (see Configuration section)

3. Run the Daemon

# Single heartbeat (test your setup)
npx tsx control-plane/src/main.ts --verbose

# Daemon mode (continuous 5-minute heartbeats)
npx tsx control-plane/src/main.ts --daemon --verbose

# Fresh session (discards prior session state)
npx tsx control-plane/src/main.ts --daemon --fresh --verbose

# Dry run (validates config and tool registration, no LLM calls)
npx tsx control-plane/src/main.ts --dry-run

4. Run Tests

# Python tests (5900+ tests)
python3 -m pytest tests/ -v --tb=short

# TypeScript tests (900+ tests); the subshell leaves you in the repo root
(cd control-plane && pnpm test)

# TypeScript type checking
(cd control-plane && pnpm typecheck)

The 16 Custom Tools

The daemon's LLM session has access to these tools, registered via defineTool() from pi-agent:

| Tool | Purpose |
| --- | --- |
| ticket_list | Query tickets by status, severity, repo, or pipeline view |
| ticket_create | Create a new ticket with fingerprint-based deduplication |
| ticket_update | Update ticket status, notes, assignee; enforces resolution audit |
| github_issues | List open GitHub issues from configured repositories |
| github_import | Import GitHub issues as tickets with dedup (fingerprint: gh-issue-{repo}-{number}) |
| delegate_investigation | Claim ticket, resolve engine from config, spawn investigation, store report |
| delegate_development | Claim ticket, provision workspace, spawn development, create PR |
| delegate_review | Spawn code review on a PR with structured feedback |
| run_tests | Execute test suite in a workspace and report results |
| approve_pr | Approve a pull request via GitHub API |
| merge_pr | Merge an approved PR (squash merge) |
| manage_workspace | Create/cleanup/list git worktrees for isolated development |
| check_stability | Evaluate safety gates: circuit breaker + open criticals + test failures |
| check_health | Aggregate health snapshot: Supabase, engines, circuit breaker, uptime |
| check_metrics | Pipeline metrics: throughput, cycle time, failure rates |
| send_notification | Send alerts via configured provider (Telegram, Slack, webhook) |
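
To make the fingerprint-based deduplication concrete, here is a minimal sketch of how an import could skip issues it has already seen. Only the fingerprint format `gh-issue-{repo}-{number}` comes from the tool list; `InMemoryTicketStore` and its `import_issue` method are hypothetical stand-ins for the Supabase-backed store, not the project's actual API.

```python
def issue_fingerprint(repo: str, number: int) -> str:
    """Build the fingerprint format quoted in the tool list: gh-issue-{repo}-{number}."""
    return f"gh-issue-{repo}-{number}"


class InMemoryTicketStore:
    """Hypothetical stand-in for the Supabase ticket store."""

    def __init__(self):
        self._by_fingerprint = {}

    def import_issue(self, repo: str, number: int, title: str):
        """Insert a ticket unless its fingerprint exists; return (ticket, created)."""
        fingerprint = issue_fingerprint(repo, number)
        existing = self._by_fingerprint.get(fingerprint)
        if existing is not None:
            return existing, False  # duplicate import: reuse the stored ticket
        ticket = {"fingerprint": fingerprint, "title": title, "status": "open"}
        self._by_fingerprint[fingerprint] = ticket
        return ticket, True
```

Because the fingerprint depends only on the repository and issue number, re-importing the same issue on every heartbeat is harmless.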

Configuration

Environment Variables

Copy .env.example to .env and configure:

| Variable | Required | Description |
| --- | --- | --- |
| SWE_TEAM_ENABLED | Yes | Kill switch (true/false) |
| SWE_TEAM_ID | Yes | Unique team identifier for ticket scoping |
| SWE_GITHUB_ACCOUNT | Yes | Dedicated GitHub bot account |
| GH_TOKEN | Yes | GitHub PAT with repo scope |
| SUPABASE_URL | Yes | Supabase PostgREST URL |
| SUPABASE_ANON_KEY | Yes | Supabase authentication key |
| TELEGRAM_BOT_TOKEN | No | Telegram bot token for notifications |
| TELEGRAM_CHAT_ID | No | Telegram chat ID for alerts |
| BASE_LLM_API_URL | No | OpenAI-compatible proxy for embeddings |
| ANTHROPIC_BASE_URL | No | Proxy URL for Claude CLI (engine delegation) |
| SWE_DAEMON_MODEL | No | Override daemon LLM model (default: claude-sonnet) |
| SWE_MODEL_T2 | No | Override delegation model tier (default: sonnet) |

See .env.example for the full list.
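
Before starting the daemon it can help to check that the required variables are actually set. The sketch below is a standalone pre-flight check, not part of SWE Squad itself; the function names are made up for illustration, and treating any value other than `true` as "disabled" is an assumption about how the kill switch is read.

```python
import os

# Variables marked "Yes" in the list above.
REQUIRED_VARS = [
    "SWE_TEAM_ENABLED",
    "SWE_TEAM_ID",
    "SWE_GITHUB_ACCOUNT",
    "GH_TOKEN",
    "SUPABASE_URL",
    "SUPABASE_ANON_KEY",
]


def missing_required(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def team_enabled(env=None):
    """Interpret the kill switch; anything other than 'true' counts as disabled (assumption)."""
    env = os.environ if env is None else env
    return env.get("SWE_TEAM_ENABLED", "").strip().lower() == "true"
```

Running `missing_required()` with no arguments inspects the current process environment, so it should be called after the .env file has been loaded.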

YAML Config (config/swe_team.yaml)

The YAML config controls:

  • delegation — per-role engine binding (investigator, developer, reviewer)
  • workspace — worktree provisioning settings
  • daemon — heartbeat interval, initial prompt, session lifecycle
  • cycle — max concurrent investigations/developments, severity filters
  • memory — embedding model, similarity thresholds, TTL
  • notification — provider selection (telegram/slack/webhook)
  • governance — stability gate thresholds
  • githubRepos — list of repos to scan for issues

Engine Delegation

The daemon never implements fixes itself. It delegates to coding engines that are resolved per role from config:

# config/swe_team.yaml
delegation:
  investigator:
    engine: claude-cli
    model: sonnet
    readOnly: true
    timeout: 1800
  developer:
    engine: claude-cli
    model: sonnet
    timeout: 3600
  reviewer:
    engine: claude-cli
    model: haiku
    readOnly: true
    timeout: 900

Supported engines: Claude Code CLI, Gemini CLI, OpenCode, GitHub Copilot. Adding a new engine = new file in providers/engine/ + config entry.
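
The per-role lookup can be pictured as follows. This is a sketch under stated assumptions: the config is mirrored as a plain dict so the example has no dependencies, `EngineBinding` and `resolve_engine` are hypothetical names, `readOnly` is assumed to default to false when omitted, and `timeout` is assumed to be in seconds.

```python
from dataclasses import dataclass

# Mirrors the YAML snippet above as a plain dict to keep the sketch dependency-free.
DELEGATION = {
    "investigator": {"engine": "claude-cli", "model": "sonnet", "readOnly": True, "timeout": 1800},
    "developer": {"engine": "claude-cli", "model": "sonnet", "timeout": 3600},
    "reviewer": {"engine": "claude-cli", "model": "haiku", "readOnly": True, "timeout": 900},
}


@dataclass(frozen=True)
class EngineBinding:
    engine: str
    model: str
    read_only: bool
    timeout_seconds: int  # assumes the YAML timeout is expressed in seconds


def resolve_engine(role: str, delegation=DELEGATION) -> EngineBinding:
    """Look up the engine binding for a role, defaulting readOnly to False when absent."""
    try:
        entry = delegation[role]
    except KeyError:
        raise ValueError(f"no delegation configured for role {role!r}") from None
    return EngineBinding(
        engine=entry["engine"],
        model=entry["model"],
        read_only=bool(entry.get("readOnly", False)),
        timeout_seconds=int(entry["timeout"]),
    )
```

For example, `resolve_engine("reviewer")` yields a read-only Haiku binding with a 900-second timeout, matching the config above.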


Model Routing

| Scenario | Model | Cost |
| --- | --- | --- |
| Daemon management cycle | Sonnet | $$ |
| Investigation (default) | Sonnet | $$ |
| Development + PR creation | Sonnet | $$ |
| PR review | Haiku | $ |
| Embeddings, fact extraction | bge-m3 / gemini-3-flash | $ |
| CRITICAL bugs | Opus | $$$ |
| Deterministic replay (cached) | None | Free |

flowchart LR
    Ticket(["Incoming Ticket"]):::startNode --> Cached{"Cached\nfix?"}:::decisionNode

    Cached -->|"hit — free"| Replay(["Replay\nzero cost"]):::cacheNode
    Cached -->|"miss"| Severity{"Severity?"}:::decisionNode

    subgraph tiers [" Model Tiers "]
        direction TB
        T1["T1 Haiku\nEmbeddings, triage\n$"]:::t1Node
        T2["T2 Sonnet\nInvestigation + fix\n$$"]:::t2Node
        T3["T3 Opus\nOrchestrator only\n$$$"]:::t3Node
    end

    Severity -->|"LOW / MEDIUM"| T1
    Severity -->|"HIGH"| T2
    Severity -->|"CRITICAL"| T3
    T2 -->|"2 failures"| T3

    subgraph fallback [" Fallback Chain "]
        direction LR
        Claude["Claude Code\nprimary"]:::claudeNode
        Gemini["Gemini CLI\nfallback"]:::geminiNode
        OpenCode["OpenCode\nlast resort"]:::opencodeNode
        Claude -->|"rate limited"| Gemini -->|"unavailable"| OpenCode
    end

    T2 -.->|"dispatch"| Claude
    T3 -.->|"dispatch"| Claude

    classDef startNode fill:#6366f1,stroke:#4338ca,color:#fff,stroke-width:2px
    classDef decisionNode fill:#f59e0b,stroke:#d97706,color:#fff,stroke-width:2px
    classDef cacheNode fill:#10b981,stroke:#059669,color:#fff,stroke-width:2px
    classDef t1Node fill:#94a3b8,stroke:#64748b,color:#fff,stroke-width:1.5px
    classDef t2Node fill:#3b82f6,stroke:#2563eb,color:#fff,stroke-width:1.5px
    classDef t3Node fill:#ef4444,stroke:#dc2626,color:#fff,stroke-width:2px
    classDef claudeNode fill:#8b5cf6,stroke:#7c3aed,color:#fff,stroke-width:1.5px
    classDef geminiNode fill:#f59e0b,stroke:#d97706,color:#fff,stroke-width:1.5px
    classDef opencodeNode fill:#14b8a6,stroke:#0d9488,color:#fff,stroke-width:1.5px
    classDef subgraphBox fill:transparent,stroke:#e5e7eb,stroke-width:1px,color:#6b7280

    class tiers,fallback subgraphBox
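
The routing in the flowchart can be written down as a small decision function. The sketch follows the diagram rather than the scenario list (the two differ on where LOW and MEDIUM tickets go), and the engine identifiers `gemini-cli` and `opencode` are assumed names; only `claude-cli` appears in the sample config.

```python
def route_model(severity: str, failures: int = 0, cached: bool = False):
    """Pick a model tier from severity, following the flowchart above.

    Returns None on a cache hit, since a replay needs no model call.
    """
    if cached:
        return None
    level = severity.upper()
    if level == "CRITICAL":
        return "opus"
    if level == "HIGH":
        # The diagram escalates T2 to T3 after two failures.
        return "opus" if failures >= 2 else "sonnet"
    if level in ("LOW", "MEDIUM"):
        return "haiku"
    raise ValueError(f"unknown severity: {severity!r}")


# Fallback order from the diagram: Claude Code, then Gemini CLI, then OpenCode.
FALLBACK_CHAIN = ["claude-cli", "gemini-cli", "opencode"]


def pick_engine(available):
    """Return the first engine in the fallback chain that is currently usable."""
    for engine in FALLBACK_CHAIN:
        if engine in available:
            return engine
    return None
```

Keeping model choice and engine choice as separate functions reflects the diagram, where tiers and the fallback chain are independent subgraphs.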

Semantic Memory

When a ticket is resolved, SWE Squad extracts structured facts and stores embeddings in pgvector. On future investigations, the top-5 most similar memories are injected as context.

flowchart TD
    subgraph store [" Storage — on ticket resolved "]
        Resolved(["Ticket Resolved"]):::successNode
        Extract["extract_memory_facts\nroot cause, fix, module, tags"]:::extractNode
        Embed["embed_ticket\nbge-m3 — 1024 dim"]:::embedNode
        Dedup{"Cosine\n> 0.92?"}:::decisionNode
        StoreDB[("Supabase\npgvector")]:::dbNode

        Resolved --> Extract --> Embed --> Dedup
        Dedup -->|"new"| StoreDB
        Dedup -->|"duplicate"| StoreDB
    end

    subgraph retrieve [" Retrieval — on investigation "]
        NewTicket(["New Ticket"]):::startNode
        Search["find_similar\nTop-5, cosine >= 0.75\n180-day TTL"]:::searchNode
        Inject["Inject as\nSemantic Memory context"]:::injectNode

        NewTicket --> Search -->|"query"| StoreDB
        StoreDB -->|"matches"| Inject
    end

    classDef successNode fill:#10b981,stroke:#059669,color:#fff,stroke-width:2px
    classDef startNode fill:#6366f1,stroke:#4338ca,color:#fff,stroke-width:2px
    classDef extractNode fill:#f59e0b,stroke:#d97706,color:#fff,stroke-width:1.5px
    classDef embedNode fill:#8b5cf6,stroke:#7c3aed,color:#fff,stroke-width:1.5px
    classDef decisionNode fill:#f59e0b,stroke:#d97706,color:#fff,stroke-width:2px
    classDef dbNode fill:#3ecf8e,stroke:#2da66e,color:#fff,stroke-width:2px
    classDef searchNode fill:#3b82f6,stroke:#2563eb,color:#fff,stroke-width:1.5px
    classDef injectNode fill:#8b5cf6,stroke:#7c3aed,color:#fff,stroke-width:1.5px
    classDef subgraphBox fill:transparent,stroke:#e5e7eb,stroke-width:1px,color:#6b7280

    class store,retrieve subgraphBox
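
The retrieval rules in the diagram (top 5, cosine similarity of at least 0.75, 180-day TTL, duplicates above 0.92) can be sketched in pure Python. In the real system the search would run inside pgvector; the in-memory list of dicts and the field names `embedding` and `created_at` below are assumptions for illustration.

```python
import math
from datetime import datetime, timedelta, timezone

SIMILARITY_FLOOR = 0.75  # retrieval threshold from the diagram
DEDUP_THRESHOLD = 0.92   # above this a new memory counts as a duplicate
TTL = timedelta(days=180)
TOP_K = 5


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(query, memories, now=None):
    """Return up to TOP_K unexpired memories scoring at least SIMILARITY_FLOOR, best first."""
    now = now or datetime.now(timezone.utc)
    scored = [
        (cosine(query, m["embedding"]), m)
        for m in memories
        if now - m["created_at"] <= TTL
    ]
    hits = [(score, m) for score, m in scored if score >= SIMILARITY_FLOOR]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in hits[:TOP_K]]


def is_duplicate(candidate, memories):
    """True if the candidate embedding exceeds DEDUP_THRESHOLD against any stored memory."""
    return any(cosine(candidate, m["embedding"]) > DEDUP_THRESHOLD for m in memories)
```

The two thresholds serve different purposes: 0.75 decides what is relevant enough to inject as context, while 0.92 decides what is close enough to count as the same memory.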

Plugin Architecture

Every external service is a swappable plugin behind an interface:

| Component | Interface | Default | Alternatives |
| --- | --- | --- | --- |
| Coding agent | CodingEngine | Claude Code CLI | Gemini CLI, OpenCode, Copilot |
| Notifications | NotificationProvider | Telegram | Slack, webhook, email |
| Issue tracker | IssueTracker | GitHub Issues | Jira, Linear, GitLab |
| Embeddings | EmbeddingProvider | bge-m3 | OpenAI, sentence-transformers |
| Vector store | VectorStore | Supabase pgvector | Qdrant, Weaviate, Chroma |
| Task queue | TaskQueueProvider | In-memory (heapq) | Redis, RabbitMQ, SQS |
| Workspace | WorkspaceProvider | git-worktree | Docker volume, cloud VM |
| Sandbox | SandboxProvider | Local subprocess | Docker, Codespaces |

New provider = new file in providers/<domain>/ + config entry. Nothing else changes.
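
One common way to get "new file plus config entry" behavior is an interface and a name-to-factory registry. The sketch below shows that pattern for notifications. The interface name `NotificationProvider` comes from the list above, but the `send()` signature, the registry helpers, and `ConsoleNotifier` are assumptions, not the project's actual code.

```python
from typing import Callable, Dict, Protocol


class NotificationProvider(Protocol):
    """Interface name from the list above; the send() signature is an assumption."""

    def send(self, message: str) -> bool: ...


class ConsoleNotifier:
    """Hypothetical provider that records messages instead of calling an external API."""

    def __init__(self):
        self.sent = []

    def send(self, message: str) -> bool:
        self.sent.append(message)
        return True


# Registry mapping a config value to a provider factory.
_REGISTRY: Dict[str, Callable[[], NotificationProvider]] = {}


def register(name: str, factory: Callable[[], NotificationProvider]) -> None:
    """Make a provider selectable by name from config."""
    _REGISTRY[name] = factory


def create_provider(name: str) -> NotificationProvider:
    """Instantiate the provider selected in config, failing loudly on unknown names."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown notification provider: {name!r}")
    return _REGISTRY[name]()


register("console", ConsoleNotifier)
```

With this shape, callers depend only on `create_provider(name)` and the interface, so swapping Telegram for Slack would be a config change rather than a code change.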


Project Structure

control-plane/                     # TypeScript V2 control plane
  src/
    main.ts                        # Daemon entry point — pi-agent session + heartbeat
    config/
      schemas.ts                   # Zod schemas for all config sections
      loader.ts                    # YAML + env var config loader
    tools/                         # 16 custom pi-agent tools
      ticket-list.ts               # Query tickets by status/severity/repo
      ticket-create.ts             # Create tickets with fingerprint dedup
      ticket-update.ts             # Update status/notes/assignee
      github-issues.ts             # List GitHub issues
      github-import.ts             # Import issues as tickets
      delegate-investigation.ts    # Spawn investigation via engine
      delegate-development.ts      # Spawn development + PR creation
      delegate-review.ts           # Spawn PR review
      run-tests.ts                 # Execute test suite
      approve-pr.ts                # Approve PR via GitHub API
      merge-pr.ts                  # Merge approved PRs
      manage-workspace.ts          # Git worktree provisioning
      check-stability.ts           # Safety gate evaluation
      check-health.ts              # System health snapshot
      check-metrics.ts             # Pipeline metrics
      send-notification.ts         # Notification dispatch
    providers/                     # Provider implementations
      supabase/                    # Supabase client + ticket store
      notification/                # Telegram, Slack, webhook
      engine/                      # Coding engine registry
      memory/                      # Memory service providers
    safety/                        # Circuit breaker, outcome tracker
    services/                      # Memory service, workspace manager
    shared/                        # Engine resolver, prompt builder, context
    extensions/                    # Tool guard, RBAC, cost tracking
  tests/                           # 900+ vitest tests (unit + integration)

src/swe_team/                      # Python agent library
  monitor_agent.py                 # Log scanning, error detection
  triage_agent.py                  # Severity routing
  investigator.py                  # Root-cause analysis via Claude CLI
  developer.py                     # Keep/discard fix loop
  ralph_wiggum.py                  # Stability gate
  supabase_store.py                # Supabase ticket store
  embeddings.py                    # bge-m3 embeddings + fact extraction
  guardrails.py                    # Safety gate coordinator
  cost_tracker.py                  # Budget enforcement
  atomic_checkout.py               # Cross-VM task dedup
  ...                              # 30+ modules total

src/a2a/                           # A2A inter-agent protocol
  server.py, client.py, dispatch.py

ui/                                # React + Vite management dashboard

scripts/ops/                       # Operational scripts
  swe_team_runner.py               # Legacy Python runner (cron/daemon)
  swe_cli.py                       # CLI tool (status, tickets, reports)
  propagate.sh                     # Code propagation to worker nodes

config/
  swe_team.yaml                    # Runtime configuration
  swe_team/programs/               # Prompt templates (investigate.md, fix.md)

.pi/
  skills/swe-manager/SKILL.md      # LLM persona definition
  extensions/                      # pi-agent extension stubs

tests/                             # 5900+ pytest tests

Multi-Team Deployment

SWE Squad supports multiple teams sharing infrastructure:

| Team | VM | Role | Engine |
| --- | --- | --- | --- |
| alpha | primary | Senior: QA, merge authority, critical fixes | Claude CLI (direct) |
| beta | worker-1 | Development: bulk features, bug fixes | Claude CLI (proxy) |
| gamma | worker-2 | Economy: investigation, triage | Claude CLI (proxy) |

Each team has its own team_id that scopes all of its tickets, a dedicated GitHub bot account, and an isolated VM.


Safety

  • Circuit Breaker — trips at 80% failure rate, pauses daemon for 30 minutes
  • Stability Gate — blocks new work when critical tickets are open or tests are failing
  • Outcome Tracker — max 3 investigation/development attempts per ticket before HITL escalation
  • Budget Enforcement — per-agent cost tracking with configurable hard-stops
  • RBAC — role-based access control on tool invocations (bypass mode by default)
  • Bot Containment — each bot account is confined to its designated VM

WebUI

The React management dashboard provides:

  • Dashboard — real-time ticket metrics, PR pipeline, severity donut, cost trends
  • Tickets — Kanban board with drag-and-drop, search/filter, detail views
  • Teams — live status indicators, VM connectivity checks, start/stop controls
  • Engines — coding engine management with health checks and BYOK support
  • Pipeline Editor — visual workflow editor built on React Flow
  • Settings — governance thresholds, cycle config, memory settings

Start the dashboard locally:

cd ui && npm install && npm run dev
# Opens at http://localhost:5173, proxies API to :8888

Requirements

  • Node.js 20+ and pnpm — TypeScript control plane
  • Python 3.10+ — agent library and tests
  • Claude Code CLI — coding engine
  • GitHub CLI (gh) — authenticated for issue + PR management
  • Supabase — ticket store + semantic memory (pgvector)
  • Telegram bot (optional) — notifications
  • SSH access to worker VMs (optional) — remote log collection

Roadmap

  • Persistent pi-agent daemon with 16 custom tools
  • Engine-agnostic delegation (Claude CLI, Gemini CLI, Copilot, OpenCode)
  • Semantic memory with pgvector embeddings + confidence tracking
  • Full ticket pipeline: import, investigate, develop, review, merge
  • Safety gates: circuit breaker, stability gate, outcome tracker
  • React WebUI with Kanban, pipeline editor, team management
  • Multi-team deployment (alpha/beta/gamma squads)
  • Provider-agnostic plugin architecture
  • Interactive Telegram bot — bidirectional chatbot for remote control (#1034)
  • Multi-VM deployment automation
  • npm package: @swe-squad/control-plane
  • Public repo sync and launch
  • Slack/Discord notification plugins
  • Metrics and observability (Prometheus/Grafana)
  • Automated benchmarking suite

Contributing

We welcome contributions! Areas where help is most valuable:

  • Additional coding engine adapters
  • Notification channel plugins (Slack, Discord)
  • Interactive Telegram bot (#1034)
  • New ticket store backends (Redis, SQLite)
  • Agent prompt optimization and benchmarking
  • Documentation and tutorials

License

MIT — use it, fork it, build on it.
