shipwright
Health: Passed
- License — MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 13 GitHub stars
Code: Warning
- crypto private key — Private key handling in .claude/hooks/pre-tool-use.sh
Permissions: Passed
- Permissions — No dangerous permissions requested
This tool is an autonomous delivery pipeline that orchestrates AI agent teams to transform labeled GitHub issues into merged pull requests. It manages the entire software delivery lifecycle, from planning and coding to testing and deployment, without human intervention.
Security Assessment
The overall security risk is Medium. The system inherently executes shell commands to run its autonomous pipelines and interact with git repositories. While no hardcoded secrets or dangerous broad permissions were found, there is a notable warning regarding private key handling in a pre-tool hook script (`.claude/hooks/pre-tool-use.sh`). Because the tool is designed to make network requests to GitHub for issue tracking and CI/CD workflows, it accesses potentially sensitive codebases and repository credentials. Users must be highly cautious about repository access controls and should audit how the hook script manages cryptographic keys.
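A quick way to begin that audit is to scan the hook for key material before enabling it. This is an illustrative sketch only; the grep patterns and messages are assumptions, not part of Shipwright:

```shell
# Flag obvious private-key references in the pre-tool-use hook before trusting it.
# Pattern list is illustrative; extend it for your environment.
hook=".claude/hooks/pre-tool-use.sh"
if [ -f "$hook" ]; then
  if grep -nE 'BEGIN (RSA|EC|OPENSSH) PRIVATE KEY|id_rsa|\.pem' "$hook"; then
    echo "review the matches above before enabling the hook"
  else
    echo "no obvious key material in $hook"
  fi
else
  echo "hook not present; nothing to audit"
fi
```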
Quality Assessment
The project is actively maintained, with its most recent push occurring today. It is distributed under the standard MIT license, providing clear terms for open-source usage and modification. However, community trust and adoption are currently very low, evidenced by only 13 GitHub stars. The repository does feature a robust CI setup, claiming 141 passing test suites, which indicates structured and tested development practices.
Verdict
Use with caution — the autonomous execution model and private key handling require strict environment isolation and careful credential management before deployment.
Orchestrate fully autonomous Claude Code agent teams. Delivery pipelines, fleet operations, DORA metrics, and auto-scaling workers. From GitHub issue to deployed PR — zero human intervention.
Shipwright
The Autonomous Delivery Platform
From labeled GitHub issue to merged PR — with 18 new autonomous agents orchestrating every step.
Table of Contents
- Shipwright Builds Itself
- Code Factory Pattern
- What's New in v3.2.4
- How It Works
- Install
- Quick Start
- Features
- Commands
- Pipeline Templates for Teams
- Configuration
- Prerequisites
- Architecture
- Contributing
- License
Shipwright Builds Itself
This repo uses Shipwright to process its own issues. Label a GitHub issue with shipwright and the autonomous pipeline takes over: semantic triage, plan, design, build, test, review, quality gates, PR. No human in the loop.
See it live | Create an issue and watch it build.
Code Factory Pattern
Shipwright implements the complete Code Factory control-plane pattern — where agents write 100% of the code and the repo enforces deterministic, risk-aware checks before every merge. Every decision is traceable to policy. Every merge is backed by machine-verifiable evidence.
Agent writes code → Risk policy gate → Tier-appropriate CI → Code review agent
→ Findings auto-remediated → SHA-validated evidence → Bot threads cleaned → Merge
→ Incidents feed back into harness coverage
What makes Shipwright best-in-class
| Code Factory Layer | Shipwright Implementation |
|---|---|
| Single contract | config/policy.json — risk tiers, merge policy, docs drift, evidence specs, harness SLAs in one file |
| Preflight gate | risk-policy-gate.yml classifies risk from changed files before expensive CI runs |
| SHA discipline | All checks, reviews, and approvals validated against current PR head — stale evidence is never trusted |
| Rerun writer | sw-review-rerun.sh — SHA-deduped, single canonical writer, no duplicate bot comments |
| Remediation loop | review-remediation.yml — agent reads findings, patches code, validates, pushes fix to same branch |
| Bot thread cleanup | auto-resolve-threads.yml — resolves bot-only threads after clean rerun, never touches human threads |
| Evidence framework | sw-evidence.sh — browser, API, database, CLI, webhook, and custom evidence with freshness enforcement |
| Harness-gap loop | shipwright incident gap — every regression creates a test case with SLA tracking |
Beyond the baseline
Shipwright extends the Code Factory pattern with capabilities most implementations don't have:
- 12-stage pipeline with self-healing builds, adversarial review, and compound quality gates
- Predictive risk scoring using GitHub signals (security alerts, contributor expertise, file churn)
- Persistent memory — failure patterns, fix effectiveness, and prediction accuracy compound over time
- Auto-learning — self-optimize runs automatically after every pipeline completion, including context efficiency tuning
- Decision engine — tiered autonomous decisions with outcome learning and deduplication
- Unified model routing — single source of truth for model selection across all components
- Evidence-gated merges — SHA discipline ensures all evidence validated against current PR head
- Semantic quality audits — Claude-powered audits with grep fallback when Claude unavailable
- 18 autonomous agents with specialized roles (PM, reviewer, security auditor, test generator, etc.)
- Cross-platform compatibility — portable date helpers, file_mtime, and compat layer for macOS/Linux
- Fleet operations — the Code Factory pattern applied across every repo in your org
- Cost intelligence — per-pipeline cost tracking, budget enforcement, adaptive model routing
- Self-optimization — DORA metrics analysis auto-tunes daemon config and template weights
# Evidence framework — capture and verify all types
npm run harness:evidence:capture # All collectors (browser, API, DB, CLI)
npm run harness:evidence:capture:api # API endpoints only
npm run harness:evidence:capture:cli # CLI commands only
npm run harness:evidence:capture:database # Database checks only
npm run harness:evidence:verify # Verify manifest + freshness
npm run harness:evidence:pre-pr # Capture + verify in one step
# Risk and policy
npm run harness:risk-tier
# Incident-to-harness loop
shipwright incident gap list
shipwright incident gap sla
Full Code Factory documentation
What's New in v3.2.4
Code Factory pattern — deterministic, risk-aware agent delivery with machine-verifiable evidence:
- Risk policy gate — PR-level preflight classifies risk tier from changed files; blocks before expensive CI
- SHA discipline — All evidence validated against current PR head SHA; stale evidence never trusted
- Evidence framework — 6 collector types (browser, API, database, CLI, webhook, custom) with freshness enforcement
- Review remediation — Agent reads review findings, patches code, validates, pushes fix commit in-branch
- Auto-resolve bot threads — Bot-only PR threads cleaned up after clean rerun; human threads untouched
- Harness-gap loop — Every incident creates a test case requirement with SLA tracking (P0: 24h, P1: 72h)
- Policy contract v2 — Risk tiers, merge policy, docs drift rules, evidence specs, harness SLAs in one file
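The SLA windows above (P0: 24h, P1: 72h) translate into concrete deadlines. A minimal sketch of that computation, using epoch seconds so the date math is portable across macOS and Linux (the helper name and the fallback window are assumptions, not Shipwright's actual code):

```shell
# Map an incident priority to its harness-gap SLA window in hours.
sla_hours() { case "$1" in P0) echo 24 ;; P1) echo 72 ;; *) echo 168 ;; esac; }

now=$(date +%s)                               # epoch seconds: portable date math
deadline=$(( now + $(sla_hours P0) * 3600 ))  # P0 gap must be closed within 24h
echo "P0 harness gap due by epoch $deadline"
```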
v2.3.1: Autonomous feedback loops, testing foundation, chaos resilience
v2.3.0: Fleet Command completeness overhaul + autonomous team oversight
v2.0.0: 18 autonomous agents, 100+ CLI commands, intelligence layer, multi-repo fleet, local mode
How It Works
graph LR
A[GitHub Issue] -->|labeled 'shipwright'| B[Daemon]
B --> C[Triage & Score]
C --> D[Select Template]
D --> E[Pipeline]
subgraph Pipeline ["12-Stage Pipeline"]
direction LR
E1[intake] --> E2[plan] --> E3[design] --> E4[build]
E4 --> E5[test] --> E6[review] --> E7[quality]
E7 --> E8[PR] --> E9[merge] --> E10[deploy]
E10 --> E11[validate] --> E12[monitor]
end
E --> E1
E12 --> F[Merged PR]
subgraph Intelligence ["Intelligence Layer"]
I1[Predictive Risk]
I2[Model Routing]
I3[Adversarial Review]
I4[Self-Optimization]
end
Intelligence -.->|enriches| Pipeline
style A fill:#00d4ff,color:#000
style F fill:#4ade80,color:#000
When tests fail, the pipeline re-enters the build loop with error context — self-healing like a developer reading failures and fixing them. Convergence detection stops infinite loops. Error classification routes retries intelligently.
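The self-healing loop described above can be sketched as follows. Everything here is a stand-in under stated assumptions: `try_tests` simulates a test command that passes on the third run, and `patch_with_context` stands in for the agent build step that receives the failure output:

```shell
# Sketch: re-run with error context until tests pass or output stops changing.
state=$(mktemp); echo 0 > "$state"
try_tests() {                          # stand-in for the real test command
  n=$(($(cat "$state") + 1)); echo "$n" > "$state"
  [ "$n" -ge 3 ] || { echo "FAIL: attempt $n"; return 1; }
}
patch_with_context() { :; }            # hypothetical agent fix step; $1 = error context

prev=""
for i in 1 2 3 4 5; do
  if out=$(try_tests 2>&1); then
    echo "tests green after $i iteration(s)"; break
  fi
  sig=$(printf '%s' "$out" | cksum)    # cheap fingerprint for convergence detection
  if [ "$sig" = "$prev" ]; then
    echo "no progress between iterations; escalating"; break
  fi
  prev=$sig
  patch_with_context "$out"
done
```

When two consecutive iterations produce byte-identical failures, the fingerprint match triggers the convergence exit instead of an infinite retry loop.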
Install
One-command install (recommended):
git clone https://github.com/sethdford/shipwright.git && cd shipwright && ./install.sh
Other methods
curl
curl -fsSL https://raw.githubusercontent.com/sethdford/shipwright/main/scripts/install-remote.sh | bash
npm (global)
npm install -g shipwright-cli
Verify
shipwright doctor
Quick Start
# One-command setup
shipwright init
# See what's running
shipwright status
# Process a GitHub issue end-to-end
shipwright pipeline start --issue 42
# Run daemon 24/7 with agent orchestration
shipwright daemon start --detach
# See live agent activity
shipwright activity
# Spin up agent team for manual work
shipwright session my-feature -t feature-dev
# View DORA metrics and pipeline vitals
shipwright dora
# Continuous build loop with test validation
shipwright loop "Build auth module" --test-cmd "npm test"
# Multi-repo operations
shipwright fleet start
shipwright fix "upgrade deps" --repos ~/a,~/b,~/c
# Release automation
shipwright version bump 2.4.0
shipwright changelog generate
Features
18 Autonomous Agents
Wave 1 (Organizational):
- Swarm Manager — Orchestrates dynamic agent teams with specialization roles
- Autonomous PM — Team leadership, task scheduling, roadmap execution
- Knowledge Guild — Cross-team learning, pattern capture, mentorship
- Recruitment System — Talent acquisition and team composition
- Standup Automaton — Daily standups, progress tracking, blocker detection
Wave 2 (Operational Backbone):
- Quality Oversight — Intelligent audits, zero-defect gates, completeness verification
- Strategic Agent — Long-term planning, goal decomposition, roadmap intelligence
- Code Reviewer — Architecture analysis, clean code standards, best practices
- Security Auditor — Vulnerability detection, threat modeling, compliance
- Test Generator — Coverage analysis, scenario discovery, regression prevention
- Incident Commander — Autonomous triage, root cause analysis, resolution
- Dependency Manager — Semantic versioning, update orchestration, compatibility checking
- Release Manager — Release planning, changelog generation, deployment orchestration
- Adaptive Tuner — DORA metrics analysis, self-optimization, performance tuning
- Strategic Intelligence — Predictive analysis, trend detection, proactive recommendations
Plus 10+ specialized agents for observability, UX, documentation, and more.
12-Stage Delivery Pipeline
intake → plan → design → build → test → review → compound_quality → pr → merge → deploy → validate → monitor
Each stage is configurable with quality gates that auto-proceed or pause for approval. 8 pipeline templates:
| Template | Stages | Use Case |
|---|---|---|
| fast | intake → build → test → PR | Quick fixes, score >= 70 |
| standard | + plan, design, review | Normal feature work |
| full | All 12 stages | Production deployment |
| hotfix | Minimal, all auto | Urgent production fixes |
| autonomous | All stages, all auto | Daemon-driven delivery |
| enterprise | All stages, all gated | Maximum safety + rollback |
| cost-aware | All stages + budget checks | Budget-limited delivery |
| deployed | All + deploy + validate + monitor | Full deploy pipeline |
Intelligence Layer
7 modules that make the pipeline smarter over time. Enabled by default: intelligence is on when Claude CLI is available, with optimization and prediction active out of the box. Set intelligence.enabled=false to disable. All modules degrade gracefully.
| Module | What It Does |
|---|---|
| Semantic Triage | AI-powered issue analysis, complexity scoring, template selection |
| Pipeline Composer | Generates custom pipeline configs from codebase analysis (file churn, test coverage, dependencies) |
| Predictive Risk | Scores issues for risk using GitHub signals (security alerts, similar past issues, contributor expertise) |
| Adversarial Review | Red-team code review — finds security flaws, edge cases, failure modes. Cross-checks against CodeQL/Dependabot alerts |
| Self-Optimization | Reads DORA metrics and auto-tunes daemon config. Includes context efficiency closed loop for token budget tuning |
| Developer Simulation | 3-persona review (security, performance, maintainability) before PR creation |
| Architecture Enforcement | Living architectural model with violation detection and dependency direction rules |
Adaptive everything: thresholds learn from history, model routing uses SPRT evidence-based switching, poll intervals adjust to queue depth, memory timescales tune based on fix effectiveness.
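Flipping the intelligence.enabled switch mentioned above can be done with jq. The path and key follow this README (.claude/daemon-config.json); the seed content is purely for the demo, and other keys in the real file will differ:

```shell
# Disable the intelligence layer in the daemon config.
cfg=".claude/daemon-config.json"
mkdir -p .claude
[ -f "$cfg" ] || echo '{"intelligence":{"enabled":true}}' > "$cfg"  # demo seed only
tmp=$(mktemp)
jq '.intelligence.enabled = false' "$cfg" > "$tmp" && mv "$tmp" "$cfg"
jq '.intelligence.enabled' "$cfg"   # prints: false
```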
GitHub Deep Integration
Native GitHub API integration enriches every intelligence module:
| API | Integration |
|---|---|
| GraphQL | File change frequency, blame data, contributor expertise, similar issues, commit history |
| Checks API | Native check runs per pipeline stage — visible in PR timeline, blocks merges on failure |
| Deployments API | Tracks deployments per environment (staging/prod), rollback support, deployment history |
| Security | CodeQL + Dependabot alerts feed into risk scoring and adversarial review |
| Contributors | CODEOWNERS-based reviewer routing, top-contributor fallback, auto-approve as last resort |
| Branch Protection | Checks required reviews and status checks before attempting auto-merge |
Decision Engine
The autonomous decision engine (config/policy.json → decision section) handles routine operational decisions with outcome learning. Decisions are tiered by risk, with low-risk actions auto-approved and higher tiers escalated. The engine learns from outcomes to improve future decisions.
Context Engineering
Intelligent context window management for pipeline agents:
- Budget-aware trimming — Configurable character budgets for prompt composition (context_budget_chars)
- Section-level trimming — Independent limits for memory, git history, hotspot files, and test output
- Context efficiency metrics — Tracks budget utilization and trim ratios per iteration
- Self-tuning — The self-optimization loop analyzes context efficiency events and recommends budget adjustments
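The core of budget-aware trimming is simple: cap each section at its character budget before composing the prompt. A minimal sketch (the helper name and the 2000-character budget are illustrative, not Shipwright's actual values):

```shell
# Trim a context section to its character budget before prompt composition.
trim_to_budget() { printf '%s' "$1" | head -c "$2"; }   # hypothetical helper

section=$(printf 'x%.0s' $(seq 1 5000))   # 5,000-char sample section
trimmed=$(trim_to_budget "$section" 2000) # e.g. context_budget_chars = 2000
echo "${#trimmed}"                        # → 2000
```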
Autonomous Daemon
shipwright daemon start --detach
Watches GitHub for labeled issues and processes them 24/7:
- Auto-scaling: Adjusts worker count based on CPU, memory, budget, and queue depth
- Priority lanes: Reserve a worker slot for urgent/hotfix issues
- Retry with escalation: Failed builds retry with template escalation (fast → standard → full)
- Patrol mode: Proactively scans for security issues, stale deps, dead code, coverage gaps
- Self-optimization: Tunes its own config based on DORA metrics over time
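The retry-with-escalation behavior above walks the template ladder until a run succeeds. Sketched here with a stand-in `run_pipeline` that only succeeds on the heaviest template, so the escalation path is visible:

```shell
# Escalate fast → standard → full until the pipeline run succeeds.
run_pipeline() { [ "$1" = "full" ]; }   # demo stand-in: only "full" succeeds

for tpl in fast standard full; do
  if run_pipeline "$tpl"; then
    echo "delivered with template: $tpl"; break
  fi
  echo "template $tpl failed; escalating"
done
```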
Fleet Operations
shipwright fleet start
Orchestrate daemons across multiple repositories with a shared worker pool. Workers rebalance based on queue depth, issue complexity, and repo priority.
Persistent Memory
The pipeline learns from every run:
- Failure patterns: Captured and injected into future builds so agents don't repeat mistakes
- Fix effectiveness: Tracks which fixes actually resolved issues
- Prediction validation: Compares predicted risk against actual outcomes, auto-adjusts thresholds
- False-alarm tracking: Reduces noise by learning which anomalies are real
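Records like these land in the event log (~/.shipwright/events.jsonl per the Configuration table below) as append-only JSON lines. A sketch of one append; the SHIPWRIGHT_HOME override and the field names are assumptions, not Shipwright's actual schema:

```shell
# Append a failure-pattern record to the JSONL event log.
log="${SHIPWRIGHT_HOME:-$HOME/.shipwright}/events.jsonl"
mkdir -p "$(dirname "$log")"
jq -nc --arg kind failure_pattern --arg msg "flaky auth test" \
  '{kind: $kind, msg: $msg, ts: (now | floor)}' >> "$log"
tail -1 "$log"
```

Append-only JSONL keeps writes atomic enough for concurrent workers, and `tail`/`jq` make the log trivially queryable.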
Cost Intelligence
shipwright cost show
Per-pipeline cost tracking with model pricing, budget enforcement, and ROI analysis. Adaptive model routing picks the cheapest model that meets quality targets.
Real-Time Dashboard
shipwright dashboard start
Web dashboard with live pipeline progress, GitHub context (security alerts, contributors, deployments), DORA metrics, cost tracking, and context efficiency metrics. WebSocket-powered, updates in real-time.
Webhook Receiver
shipwright webhook listen
Instant issue processing via GitHub webhooks instead of polling. Register the webhook with shipwright webhook register, receive events in real time, and process issues with zero lag.

PR Lifecycle Automation
shipwright pr review <pr#>
shipwright pr merge <pr#>
shipwright pr cleanup
Fully automated PR management: review based on predictive risk and coverage, intelligent auto-merge when gates pass, cleanup stale branches. Reduces manual PR overhead by 90%.
Fleet Auto-Discovery
shipwright fleet discover --org myorg
Scan a GitHub organization and auto-populate fleet config with all repos matching criteria (language, archived status, team ownership). One command instead of manual registry building.
SQLite Persistence
ACID-safe state management that replaces the volatile JSON artifacts in .claude/pipeline-artifacts/ with a reliable database schema. Atomic transactions prevent partial states, and crash recovery is automatic.
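The shape of that guarantee: wrap each state change in a single transaction so pipeline state is never half-written, even if the process dies mid-update. The table name and columns below are illustrative, not Shipwright's actual schema:

```shell
# Upsert pipeline state inside one transaction; a crash leaves the old row intact.
db=$(mktemp)   # stand-in path; the real store lives alongside other Shipwright state
sqlite3 "$db" <<'SQL'
CREATE TABLE IF NOT EXISTS pipeline_state (
  issue INTEGER PRIMARY KEY,
  stage TEXT NOT NULL,
  updated_at TEXT DEFAULT (datetime('now'))
);
BEGIN;
INSERT INTO pipeline_state (issue, stage) VALUES (42, 'build')
  ON CONFLICT(issue) DO UPDATE SET stage = excluded.stage;
COMMIT;
SQL
sqlite3 "$db" "SELECT stage FROM pipeline_state WHERE issue = 42;"   # → build
```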
Issue Decomposition
shipwright decompose analyze 42
shipwright decompose decompose 42
AI-powered issue analysis: analyze scores complexity; decompose creates child issues with inherited labels/assignees and a dependency graph.
Linux systemd Support
Cross-platform process supervision. Use systemd on Linux instead of tmux, same daemon commands:
shipwright launchd install # macOS launchd
# systemd service auto-generated on Linux
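For reference, a user-level systemd unit for the daemon might look like the following. This is a hypothetical sketch of the generated service, not Shipwright's actual output; the install path and unit options are assumptions:

```shell
# Write a minimal user-scoped systemd unit for the Shipwright daemon.
unit_dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"
mkdir -p "$unit_dir"
cat > "$unit_dir/shipwright-daemon.service" <<'EOF'
[Unit]
Description=Shipwright autonomous daemon

[Service]
ExecStart=/usr/local/bin/shipwright daemon start
Restart=on-failure

[Install]
WantedBy=default.target
EOF
# then: systemctl --user enable --now shipwright-daemon
```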
Context Engine
shipwright context gather
Rich context injection for pipeline stages. Pulls together: contributor history, file hotspots, architecture rules, related issues, failure patterns. Injected automatically at each stage for smarter decisions.
Commands
Over 100 commands. Key workflows:
# Autonomous delivery
shipwright pipeline start --issue 42
shipwright daemon start --detach
# Agent teams
shipwright swarm status
shipwright recruit --roles builder,tester
shipwright standup
shipwright guild list
# Quality gates
shipwright code-review
shipwright security-audit
shipwright testgen
shipwright quality validate
# Observability
shipwright vitals
shipwright dora
shipwright stream
shipwright activity
# Multi-repo operations
shipwright fleet start
shipwright fix "feat: add auth" --repos ~/a,~/b,~/c
shipwright fleet-viz
# Release automation
shipwright version bump 2.4.0
shipwright changelog generate
shipwright deploys list
# Setup & maintenance
shipwright init
shipwright prep
shipwright doctor
shipwright upgrade --apply
# See all commands
shipwright --help
See .claude/CLAUDE.md for the complete 100+ command reference organized by workflow. Full documentation: https://sethdford.github.io/shipwright.
Pipeline Templates for Teams
24 team templates covering the full SDLC:
shipwright templates list
Configuration
| File | Purpose |
|---|---|
| config/policy.json | Central contract — risk tiers, merge policy, docs drift, browser evidence, harness SLAs |
| config/policy.schema.json | JSON Schema validation for the policy contract |
| .claude/daemon-config.json | Daemon settings, intelligence flags, patrol config |
| .claude/pipeline-state.md | Current pipeline state |
| templates/pipelines/*.json | 8 pipeline template definitions |
| tmux/templates/*.json | 24 team composition templates |
| ~/.shipwright/events.jsonl | Event log for metrics |
| ~/.shipwright/costs.json | Cost tracking data |
| ~/.shipwright/budget.json | Budget limits |
| ~/.shipwright/github-cache/ | Cached GitHub API responses |
Prerequisites
| Requirement | Version | Install |
|---|---|---|
| tmux | 3.2+ | brew install tmux |
| jq | any | brew install jq |
| Claude Code CLI | latest | npm i -g @anthropic-ai/claude-code |
| Node.js | 20+ | For hooks and dashboard |
| Git | any | For installation |
| gh CLI | any | brew install gh (GitHub integration) |
Architecture
100+ bash scripts (~100K lines), 125 shell test suites + 16 dashboard test files (141 total), plus E2E system test proving full daemon→pipeline→loop→PR flow. Dashboard at 98% coverage. Bash 3.2 compatible — runs on macOS and Linux out of the box.
Core Layers:
Pipeline Layer
sw-pipeline.sh # 12-stage delivery orchestration
sw-daemon.sh # Autonomous GitHub issue watcher
sw-loop.sh # Continuous multi-iteration build loop
Agent Layer (18 agents)
sw-swarm.sh # Dynamic agent team orchestration
sw-pm.sh # Autonomous PM coordination
sw-recruit.sh # Agent recruitment system
sw-standup.sh # Daily team standups
sw-guild.sh # Knowledge guilds
sw-oversight.sh # Quality oversight board
sw-strategic.sh # Strategic intelligence
sw-scale.sh # Dynamic team scaling
... 10 more agent scripts
Intelligence Layer
sw-intelligence.sh # AI analysis engine
sw-predictive.sh # Risk scoring + anomaly detection
sw-adaptive.sh # Data-driven pipeline tuning
sw-security-audit.sh # Security analysis
sw-code-review.sh # Code quality analysis
sw-testgen.sh # Test generation
sw-architecture.sh # Architecture enforcement
Operational Layer
sw-fleet.sh # Multi-repo orchestration
sw-ci.sh # CI/CD orchestration
sw-webhook.sh # GitHub webhooks
sw-incident.sh # Incident response
sw-release-manager.sh # Release automation
... 20+ operational scripts
Observability Layer
sw-vitals.sh # Pipeline health scoring
sw-dora.sh # DORA metrics dashboard
sw-activity.sh # Live activity streams
sw-replay.sh # Pipeline playback
sw-trace.sh # E2E traceability
sw-otel.sh # OpenTelemetry integration
... observability services
Infrastructure
sw-github-graphql.sh # GitHub GraphQL API client
sw-github-checks.sh # Native GitHub check runs
sw-github-deploy.sh # Deployment tracking
sw-memory.sh # Persistent learning system
sw-cost.sh # Cost intelligence
sw-db.sh # SQLite persistence
sw-eventbus.sh # Async event bus
Tools & UX
dashboard/server.ts # Real-time dashboard
sw-session.sh # tmux agent sessions
sw-status.sh # Team dashboard
sw-docs.sh # Documentation sync
sw-tmux.sh # tmux health management
Contributing
Let Shipwright build it: Create an issue using the Shipwright template and label it shipwright. The autonomous pipeline will triage, plan, build, test, review, and create a PR.
Manual development: Fork, branch, then:
npm test # 125 shell suites + 16 dashboard test files (141 total), E2E system test
License
MIT — Seth Ford, 2026.