Agent Project Pack

A drop-in template for running an autonomous coding agent (Claude Code, Gemini CLI, Codex, etc.) 24/7 on your project.

New here? Start with GETTING_STARTED.md — a linear walkthrough from "I have an idea" to "agent is shipping code 24/7." Takes 45–90 minutes the first time.

This README is the project overview and daily workflow reference. Follow GETTING_STARTED.md first.

You scope the project, file issues, walk away. The agent branches, plans, writes tests, implements, opens PRs, and self-merges on green CI. You review asynchronously through GitHub — even from your phone — and steer with @agent PR comments.

This repo is the infrastructure that makes that work safely: runtime-agnostic orchestration (Claude, Gemini, Codex), language-agnostic core, container isolation, governance docs, slash commands, GitHub workflow conventions, and a 24/7 launcher loop. Pick a language stack and feature addons (FastAPI, Next.js, mobile, desktop, CLI) only when you need them — the core is the same regardless.

Why this exists

Running an LLM agent against your codebase is easy. Running one unsupervised for hours without it making a mess is the hard part. The usual failure modes:

Agent goes off on tangents, scope-creeps PRs into unreviewable diffs
No clear handoff when you want to redirect it mid-task
Loses context between sessions, re-litigates settled decisions
No record of what got built or why
Pushes broken code, force-pushes, deletes history, runs rm -rf
You can't tell from your phone whether things are going well

This template fixes all of that with three layers of governance:

Always-loaded instructions (CLAUDE.md) the agent reads every session
Binding rules for autonomous mode (docs/unattended-rules.md) covering the work loop, hard limits, and self-audit behaviour
Physical guardrails — Docker container isolation, Makefile-only command surface, CI gate, async GitHub-mediated communication

Plus the conventions that make it actually pleasant: ADRs for architectural decisions, a plain-English progress log, in-progress labels, a per-module documentation system enforced by CI (docs/codebase/), security governance scaffolding (SECURITY.md, CODEOWNERS, threat model), and a BOOTSTRAP_PROMPT.md you can paste into any AI chat to scaffold a new project from your phone.

What you get

.
├── CLAUDE.md                    # Always-loaded agent context (template)
├── GETTING_STARTED.md           # ⭐ START HERE — linear walkthrough
├── BOOTSTRAP_PROMPT.md          # Paste into AI chat to scaffold a new project
├── REMOTE_SETUP.md              # Phone-only workflow guide
├── STACKS_AND_ADDONS.md         # Catalogue of optional language stacks + feature addons
├── SECURITY.md                  # Vulnerability disclosure policy template
├── Makefile                     # The only command surface (customise per stack)
├── agent.config                 # Runtime + model selection
├── agents/                      # Per-runtime adapters (claude, gemini, codex, custom)
├── docker/                      # Minimal language-agnostic dev container
├── scripts/
│   ├── launch-agent.sh          # The 24/7 launcher loop (runtime-agnostic)
│   └── docs-gate.sh             # CI gate enforcing docs/codebase/ updates per module
├── stacks/                      # OPTIONAL: language toolchains, pick one or more
│   ├── python/
│   ├── node/
│   ├── go/
│   └── rust/
├── addons/                      # OPTIONAL: feature scaffolds, drop in as needed
│   ├── fastapi/                 # Production backend
│   ├── nextjs/                  # Web app
│   ├── mobile-rn/               # React Native cross-platform
│   ├── mobile-native/           # SwiftUI + Kotlin/Compose
│   ├── desktop-tauri/           # Native desktop wrapper
│   ├── cli-tool/                # CLI argument parsing + distribution
│   └── openapi-clients/         # Auto-gen mobile/web clients from API spec
├── .claude/commands/            # Slash commands the agent uses
│   ├── review-prs.md
│   ├── next-issue.md
│   ├── add-adr.md
│   ├── daily-log.md
│   ├── brief-refresh.md
│   ├── phase-status.md
│   └── new-module.md
├── docs/
│   ├── product.md               # Product vision (template)
│   ├── architecture.md          # Technical architecture (template)
│   ├── phases.md                # Build phases (template)
│   ├── unattended-rules.md      # The binding rulebook (runtime-agnostic)
│   ├── codebase.md              # Per-module documentation index
│   ├── codebase/template.md     # Seven-section module doc template
│   ├── decisions/               # ADRs go here
│   └── security/                # Threat model, incident response, supply chain
├── logs/
│   ├── progress.md              # Plain-English changelog (agent maintains)
│   └── daily/                   # Per-session detailed logs
├── plans/                       # Per-deliverable plans (ephemeral)
└── .github/
    ├── CODEOWNERS               # Security-sensitive paths
    ├── workflows/ci.yml         # CI with docs-gate job
    ├── ISSUE_TEMPLATE/
    └── pull_request_template.md

Quick start

Prerequisites

A Linux/macOS host with Docker and Docker Compose
GitHub CLI authenticated (gh auth login)
One of the supported agent CLIs installed and authenticated:
- Claude Code — npm i -g @anthropic-ai/claude-code then claude login (Pro/Max/API)
- Gemini CLI — npm i -g @google/gemini-cli then gemini auth (free tier available)
- Codex CLI — npm i -g @openai/codex then codex login (ChatGPT Plus/Pro/API)
- Or wire your own via agents/custom.sh

Setup

# 1. Use this template (or clone)
gh repo create my-project --template <user>/agent-project-pack --private --clone
cd my-project

# 2. Personalise the templates
#    Replace {{PROJECT_NAME}} placeholders in CLAUDE.md
#    Fill in docs/product.md, docs/architecture.md, docs/phases.md
#    (or use BOOTSTRAP_PROMPT.md with any AI chat to do this from your phone)

# 3. Create the agent's labels
for l in "ready-for-agent:0e8a16" "agent-produced:1f77b4" "agent-please-fix:d93f0b" \
         "agent-proposed:5319e7" "needs-decision:d93f0b" "in-progress:0075ca" \
         "blocked:b60205" "human-only:000000" "human-takeover:000000" \
         "tracking:fef2c0" "roadmap:fef2c0" \
         "priority:high:b60205" "priority:med:fbca04" "priority:low:c2e0c6"; do
  gh label create "${l%:*}" --color "${l##*:}" --force
done

# 4. Customise Makefile test/lint targets for your stack
#    (Python: ruff/mypy/pytest; Node: eslint/tsc/jest; Go: vet/test; Rust: clippy/test)

# 5. Build the dev container
make build && make ci

# 6. Seed 5 ready-for-agent issues with clear acceptance criteria

# 7. Launch
make agent-start

The agent runs until you run make agent-stop. Walk away.
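The label loop in step 3 packs each label as name:colour, and some names (priority:high) contain a colon themselves. The sketch below isolates the two parameter expansions the loop relies on, so you can see why the split still lands on the last colon. The helper names label_name and label_color are illustrative and not part of the template.

```shell
#!/usr/bin/env bash
# Split a "name:colour" spec where the name itself may contain colons.
# label_name / label_color are hypothetical helpers for illustration only.

label_name()  { printf '%s\n' "${1%:*}"; }    # drop the shortest ":suffix" (the colour)
label_color() { printf '%s\n' "${1##*:}"; }   # drop the longest "prefix:" (keep the colour)

for spec in "ready-for-agent:0e8a16" "priority:high:b60205"; do
  printf '%s -> name=%s colour=%s\n' \
    "$spec" "$(label_name "$spec")" "$(label_color "$spec")"
done
```

If you add your own labels to the loop, keep the colour as the final colon-separated field and the same expansions keep working.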

Knowing what the agent is doing

Three layers of visibility, all readable from the GitHub mobile app:

1. logs/progress.md — plain-English changelog

The agent appends a human-readable entry here every time it ships something. No jargon, no file paths — written for a non-technical reader. Example:

## 2026-04-25 — Photos now get a quality score automatically
**What it does:** When a photo is imported, Frame scores how sharp and well-exposed it is.
**How:** Analyses pixel data mathematically — no AI model needed, runs instantly.
**Why:** Lets users sort by quality and surfaces the keepers automatically.
**Status:** Merged. PR #14.

Read it from the GitHub web/mobile UI — it's just a markdown file in the repo.

2. in-progress label — what's happening right now

When the agent picks up an issue, it adds the in-progress label and comments "Starting work. Branch: agent/14-quality-score." Running gh issue list --label in-progress (or filtering by that label in the mobile app) shows exactly what is being worked on at any moment.

3. logs/daily/YYYY-MM-DD.md — detailed session log

Per-day technical log: every issue picked up, every PR opened, every CI failure. Useful for debugging when something goes wrong or auditing the agent's decisions later.

Keeping docs honest as code grows

The hardest problem with long-running agent work is docs drifting from code. Over weeks the agent ships features faster than its own context can stay accurate; eventually it starts making decisions based on stale assumptions.

This template solves it with a CI gate. Every PR that changes <source_root>/<module>/**/*.<ext> must also change docs/codebase/<module>.md. The gate runs on every PR; trivial PRs (typo, log string) bypass via the docs-exempt label.

Each per-module doc follows a strict seven-section template (What it does / Public API / Data tables / Pipeline steps / Routes / Configuration / Notes), so the agent always knows where to look up a module without re-reading its source. This is the single most impactful piece of this template — without it, agents lose coherence over time.

Customise scripts/docs-gate.sh and the DOCS_GATE_* env vars in .github/workflows/ci.yml for your source layout.
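To make the rule concrete, here is a minimal sketch of the check such a gate performs. It is not the shipped scripts/docs-gate.sh, which may differ; the function name docs_gate and its two arguments (source root and file extension) are assumptions standing in for whatever the DOCS_GATE_* variables configure.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a docs gate; not the template's actual script.
# Reads changed file paths on stdin, one per line.
# Usage: docs_gate <source_root> <ext>

docs_gate() {
  local src_root="${1:-src}" ext="${2:-py}"
  local changed path rest module status=0 seen=" "
  changed="$(cat)"   # full list of changed paths

  while IFS= read -r path; do
    case "$path" in
      "$src_root"/*/*."$ext")
        rest="${path#"$src_root"/}"   # strip the source root
        module="${rest%%/*}"          # first remaining segment is the module
        case "$seen" in *" $module "*) continue ;; esac   # report each module once
        seen="$seen$module "
        if ! printf '%s\n' "$changed" | grep -qxF "docs/codebase/$module.md"; then
          echo "docs-gate: $module changed without docs/codebase/$module.md" >&2
          status=1
        fi
        ;;
    esac
  done <<< "$changed"
  return "$status"
}
```

In CI you would feed it the PR's changed files, for example git diff --name-only origin/main...HEAD | docs_gate src py, and skip the call when the PR carries the docs-exempt label.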

Daily workflow

# Day one — start the agent and walk away. It runs continuously.
git checkout main && git pull
make agent-start

# Anytime — check what's happening (works from phone via GitHub mobile app)
cat logs/progress.md            # Plain-English changelog of shipped features
gh pr list --state merged       # PRs the agent has shipped
gh issue list --label in-progress  # What it's currently working on
make agent-logs                 # Live tail of today's session

# Steer the agent (works from phone)
# - File issue with `ready-for-agent` label = new task queued
# - Comment "@agent <fix>" + add `agent-please-fix` label = redirect on a PR
# - Add `human-takeover` label = "I'll take this one"

# When you actually want to stop (not at end of day — only when needed)
make agent-stop

The agent is designed to run continuously. Leave it going overnight, weekends, while you travel. It loops every 10 minutes; if the queue is empty it runs a self-audit, proposes new features, and waits. You only stop it when:

You want to update its core docs (CLAUDE.md, unattended-rules.md)
You're rebooting the host
Something has gone wrong and you want to investigate
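The continuous loop is deliberately simple. A rough sketch of its shape, assuming the cycle itself lives in a function called run_agent_cycle (the name the adapter contract uses) and that AGENT_IDLE_SLEEP sets the pause; the max_cycles argument is an addition for smoke-testing and is not part of the template's launcher.

```shell
#!/usr/bin/env bash
# Sketch of a while/sleep launcher loop; the real scripts/launch-agent.sh may differ.

# Placeholder cycle: in the template this comes from agents/<runtime>.sh.
run_agent_cycle() { echo "cycle started at $(date -u +%H:%M:%S)"; }

run_loop() {
  local max_cycles="${1:-0}" n=0   # 0 = loop until killed (hypothetical option)
  while :; do
    # A failed cycle is logged and retried on the next tick rather than ending the loop.
    run_agent_cycle || echo "cycle failed; trying again next tick" >&2
    n=$((n + 1))
    if [ "$max_cycles" -gt 0 ] && [ "$n" -ge "$max_cycles" ]; then
      break
    fi
    sleep "${AGENT_IDLE_SLEEP:-600}"
  done
}
```

Calling run_loop with no argument runs forever; AGENT_IDLE_SLEEP=0 run_loop 1 executes a single cycle, which is handy when checking a new adapter.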

How it works

The work loop

Every 10 minutes the agent runs one cycle:

Address any open PR with @agent comments or agent-please-fix label
Pick the highest-priority ready-for-agent issue
Mark it in-progress, branch, write a plan
Tests first, then implementation
Run make ci until green, then open the PR
Self-merge with squash + delete branch
Update logs/progress.md
Loop

An empty queue triggers a /brief-refresh self-audit: the agent scans docs and code, files agent-proposed issues with feature ideas and refactoring opportunities, then waits 10 minutes before the next cycle.
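Step 2 of the cycle, picking the highest-priority ready-for-agent issue, could be sketched as below. The input format (issue number, a tab, then comma-separated label names) and the function name pick_next_issue are assumptions for illustration; in the template the agent makes this choice from its instructions, not from this script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the next issue from "number<TAB>label,label,..." lines.

pick_next_issue() {
  awk -F'\t' '
    function rank(labels) {
      if (labels ~ /(^|,)priority:high(,|$)/) return 1
      if (labels ~ /(^|,)priority:med(,|$)/)  return 2
      if (labels ~ /(^|,)priority:low(,|$)/)  return 3
      return 4   # issues without a priority label sort last
    }
    {
      r = rank($2)
      # Strict "<" means ties go to whichever issue was listed first.
      if (best == "" || r < best_rank) { best = $1; best_rank = r }
    }
    END { if (best != "") print best }
  '
}

# One way to produce the input with the GitHub CLI:
#   gh issue list --label ready-for-agent --json number,labels \
#     --jq '.[] | "\(.number)\t\([.labels[].name] | join(","))"' | pick_next_issue
```

An empty result means the queue is empty, which is the point where the self-audit described above takes over.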

Hard limits

The agent can self-merge, refactor, propose features, and challenge architectural decisions via new ADRs. It cannot:

Push directly to main
Force-push or rewrite history
Read paths matching personal data patterns (configurable per project)
rm -rf outside plans/ and logs/
Commit binary user data
Touch PRs labelled human-takeover

CI is the gate that protects everything else — failing tests block self-merge.
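Limits like the rm -rf rule are easiest to trust when they are backed by a mechanical check. How the template enforces this one is not spelled out here, so treat the following as a hypothetical sketch: is_deletable_path is an invented helper that accepts only relative paths inside plans/ or logs/ and rejects anything with a parent-directory hop.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of the template: allow deletes only under plans/ and logs/.

is_deletable_path() {
  local path="$1"
  case "$path" in
    ""|/*|*..*) return 1 ;;          # empty, absolute, or containing ".." (conservative)
    plans/?*|logs/?*) return 0 ;;    # something inside plans/ or logs/, never the dir itself
    *) return 1 ;;
  esac
}
```

A Makefile clean target could call it as is_deletable_path "$target" && rm -rf -- "$target". The check is purely textual and does not resolve symlinks, so it complements container isolation and does not replace it.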

Steering from anywhere

The agent reads GitHub fresh every cycle, so anything you do via the GitHub web UI or mobile app reaches it within 10 minutes:

| You want to | You do |
| --- | --- |
| Queue new work | File issue, label `ready-for-agent` + a priority |
| Fix something on a PR | Comment `@agent <instruction>`, label `agent-please-fix` |
| Resolve a blocked decision | Comment your answer, swap `needs-decision` → `ready-for-agent` |
| Take over a PR | Add `human-takeover` label |
| Pause everything | `make agent-stop` |

Phone-only project bootstrapping

The unique twist: you can scaffold a new project entirely from your phone.

On phone, GitHub mobile app → "Use this template" → create new repo
Open any AI chat (Claude, ChatGPT, etc.)
Paste BOOTSTRAP_PROMPT.md content
Describe your project in plain English; AI asks focused questions
AI produces 5 files (CLAUDE.md, docs/product.md, docs/architecture.md, docs/phases.md, docs/decisions/0001-*.md) plus 5-8 starter issues
Commit each via GitHub mobile's edit view, paste the issues
Later on a laptop: make build && make agent-start

See REMOTE_SETUP.md for the full walkthrough.

Configuration

Running multiple projects in parallel on one machine

The Makefile and launcher derive a unique compose project name from the repo directory, so you can run multiple agents on the same host without collisions:

~/code/project-a $ make agent-start    # container: project-a-agent-1
~/code/project-b $ make agent-start    # container: project-b-agent-1

Each gets its own image (project-a-dev:latest, project-b-dev:latest), container, network, and volumes. They share your host's ~/.claude and ~/.config/gh for auth — that's intentional and fine.

If you also expose HTTP daemons (the optional daemon compose profile), set a different host port per project in each .env:

# project-a/.env
DAEMON_PORT=8765

# project-b/.env
DAEMON_PORT=8766

Override the project name explicitly with PROJECT_NAME=foo make agent-start if your directory names happen to clash.
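For the curious, deriving a safe compose project name from a directory can be sketched as follows. This is an assumption about the approach, not a copy of the template's Makefile; compose_project_name is an illustrative function, while COMPOSE_PROJECT_NAME is the standard Docker Compose environment variable.

```shell
#!/usr/bin/env bash
# Sketch: turn a directory name (or an explicit PROJECT_NAME) into a valid
# Compose project name: lowercase letters, digits, dashes and underscores,
# starting with a letter or digit.

compose_project_name() {
  local dir="${1:-$PWD}" name
  name="${PROJECT_NAME:-$(basename "$dir")}"   # explicit override wins
  name="$(printf '%s' "$name" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -e 's/[^a-z0-9_-]/-/g' -e 's/^[^a-z0-9]*//')"
  printf '%s\n' "$name"
}
```

Typical use would be COMPOSE_PROJECT_NAME="$(compose_project_name)" docker compose up -d, which is what keeps project-a and project-b from sharing containers, networks, or volumes.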

Choosing the agent runtime

Edit agent.config (or set env vars):

AGENT_RUNTIME=claude    # claude | gemini | codex | custom
AGENT_MODEL=default     # see agents/<runtime>.sh for valid values
AGENT_IDLE_SLEEP=600    # seconds between cycles when queue is empty

Each runtime has an adapter in agents/<name>.sh that handles model selection, CLI flags, and auth checks. The launcher loop is identical regardless of which runtime runs — same prompt, same governance docs, same workflow.

Switching runtimes is one config change: AGENT_RUNTIME=gemini make agent-start. The Docker image has all CLIs preinstalled (configurable via INSTALL_AGENTS build arg if you want a leaner image).

Model recommendations

| Runtime | Best model | Cheap+fast |
| --- | --- | --- |
| Claude Code | `claude-opus-4-7` (default) | `claude-sonnet-4-6` |
| Gemini CLI | `gemini-2.5-pro` | `gemini-2.5-flash` |
| Codex CLI | `gpt-5-codex` | `gpt-5` |

Override per-launch:

AGENT_MODEL=sonnet make agent-start          # Claude with Sonnet
AGENT_RUNTIME=gemini AGENT_MODEL=flash make agent-start
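Per-launch overrides work because environment variables take precedence over agent.config. One way to get that behaviour, shown here as a sketch and not as the launcher's actual code, is to load the file but only fill in keys that are not already set. The function name load_agent_config is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: read KEY=value lines from a config file without clobbering
# variables the caller already set in the environment.

load_agent_config() {
  local file="${1:-agent.config}" key value
  [ -r "$file" ] || return 0
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in ''|\#*) continue ;; esac       # skip blank and comment lines
    value="${value%%#*}"                          # drop a trailing "# comment"
    value="${value%"${value##*[![:space:]]}"}"    # trim trailing whitespace
    if [ -z "${!key+x}" ]; then                   # unset in the environment?
      export "$key=$value"
    fi
  done < "$file"
}
```

With this in place, AGENT_MODEL=sonnet make agent-start keeps sonnet even if the file says AGENT_MODEL=default, while anything you did not override falls back to the file.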

Adding a new runtime

Copy agents/custom.sh to agents/<your-runtime>.sh
Implement run_agent_cycle, check_agent_installed, check_agent_authed
Add CLI install to docker/Dockerfile.dev if needed
Set AGENT_RUNTIME=<your-runtime> in agent.config

The contract is small — see agents/claude.sh as the reference implementation.
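As a starting point, an adapter might look like the skeleton below. The three function names come from the contract above; everything else is hypothetical: my-agent is a placeholder binary, its auth status subcommand and --model flag are invented for illustration, and the prompt-file argument to run_agent_cycle is an assumption, so check agents/claude.sh for the real signatures.

```shell
#!/usr/bin/env bash
# agents/<your-runtime>.sh: hypothetical adapter skeleton.
# "my-agent" and its flags are placeholders, not a real CLI.

AGENT_BIN="${AGENT_BIN:-my-agent}"

check_agent_installed() {
  # Succeeds if the CLI is on PATH.
  command -v "$AGENT_BIN" >/dev/null 2>&1
}

check_agent_authed() {
  # Assumption: the CLI has a cheap, non-interactive "am I logged in?" command.
  "$AGENT_BIN" auth status >/dev/null 2>&1
}

run_agent_cycle() {
  local prompt_file="$1"
  # Assumption: the CLI reads its prompt on stdin and accepts --model.
  "$AGENT_BIN" --model "${AGENT_MODEL:-default}" < "$prompt_file"
}
```

Keeping the install and auth checks separate lets the launcher fail fast with a useful message before it ever starts a cycle.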

Adapting to your stack

The core of this template is language-agnostic — the launcher, governance docs, slash commands, docs-gate, security suite, and 24/7 loop all work regardless of what language your project uses.

Language toolchains and feature scaffolds are optional add-ons in two directories:

stacks/ — language toolchains (Python, Node, Go, Rust). Pick one or more.
addons/ — feature scaffolds (FastAPI backend, Next.js web, React Native mobile, Tauri desktop, OpenAPI clients, CLI tools). Drop in only what you need.

See STACKS_AND_ADDONS.md for the full catalogue and apply instructions.

Common combinations:

| Project type | Pick |
| --- | --- |
| Backend + admin UI | `python` + `node` + `fastapi` + `nextjs` |
| Mobile-first SaaS | `python` + `node` + `fastapi` + `mobile-rn` + `openapi-clients` |
| CLI tool | `go` (or `rust`) + `cli-tool` |
| Photo/video app (premium feel) | `python` + `fastapi` + `mobile-native` + `desktop-tauri` + `openapi-clients` |
| Statically typed web service | `rust` only |

You're not locked in — add an addon later when you need it. Remove one by deleting the directory and reverting its CLAUDE.md block.

The four files customised per project regardless of stack:

docker/Dockerfile.dev — language toolchain
Makefile — test, lint, format commands
.github/workflows/ci.yml — CI matching your Makefile
pyproject.toml / package.json / equivalent — project manifest

Everything else (slash commands, governance docs, launcher) stays as-is.

What this is not

Not an AI orchestration framework. This is configuration + conventions, not code. There's no daemon, no scheduler beyond a bash while/sleep loop. You can read every file in 30 minutes.
Not a Claude wrapper. The default adapter drives the official Claude Code CLI in non-interactive mode, but switching to a different runtime is a one-line change to AGENT_RUNTIME in agent.config.
Not for production deployment of agents. This runs one agent against one repo on your machine. It's a developer tool, not a multi-tenant platform.
Not magic. If your acceptance criteria are vague, the agent's PRs will be vague. Garbage in, garbage out applies.

Safety: private vs public project repos

The agent only acts on issues labelled ready-for-agent. In a private repo, only you and collaborators you have granted triage access or higher can apply that label, so you're safe by default.

In a public repo, anyone can open issues. The label is still your gate, but you need a workflow that auto-strips ready-for-agent and agent-please-fix from any contribution that isn't yours, otherwise external contributors could queue work directly. See GETTING_STARTED.md for the workflow YAML and the full checklist (don't store production secrets in the agent container, monitor logs in week one, etc.).
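The decision at the heart of that label-stripping workflow is small. The sketch below shows only the logic, with an invented function name (should_strip_label) and invented variable names in the usage note; the actual workflow YAML is the one in GETTING_STARTED.md.

```shell
#!/usr/bin/env bash
# Hypothetical decision helper: should a label applied by <actor> be removed?
# Returns 0 (strip it) only for agent-trigger labels applied by someone
# other than the repo owner.

should_strip_label() {
  local actor="$1" owner="$2" label="$3"
  [ "$actor" = "$owner" ] && return 1      # the owner may queue work
  case "$label" in
    ready-for-agent|agent-please-fix) return 0 ;;
    *) return 1 ;;
  esac
}
```

In a workflow step this would gate a call such as should_strip_label "$ACTOR" "$OWNER" "$LABEL" && gh issue edit "$ISSUE" --remove-label "$LABEL", where the four variables are filled in from the triggering event.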

The template repo itself (this one) is fine to keep public — there's no agent loop running on it; it's just files.

Cost controls

The agent runs on your subscription or API quota. Three built-in layers protect you from runaway costs:

Daily cost cap — AGENT_MAX_DAILY_USD=5 in agent.config. The launcher pauses for an hour when the cap is reached.
Daily PR cap — AGENT_MAX_PRS_PER_DAY=20 to stop runaway loops.
Per-PR cost comments — every merged PR gets Cost: $0.42 (approx) posted automatically.

Plus make agent-cost shows today's spend at a glance. See GETTING_STARTED.md for recommended values per subscription type and how to inspect spend.
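The daily cap boils down to a decimal comparison, which plain shell arithmetic cannot do. A minimal sketch, assuming you already have today's spend as a number (how the launcher measures it is not covered here) and using an invented helper name, over_daily_cap:

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) when today's spend has reached the cap.
# Usage: over_daily_cap <spent_usd> [cap_usd]

over_daily_cap() {
  local spent="$1" cap="${2:-${AGENT_MAX_DAILY_USD:-5}}"
  # awk does the floating-point comparison that [ -ge ] cannot.
  awk -v s="$spent" -v c="$cap" 'BEGIN { exit !(s + 0 >= c + 0) }'
}
```

A launcher could then run something like if over_daily_cap "$spent_today"; then sleep 3600; fi before starting the next cycle, matching the one-hour pause described above.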

The agent also can't auto-merge changes to its own controls (Makefile, launcher, agent.config, unattended-rules.md, workflows). It labels those PRs human-only-merge and leaves them for you to review.

Comparison

| Feature | This template | Devin / similar SaaS | Plain agent CLI |
| --- | --- | --- | --- |
| Runs locally | Yes | No | Yes |
| Self-hosted, your data | Yes | No | Yes |
| 24/7 unattended | Yes | Yes | No (manual) |
| Container isolation | Yes | Yes | No by default |
| Phone-driven steering | Yes (via GitHub) | Limited | No |
| Self-audit + feature proposals | Yes | Varies | No |
| Customisable rules | Yes (just markdown) | No | Manually each session |
| Swap underlying model/agent | Yes (config change) | No | No |
| Cost | Your subscription | $$$ subscription | Your subscription |
| Lock-in | None | High | Tied to one vendor |

Roadmap

This is a personal template I use myself, but I'll merge useful PRs. Things that would help:

Pre-built Dockerfile.dev variants for common stacks (Rust, Go, Elixir)
A second launcher mode that wakes only on GitHub webhook (cheaper than the 10-min loop)
A make doctor that diagnoses common setup issues
Translations of unattended-rules.md if non-English-language agents become useful

Contributing

PRs welcome. Keep additions minimal and generic — anything project-specific belongs in your fork, not here.

Hard rule: don't add anything that requires an additional paid third-party service to use the template. The point is something you can run with just Docker, GitHub, and a subscription (or free tier) for one supported agent CLI.

License

MIT. See LICENSE.

Credits

Built while using Claude Code to build something else, then extracted as a template once it became clear the workflow itself was the more valuable artefact. The unattended rules and slash commands are the result of repeated iteration against a real project — not theoretical.

If you build something interesting with this, open an issue and tell me. I want to know what works and what doesn't.
