.ai-home

A collection of personal AI coding assistant configurations, specialist agents, and automated workflows optimized for Python and ML open-source development.

🏠 Borda's .ai-home

Personal AI coding assistant configuration for Python/ML OSS development. Version-controlled, opinionated, continuously improved.

🎯 Why

Managing AI coding workflows for Python/ML OSS is complex — you need domain-aware agents, not generic chat. This config packages 14 calibrated specialist agents and 14 slash-command skill workflows in a version-controlled, continuously benchmarked setup optimized for:

  • Python/ML OSS libraries requiring SemVer discipline and deprecation cycles
  • ML training and inference codebases needing GPU profiling and data pipeline validation
  • Multi-contributor projects with CI/CD, pre-commit hooks, and automated releases

💡 Design Principles

  • Agents are roles, skills are workflows — agents carry domain expertise, skills orchestrate multi-step processes
  • No duplication — agents reference each other instead of repeating content
  • Profile-first, measure-last — performance skills always bracket changes with measurements
  • Link integrity — never cite a URL without fetching it first (enforced in all research agents)
  • Python 3.10+ baseline — all configs target py310 minimum (3.9 EOL was Oct 2025)
  • Modern toolchain — uv, ruff, mypy, pytest, GitHub Actions with trusted publishing

⚡ Quick Start

# Install Claude Code and Codex CLI
npm install -g @anthropic-ai/claude-code && npm install -g @openai/codex

# Activate config globally
cp -r .claude/ ~/.claude/    # Claude Code agents, skills, hooks
cp -r .codex/ ~/.codex/      # Codex CLI agents and profiles

📦 What's Here

borda.ai-home/
├── .claude/                # Claude Code (Claude by Anthropic)
│   ├── README.md           # full reference: skills, rules, hooks, architecture
│   ├── CLAUDE.md           # workflow rules and core principles
│   ├── settings.json       # permissions and model preferences
│   ├── agents/             # specialist agents
│   ├── skills/             # workflow skills (slash commands)
│   ├── rules/              # per-topic coding and config standards (auto-loaded by Claude Code)
│   └── hooks/              # UI extensions
├── .codex/                 # OpenAI Codex CLI
│   ├── README.md           # full reference: agents, profiles, Claude integration
│   ├── AGENTS.md           # global instructions and subagent spawn rules
│   ├── config.toml         # multi-agent config (gpt-5.3-codex baseline)
│   └── agents/             # per-agent model and instruction overrides
├── .pre-commit-config.yaml
├── .gitignore
└── README.md

🧩 Agents

Specialist roles with deep domain knowledge — requested by name, or auto-selected by Claude Code and Codex CLI.

  • ai-researcher — Paper analysis, experiment design, LLM evaluation
  • ci-guardian — GitHub Actions, trusted publishing, flaky test detection
  • data-steward — Dataset versioning, split validation, leakage detection
  • doc-scribe — Google/Napoleon docstrings, Sphinx/mkdocs, changelog
  • linting-expert — ruff, mypy, pre-commit, CI quality gates
  • oss-shepherd — Issue triage, PR review, SemVer, releases, trusted publishing
  • perf-optimizer — Profile-first CPU/GPU/memory/I/O, torch.compile
  • qa-specialist — pytest, hypothesis, mutation testing, ML test patterns
  • self-mentor — Config quality review, duplication detection, cross-ref audit
  • solution-architect — System design, ADRs, API surface, migration plans
  • squeezer — Profile-first optimization, GPU throughput, memory efficiency
  • sw-engineer — Architecture, implementation, SOLID principles, type safety
  • web-explorer — API version comparison, migration guides, PyPI tracking

🤖 Claude Code

Agents and skills for Claude Code (Anthropic's AI coding CLI).

Skills

Skills are multi-agent workflows invoked via slash commands. Each skill composes several agents in a defined topology.

  • review — Parallel review across arch, tests, perf, docs, lint, security, API; --reply drafts comment
  • analyse — GitHub thread analysis; health = repo overview + duplicate clustering
  • brainstorm — Interactive spec: clarifying questions → approaches → spec → self-mentor review → approval gate
  • develop — TDD-first features, reproduce-first fixes, test-first refactors, scope analysis, debugging
  • resolve — Resolve PR conflicts or apply review comments via Codex
  • calibrate — Synthetic benchmarks measuring recall vs confidence bias
  • audit — Config audit: broken refs, inventory drift, docs freshness; fix [high|medium|all] auto-fixes by severity; upgrade applies docs-sourced improvements (mutually exclusive)
  • release — Notes, changelog, migration, full prepare pipeline, or readiness audit
  • research — SOTA literature research with implementation plan; plan mode produces a phased, codebase-mapped implementation plan (auto-detects latest research output)
  • optimize — Four modes: plan = config wizard → program.md; campaign = metric-driven iteration loop; resume = continue after crash; perf = profiling deep-dive; --team and --colab supported
  • manage — Create, update (content-edit or rename), delete agents/skills/rules with auto type-detection and cross-ref propagation
  • sync — Drift-detect and sync project .claude/ → home ~/.claude/
  • codex — Delegate mechanical coding tasks to Codex CLI
  • investigate — Systematic diagnosis for unknown failures — env, tools, hooks, CI divergence; ranks hypotheses and hands off to the right skill
  • session — Parking lot for diverging ideas — auto-parks unanswered questions and deferred threads; resume shows pending, archive closes, summary digests the session
  • distill — Suggest new agents/skills, prune memory, consolidate lessons into rules

→ Full command reference, orchestration flows, rules (11 auto-loaded rule files), architecture internals, status line — see .claude/README.md → Skills

Common Workflow Sequences

Skills chain naturally — the output of one becomes the input for the next.

Bug report → fix → validate

/analyse 42            # understand the issue, extract root cause hypotheses
/develop fix 42        # reproduce with test, apply targeted fix
/review                # validate the fix meets quality standards

Performance investigation → optimize → refactor

/optimize src/mypackage/dataloader.py   # profile and fix top bottleneck
/develop refactor src/mypackage/dataloader.py "extract caching layer"  # structural improvement
/review                                 # full quality pass on changes

Code review → fix blocking issues

/review 55             # 7 agent dimensions + Codex co-review
/develop fix "race condition in cache invalidation"  # fix blocking issue from review
/review 55             # re-review after fix

New feature → implement → release

/analyse 87            # understand the issue, clarify acceptance criteria
/develop feature 87    # codebase analysis, demo test, TDD, docs, review
/release               # generate CHANGELOG entry and release notes

New capability → research → implement

/research "efficient attention for long sequences"        # find SOTA methods
/develop feature "implement FlashAttention in encoder"    # TDD-first implementation
/review                                                   # validate implementation

Autonomous metric improvement campaign

/optimize plan "increase test coverage to 90%"      # interactive config wizard → program.md
/optimize campaign "increase test coverage to 90%"  # run 20-iteration loop; auto-rollback on regression
/optimize resume                                    # resume after crash or manual stop
/review                                             # validate kept commits

Research SOTA → optimize toward metric

/research "knowledge distillation for small models"           # find best approach
/optimize plan "improve F1 from 0.82 to 0.87"                 # configure metric + guard + agent
/optimize campaign --team                                     # parallel exploration across axes
/review                                                       # quality pass on kept changes

Distill → create → audit → sync

/distill               # analyze work patterns, suggest new agents/skills
/manage create agent my-agent "..."          # scaffold suggested agent
/audit                 # verify config integrity — catch broken refs, dead loops
/calibrate routing     # confirm new agent description doesn't confuse routing
/sync apply            # propagate clean config to ~/.claude/

Delegate mechanical work to Codex

/codex "add Google-style docstrings to all undocumented public functions" "src/mypackage/"
# Codex executes; Claude validates with lint + tests
/review                                   # full quality pass on Codex output

PR review feedback → resolve → verify

/resolve 42   # auto-detect conflicts → resolve semantically → apply review comments via Codex
/review       # full quality pass on all applied changes

OSS contributor PR triage → review → reply

Preferred flow for maintainers responding to external contributions:

/analyse 42 --reply      # assess PR readiness + draft contributor reply in one step

# or if you need the full deep review first:
/review 42 --reply        # 7-agent + Codex co-review + draft overall comment + inline comments table
                          # output: _outputs/YYYY/MM/output-reply-pr-42-<date>.md

# post when ready:
gh pr comment 42 --body "$(cat _outputs/YYYY/MM/output-reply-pr-42-<date>.md)"

Both --reply flags produce the same two-part oss-shepherd output: an overall PR comment (prose, warm, decisive) and an inline comments table (file | line | 1–2 sentence fix). The /analyse path is faster for routine triage; the /review path gives deeper findings for complex PRs.

Agent self-improvement loop
/distill                        # analyze work patterns, surface what agents are missing or miscalibrated
/calibrate all fast ab apply    # benchmark all agents vs general-purpose baseline, apply improvement proposals
/audit fix                      # structural sweep after calibrate changed instruction files
/sync apply                     # propagate improved config to ~/.claude/

Agent description drift → routing alignment check

After editing agent descriptions (manually or via /audit fix), verify that routing accuracy hasn't degraded:

/audit                      # Check 12 flags description-overlap pairs (static, fast)
/calibrate routing fast     # behavioral test: generates task prompts, measures routing accuracy

Run /calibrate routing fast after any agent description change. Thresholds: routing accuracy ≥90%, hard-problem accuracy ≥80%.

Config maintenance — periodic health check
/audit                 # inspect findings + docs-sourced upgrade proposals — report only, no changes
/audit upgrade         # apply upgrade proposals: config changes verified, capability changes A/B tested
/audit fix             # full sweep + auto-fix critical and high findings
/sync apply            # propagate verified config to ~/.claude/

Keep config current after Claude Code releases
/audit                 # fetches latest Claude Code docs, surfaces applicable improvements as upgrade proposals
/audit upgrade         # applies config proposals (correctness check) and capability proposals (calibrate A/B)
/calibrate all fast    # re-benchmark all agents to confirm no regression from applied changes
/sync apply            # propagate clean, calibrated config to ~/.claude/

Release preparation
/release v1.2.0..HEAD  # generate release notes from git history

🤖 Codex CLI

Multi-agent configuration for OpenAI Codex CLI (Rust implementation). Nine specialist roles on gpt-5.3-codex, auto-selected by task type or addressed by name. See the Agents section above for the full roster.

Usage

codex                                                          # interactive — auto-selects agents
codex "use the qa-specialist to review src/api/auth.py"        # address agent by name
codex --profile deep-review "full security audit of src/api/" # activate a profile

Install

npm install -g @openai/codex    # install Codex CLI
cp -r .codex/ ~/.codex/         # activate globally

Files

  • AGENTS.md — Global agent instructions, The Borda Standard, spawn rules
  • config.toml — Multi-agent config: 4 runtime profiles, MCP server, sandbox
  • agents/*.toml — Per-agent model and reasoning effort overrides

→ Deep reference: spawn rules, profiles, architecture, and Claude integration — see .codex/README.md → Agents

🤝 Claude + Codex Integration

Claude and Codex complement each other — Claude handles long-horizon reasoning, orchestration, and judgment calls; Codex handles focused, mechanical in-repo coding tasks with direct shell access.

Every skill that reviews or validates code uses a three-tier pipeline: Tier 0 (mechanical git diff --stat gate), Tier 1 (Codex pre-pass, ~60s, diff-focused), Tier 2 (specialized Claude agents). Cheaper tiers gate the expensive ones — this keeps full agent spawns reserved for diffs that actually need them. → Full architecture with skill-tier matrix: .claude/README.md → Tiered review pipeline
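
The Tier 0 gate can be sketched in a few lines of shell. This is only an illustration of the idea, not the implementation inside the skills: `count_changes` and `THRESHOLD` are hypothetical names, and the sample summary line stands in for live `git diff --stat` output.

```shell
# Sum insertions + deletions from the summary line of `git diff --stat`.
count_changes() {
  awk '/files? changed/ {
    ins = del = 0
    for (i = 1; i < NF; i++) {
      if ($(i + 1) ~ /insertion/) ins = $i
      if ($(i + 1) ~ /deletion/)  del = $i
    }
    print ins + del
  }'
}

THRESHOLD=5   # hypothetical cut-off for "trivial" diffs
summary=' 3 files changed, 42 insertions(+), 7 deletions(-)'   # stand-in for: git diff --stat | tail -n 1
total=$(printf '%s\n' "$summary" | count_changes)

if [ "${total:-0}" -gt "$THRESHOLD" ]; then
  echo "Tier 0 pass: $total changed lines, hand off to Tier 1 (Codex pre-pass)"
else
  echo "Tier 0 stop: only $total changed lines, skip the expensive tiers"
fi
```

The point of the gate is that this check costs milliseconds, while a full agent spawn does not.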

Why unbiased review matters, in one real example: Claude makes targeted changes with intentionality — it has a mental model of which files are "in scope". Codex has no such context: it reads the diff and the codebase independently. During one session, Claude applied a docstring-style mandate across 6 files and scored its own confidence at 0.88. The Codex pre-pass then found skills/develop/modes/feature.md still referencing the old style — a direct miss. The union of both passes is more complete than either alone.

Two integration patterns make this pairing practical:

  1. Offloading mechanical tasks from Claude to Codex

    Claude identifies what needs to change and delegates execution to Codex. Claude keeps its context clean and validates the output.

    /codex "add Google-style docstrings to all undocumented public functions" "src/mypackage/"
    /codex "rename BatchLoader to DataBatcher throughout the package" "src/mypackage/"
    /codex "add return type annotations to all functions missing them" "src/mypackage/utils.py"
    /review   # Claude then reviews with lint + tests
    
  2. Codex reviewing staged work

    After Claude stages changes, Codex serves as a second pass — examining the diff, applying review comments, or resolving PR conflicts. The /resolve skill automates this: it resolves conflicts semantically (Claude) then applies review comments (Codex).

    /resolve 42   # Claude resolves conflicts → Codex applies review comments
    /resolve "rename the `fit` method to `train` throughout the module"
    
Pre-flight requirements
npm install -g @anthropic-ai/claude-code && npm install -g @openai/codex

If codex is not installed: /codex fails at pre-flight, and /resolve's review-comment step is skipped (conflict resolution works with Claude alone).
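
That pre-flight pattern boils down to a PATH check. A minimal sketch, assuming nothing about the skills' real internals (`require` is a hypothetical helper name):

```shell
# require: succeed only when every named CLI is on PATH.
require() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
  done
}

# Gate before delegating, mirroring the fallback described above.
if require codex; then
  echo "codex available: /codex and /resolve fully enabled"
else
  echo "codex missing: /resolve falls back to Claude-only conflict resolution"
fi
```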

🔄 Config Sync

This repo (.claude/) is the source of truth — home (~/.claude/) is a downstream copy:

/sync          # show what differs between project and home .claude/
/sync apply    # copy all differing files to ~/.claude/

Run after editing any agent, skill, hook, or settings.json. settings.local.json is never synced.
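
Under the hood this is guarded file copying; the skill may add per-file drift checks, but a manual equivalent can be sketched with temporary directories standing in for the real project and home paths:

```shell
# Temp dirs stand in for the project .claude/ and home ~/.claude/.
project=$(mktemp -d) && home_cfg=$(mktemp -d)
echo 'model = "default"' > "$project/settings.json"

diff -rq "$project" "$home_cfg" || true   # drift report, like plain /sync

cp -r "$project/." "$home_cfg/"           # copy differing files, like /sync apply
rm -f "$home_cfg/settings.local.json"     # settings.local.json is never synced

diff -rq "$project" "$home_cfg" && echo "in sync"
```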
