statsclaw

agent
Security Audit
Warning
Health Warning
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 6 GitHub stars
Code Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Passed
  • Permissions — No dangerous permissions requested
Purpose
This framework uses a coordinated team of specialized AI agents to help researchers and developers build, test, and document statistical software packages.

Security Assessment
The overall security risk is Low. A light code scan of 12 files found no dangerous patterns, hardcoded secrets, or requests for risky permissions. While the system delegates tasks such as writing code and pushing commits (via the "Shipper" agent) to AI models, it acts as a workflow orchestrator rather than a standalone background process. The framework emphasizes keeping a human domain expert in the loop to guide these actions. Standard network requests go primarily to your AI provider (such as Claude); no suspicious or hidden external network activity was detected.

Quality Assessment
The project is actively maintained, with its most recent repository push occurring today. It utilizes the permissive MIT license and includes clear documentation and contribution guidelines. However, community trust and visibility are currently very low. With only 6 GitHub stars, the tool is highly experimental and likely in its early stages. This means it probably lacks a large user base to independently test the software and find edge-case bugs.

Verdict
Use with caution — the tool appears safe from a code standpoint, but its early-stage, low-visibility nature makes it highly experimental for critical workflows.
SUMMARY

A workflow framework for statistical package development

README.md

StatsClaw

A workflow framework for statistical package development.

An open-source tool that helps researchers build, test, and document statistical software packages with AI agent teams.

Website · Roadmap · Contributing · Discussions


What is StatsClaw?

StatsClaw is a framework for Claude Code that uses AI agent teams to assist with statistical package development. You describe what you need — a bug fix, a new feature, a cross-language translation — and StatsClaw coordinates multiple AI agents to help you build, test, and document the result. It works best when a domain expert stays in the loop to guide decisions.


How It Works

StatsClaw orchestrates a team of 9 specialized AI agents, each operating under strict information isolation:

Agent — Role
Leader — Orchestrates the workflow, dispatches agents, enforces isolation
Planner — Reads your paper/formulas, executes deep comprehension protocol, produces specifications
Builder — Writes source code from spec.md (never sees the test spec)
Tester — Validates independently from test-spec.md (never sees the code spec)
Simulator — Runs Monte Carlo studies from sim-spec.md (never sees the code or test specs)
Scriber — Documents architecture, generates tutorials, maintains audit trail
Distiller — Extracts reusable knowledge for the shared brain (brain mode only)
Reviewer — Cross-checks all pipelines, audits tolerance integrity, issues ship/no-ship verdict
Shipper — Commits, pushes, opens PRs, handles package distribution

The code, test, and simulation pipelines are fully isolated — they never see each other's specs. If all pipelines converge independently, confidence in correctness is high. This is adversarial verification by design.


Multi-Pipeline Architecture

                      planner (bridge)
                     /    |          \
          spec.md   / test-spec.md    \  sim-spec.md
                   /      |            \
            builder ─ ─(parallel)─ ─ simulator
       (code pipeline)    |    (simulation pipeline)
                   \      |            /
      implementation.md   |   simulation.md
                    \     |          /
                     \    v         /
                       tester           <-- sequential, after merge-back
                    (test pipeline)
                         |
                      audit.md
                         |
                    scriber (recording)
                         |
                    distiller (brain mode only)
                         |
                    reviewer (convergence)
                          |
                        shipper

Key properties:

  • Planner is always mandatory — it bridges all pipelines
  • Builder handles code, scriber handles docs, simulator handles Monte Carlo studies — for docs-only requests, scriber replaces builder as implementer
  • Builder and simulator run in parallel (simulation workflows), then tester validates the merged result — each pipeline has its own isolated spec
  • Pipeline isolation is enforced — each pipeline never sees another's spec
  • Adversarial verification — if all pipelines converge independently, confidence is high
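The adversarial-verification idea above can be sketched as a simple tolerance check over independently produced results. This is an illustrative sketch, not StatsClaw's actual reviewer logic; the function name, result values, and tolerance are hypothetical:

```python
# Hypothetical sketch: three isolated pipelines report the same quantity;
# agreement within tolerance raises confidence, divergence blocks shipping.

def converged(code_result: float, test_result: float,
              sim_result: float, tol: float = 1e-8) -> bool:
    """Return True if all three independent pipeline results agree within tol."""
    results = [code_result, test_result, sim_result]
    return max(results) - min(results) <= tol

# Builder, tester, and simulator each estimate the same value independently.
print(converged(0.5, 0.5, 0.5))  # agreement -> ship
print(converged(0.5, 0.5, 0.7))  # divergence -> no-ship
```

Because each pipeline derives its answer from its own spec, agreement is evidence of correctness rather than shared error.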

Supported Languages

R Python Stata TypeScript Go Rust C C++

More languages coming — Julia is next! Want another? Let us know.


Quick Start

Prerequisites

  1. Claude Code — Install Claude Code
  2. GitHub access — Push access to your target repository
  3. Workspace repo — A GitHub repo for storing workflow artifacts (auto-created if needed)

Your First Task

Just tell StatsClaw what you want. It auto-detects the language, selects the right workflow, and starts working:

work on https://github.com/your-org/your-package and resolve the issues

StatsClaw will ask you clarification questions when it encounters ambiguity — your domain expertise guides the process. Results vary with task complexity; expect to iterate.


Workflow

Code:            leader → planner → builder → tester → scriber → [distiller]? → reviewer → shipper?
Docs-only:       leader → planner → scriber → reviewer → shipper?
Simulation+Code: leader → planner → [builder ∥ simulator] → tester → scriber → [distiller]? → reviewer → shipper?
Simulation-only: leader → planner → simulator → tester → scriber → [distiller]? → reviewer → shipper?

States: CREDENTIALS_VERIFIED → NEW → PLANNED → SPEC_READY → PIPELINES_COMPLETE → DOCUMENTED → [KNOWLEDGE_EXTRACTED]? → REVIEW_PASSED → READY_TO_SHIP → DONE

Signals: HOLD (ambiguous, ask user), BLOCK (validation failed), STOP (unsafe to ship)
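The state progression above can be sketched as an ordered list with a gated advance step. The state names come from the States line; the `advance` helper and its `skip_brain` flag are hypothetical illustrations, not StatsClaw's actual state machine:

```python
# Hypothetical sketch of the run state machine: states advance in order,
# and the optional KNOWLEDGE_EXTRACTED state is skipped outside brain mode.

STATES = [
    "CREDENTIALS_VERIFIED", "NEW", "PLANNED", "SPEC_READY",
    "PIPELINES_COMPLETE", "DOCUMENTED", "KNOWLEDGE_EXTRACTED",
    "REVIEW_PASSED", "READY_TO_SHIP", "DONE",
]

def advance(current: str, skip_brain: bool = True) -> str:
    """Move to the next state; KNOWLEDGE_EXTRACTED is optional (brain mode only)."""
    i = STATES.index(current)
    nxt = STATES[i + 1]
    if nxt == "KNOWLEDGE_EXTRACTED" and skip_brain:
        nxt = STATES[i + 2]
    return nxt

print(advance("DOCUMENTED"))                    # -> REVIEW_PASSED
print(advance("DOCUMENTED", skip_brain=False))  # -> KNOWLEDGE_EXTRACTED
```

In the real system each transition is a hard gate: the next state is reachable only after the required artifact exists and is verified.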


What Can StatsClaw Help With?

  • Implementing methods — Assists with translating specs into code. Limitation: requires researcher to validate mathematical correctness.
  • Cross-language translation — Handles R/Python idiom differences. Limitation: may miss subtle numerical edge cases without careful review.
  • Testing & validation — Independent test pipeline catches bugs tests miss. Limitation: empirical verification, not formal proofs.
  • Monte Carlo studies — Automates simulation harness and reporting. Limitation: researcher must design meaningful DGPs and metrics.
  • Paper-driven features — Reads methodology papers to design new functionality. Limitation: extracts concepts, not full estimator implementations.
  • Bug fixing — Adversarial architecture helps find hidden bugs. Limitation: complex domain bugs still need human insight.
  • Documentation — Generates Quarto books, API docs. Limitation: needs researcher review for accuracy.
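The Monte Carlo task can be made concrete with a minimal harness: draw repeatedly from a data-generating process, apply competing estimators, and report bias and RMSE. The DGP, estimators, and metrics below are illustrative stand-ins chosen for this sketch, not StatsClaw output:

```python
# Minimal Monte Carlo sketch: compare the sample mean and sample median
# as estimators of a normal location parameter.
import random
import statistics

def run_study(n_reps: int = 2000, n_obs: int = 50, true_mean: float = 1.0):
    estimates = {"mean": [], "median": []}
    rng = random.Random(42)  # seeded for reproducibility
    for _ in range(n_reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
        estimates["mean"].append(statistics.fmean(sample))
        estimates["median"].append(statistics.median(sample))
    report = {}
    for name, ests in estimates.items():
        bias = statistics.fmean(ests) - true_mean
        rmse = statistics.fmean([(e - true_mean) ** 2 for e in ests]) ** 0.5
        report[name] = {"bias": round(bias, 4), "rmse": round(rmse, 4)}
    return report

print(run_study())
```

The harness is mechanical; as the table notes, the scientific content (a meaningful DGP and the right metrics) still has to come from the researcher.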

Example Prompts

# Fix a specific issue
fix issue #42 in my-package

# Build from scratch
build a Python package from this R code

# Cross-language migration
rewrite the Python backends in pure R and ship it

# Simulation study
run a Monte Carlo study comparing these three estimators

# Paper to package
build an R package from this PDF

# Paper-driven feature
read Correia (2016) and add network visualization to panelView

# Documentation
update the documentation for v2.0

# Contribute knowledge to the shared brain
/contribute

Learn by Example

We provide examples from our own usage. Each is a real repository you can inspect and learn from. Your mileage may vary — these represent what worked for us with active researcher involvement.

  • statsclaw/example-fect — Iterative refactoring (1 to 2): multi-day, researcher-guided refactoring of an R package
  • statsclaw/example-R2PY — Python from R source (0 to 1): building a Python package from an R reference
  • statsclaw/example-probit — Paper to package + Monte Carlo: PDF manuscript to R/C++ package + simulation
  • statsclaw/example-panelView — Paper-driven feature addition: reading a methodology paper to design a new feature

See the workspace example for the actual workflow artifacts produced during these examples.


What You Install

  • CLAUDE.md — orchestration policy (the authoritative reference)
  • agents/ — agent definitions (leader, planner, builder, tester, simulator, scriber, distiller, reviewer, shipper)
  • skills/ — shared protocol skills (credential-setup, isolation, handoff, mailbox, issue-patrol, profile-detection, brain-sync, privacy-scrub)
  • profiles/ — language-specific execution rules (R, Python, TypeScript, Stata, Go, Rust, C, C++)
  • templates/ — runtime artifact templates and repo scaffolding (brain-repo, brain-seedbank-repo)

Agent Teams is enabled at the project level through .claude/settings.json.


Runtime Layout

All runtime state lives inside the workspace repo, organized per target repository:

.repos/
├── <target-repo>/                    # target repo checkout
├── brain/                            # statsclaw/brain clone (brain mode only)
├── brain-seedbank/                   # statsclaw/brain-seedbank clone (brain mode only)
└── workspace/                        # workspace repo (GitHub)
    └── <repo-name>/                  # per-target-repo runtime + logs
        ├── context.md                # active project context
        ├── CHANGELOG.md              # timeline index of all runs (pushed)
        ├── HANDOFF.md                # active handoff (pushed)
        ├── ref/                      # reference docs for future work (pushed)
        ├── runs/
        │   └── <request-id>/         # per-run artifacts
        │       ├── credentials.md    # push access verification
        │       ├── request.md        # scope and acceptance criteria
        │       ├── status.md         # state machine
        │       ├── impact.md         # affected files and risk areas
        │       ├── comprehension.md  # comprehension verification (from planner)
        │       ├── spec.md           # code pipeline input (from planner)
        │       ├── test-spec.md      # test pipeline input (from planner)
        │       ├── sim-spec.md       # simulation pipeline input (from planner, workflows 11/12)
        │       ├── implementation.md # code pipeline output (from builder)
        │       ├── simulation.md     # simulation pipeline output (from simulator, workflows 11/12)
        │       ├── audit.md          # test pipeline output (from tester)
        │       ├── ARCHITECTURE.md   # from scriber (primary copy in target repo root)
        │       ├── log-entry.md      # process record (from scriber; promoted to runs/<date>-<slug>.md)
        │       ├── docs.md           # documentation changes (from scriber)
        │       ├── brain-contributions.md  # knowledge entries (from distiller, brain mode only)
        │       ├── review.md         # convergence verdict (from reviewer)
        │       ├── shipper.md        # ship actions (from shipper)
        │       ├── mailbox.md        # inter-teammate communication
        │       └── locks/            # write surface locks
        ├── logs/                     # diagnostic logs
        └── tmp/                      # transient data

Repository Layout

StatsClaw/
├── CLAUDE.md           # orchestration policy
├── README.md
├── agents/             # agent definitions (9 agents including distiller)
├── skills/             # shared protocol skills (13 skills including brain-sync, privacy-scrub)
├── profiles/           # language execution rules (8 languages)
├── templates/          # runtime artifact templates + repo scaffolding (brain-repo, brain-seedbank-repo)
└── .repos/             # target repo checkouts + workspace + brain repos (runtime state, git-ignored)

Workspace Repository

Workflow logs, process records, and handoff documents are NOT stored in target repos. Instead, they are synced to a user-specified workspace repository on GitHub (e.g., [username]/workspace):

workspace/
├── fect/
│   ├── CHANGELOG.md                # timeline index
│   ├── HANDOFF.md                  # active handoff
│   ├── ref/                        # reference docs for future work
│   │   └── cv-comparison-table.md
│   └── runs/                       # individual workflow logs
│       ├── 2026-03-16-cv-unification.md
│       └── 2026-03-17-convergence-conditioning.md
├── panelview/
│   ├── CHANGELOG.md
│   ├── HANDOFF.md
│   ├── ref/
│   └── runs/
│       └── 2026-03-17-add-feature.md
└── README.md

This keeps target repos clean (code + essential docs only) while preserving full traceability in one place.


Shared Brain

StatsClaw has a shared knowledge system where techniques discovered during workflows — mathematical methods, coding patterns, validation strategies, simulation designs — are extracted, privacy-scrubbed, and contributed to a collective knowledge base. When you enable Brain mode, your agents get smarter by reading knowledge contributed by all users.

How it works:

  1. Read — Your agents automatically access relevant knowledge entries from statsclaw/brain
  2. Contribute — After noteworthy workflows, the distiller agent extracts reusable knowledge. You review everything and approve or decline — nothing is shared without your explicit consent. You can also run the built-in /contribute command at any time to summarize what you learned (what worked, what required manual intervention, and what domain-specific patterns emerged) and submit it as a structured report.
  3. Earn badges — Accepted contributions earn virtual badges on the Contributors leaderboard

Privacy guarantee: All contributions are automatically scrubbed of repo names, file paths, usernames, proprietary code, and any identifying information. Only generic, reusable knowledge is shared.
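Scrubbing of this kind can be sketched as rule-based redaction applied before a knowledge entry leaves your machine. The patterns and placeholders below are hypothetical illustrations, not StatsClaw's actual privacy-scrub skill:

```python
# Hypothetical scrub sketch: redact repo URLs, GitHub handles, and file
# paths from a knowledge entry, leaving only generic placeholders.
import re

RULES = [
    (re.compile(r"github\.com/[\w.-]+/[\w.-]+"), "github.com/<org>/<repo>"),
    (re.compile(r"@[A-Za-z\d][A-Za-z\d-]{0,38}"), "@<user>"),
    (re.compile(r"(?:/[\w.-]+){2,}\.(?:R|py|cpp|do)\b"), "<path>"),
]

def scrub(text: str) -> str:
    for pattern, placeholder in RULES:
        text = pattern.sub(placeholder, text)
    return text

print(scrub("Fix by @alice in github.com/acme/secret-pkg, see /src/models/fit.R"))
# -> Fix by @<user> in github.com/<org>/<repo>, see <path>
```

A real scrubber would need many more rules (proprietary identifiers, email addresses, data values), but the shape is the same: deterministic redaction first, then human review before anything is submitted.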

  • statsclaw/brain — Curated knowledge; agents read from here
  • statsclaw/brain-seedbank — Contribution staging; users submit PRs here

Brain mode is optional — you choose at session start. See Brain System Documentation for full details.


Design Principles

  • Credentials first, work second. Verify push access before creating a run.
  • Team Leader dispatches, never does. Leader plans and coordinates; teammates do the work.
  • Multi-pipeline, fully isolated. Code, test, and simulation pipelines never see each other's specs.
  • Planner first, always. Every non-trivial request starts with dual-spec production.
  • Adversarial verification by design. Independent convergence proves correctness.
  • Hard gates, not soft advice. State transitions have preconditions; artifacts are verified.
  • Worktree isolation for writers. Builder, simulator, and scriber run in isolated git worktrees.
  • Surgical scope. Each run modifies only what the request requires.
  • Explicit ship actions. Nothing is pushed without user instruction or active patrol skill.
  • Collective knowledge, individual consent. Brain mode lets agents learn from all users, but nothing is shared without explicit per-workflow approval.

Citation

If you use StatsClaw in your research or software development, please cite our paper:

Qin, Tianzhu and Yiqing Xu. 2026. "StatsClaw: An AI-Collaborative Workflow for Statistical Software Development."

BibTeX:

@misc{qinxu2026statsclaw,
  title={StatsClaw: An AI-Collaborative Workflow for Statistical Software Development},
  author={Qin, Tianzhu and Xu, Yiqing},
  year={2026},
  howpublished = {Mimeo, Stanford University},
  url={https://bit.ly/statsclaw}
}

License

StatsClaw is released under the MIT License.


Get Involved

We are building StatsClaw in the open. Everyone is welcome.


statsclaw.ai

A tool for statisticians and econometricians. Works best with an expert in the loop.
