Agentic Harness for Large-Scale Code Generation

AI agent harness for automated code generation and complexity estimation.
Multiple deployable codebases built by multiple SOTA AI models.
When their code complexity scores align, that is strong evidence the specs are complete.

SpecFlow is an AI agent harness that automates code generation, deployment, and testing through parallel AI agents in isolated, sandboxed execution environments.

Validator agents continuously assess, resume, and refine work until delivery standards are met.

https://github.com/user-attachments/assets/0783c394-1bf0-449f-a36e-1d46e899661c

Getting Started

Software

| Requirement | Notes |
|---|---|
| Docker | Container runtime for the harness sandbox. Install Docker |
| `uv` | Python package manager. Install via `brew install uv` or see docs |
| IDE | SpecFlow is used as an MCP server in an IDE or client with agentic AI enabled: Cursor, Claude Code, Copilot, Gemini, etc. This is where the user's project lives. |
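
Before installing, it can help to confirm that the two command-line prerequisites are on your `PATH`. The snippet below is a minimal Python sketch and not part of SpecFlow: `missing_tools` is a hypothetical helper, and it only checks that the `docker` and `uv` executables can be found, not that the Docker daemon is actually running.

```python
import shutil

# Command-line tools listed in the requirements table above.
REQUIRED_TOOLS = ("docker", "uv")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("docker and uv were both found on PATH")
```

The `which` parameter is injectable only so the check can be exercised without touching the real environment.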

Keys and Tokens

| Key | Name in `.env` | Notes |
|---|---|---|
| GitHub Personal Access Token | `GITHUB_TOKEN` | For disposable workspace repos. Scope: `repo` + `read:user` + `workflow` `repo,read:user`. Advan |
| P10Y API key | `P10Y_API_KEY` | Code complexity scoring. Setup guide |
| LLM provider key | `OPENROUTER_API_KEY` or `ANTHROPIC_API_KEY` | One key required. Get OpenRouter key (default) or Get Anthropic key |
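
The rule in the table (two keys always required, plus one of the two LLM provider keys) can be sketched as a small check. This is an illustration only: it assumes a plain `KEY=VALUE` file named `.env` in the current directory, which may not match where or how SpecFlow actually stores its settings, and `parse_env` and `env_problems` are hypothetical helper names. The TUI onboarding remains the intended way to set these values.

```python
from pathlib import Path

# Variable names taken from the table above.
ALWAYS_REQUIRED = ("GITHUB_TOKEN", "P10Y_API_KEY")
ONE_OF_REQUIRED = ("OPENROUTER_API_KEY", "ANTHROPIC_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def env_problems(values):
    """Return human-readable problems; an empty list means the keys look complete."""
    problems = [
        f"{name} is missing or empty"
        for name in ALWAYS_REQUIRED
        if not values.get(name)
    ]
    if not any(values.get(name) for name in ONE_OF_REQUIRED):
        problems.append("set one of " + " or ".join(ONE_OF_REQUIRED))
    return problems


if __name__ == "__main__":
    env_file = Path(".env")  # assumed location: the current working directory
    text = env_file.read_text() if env_file.exists() else ""
    for problem in env_problems(parse_env(text)):
        print("problem:", problem)
```

The check only verifies that values are present. It does not validate token scopes or contact any provider.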

Installation

A few simple steps to get you going:

Clone the repo:

git clone https://github.com/griddynamics/specflow.git && cd specflow

Install SpecFlow (includes the Terminal UI that guides you through onboarding):
```
uv tool install --editable ./mcp_server
```
Start the SpecFlow app and follow the instructions:
```
specflow tui
```

[!Important]
The SpecFlow harness sandbox is now running locally.
Easiest: in `specflow tui`, press `c` (Add MCP to AI tool). The setup screen detects
Claude Code, Gemini CLI, and Cursor, wires SpecFlow up for you (one key), and reports the actual
connected/added/failed status for each client.
Prefer to do it by hand? Copy the contents of `.specflow-local/mcp-config.json` into your client's MCP configuration.

Cursor

Claude Code

Claude Desktop

Copilot

Gemini CLI

...and any other IDE or client that supports the Model Context Protocol.

Usage

MCP is now ready to use in any project. Prompt your IDE agent to talk to the harness.

Assuming your specification files live in a `specs` directory, follow these steps:

Start a new project in your IDE and put your specification files into the `specs` directory:

specs/
|-- product-requirements.md
|-- user-flows.pdf
\-- acceptance-criteria.md

Check the completeness of your specification with the `check_specification_completeness` tool:
```
Use SpecFlow MCP to check specification completeness in specs directory
```
Create a detailed plan with the `run_planning` tool:
```
Create implementation plan using SpecFlow MCP
```
When you are happy with the plan, start generation with the `run_generation` tool, prompting as above:
```
Run generation with SpecFlow MCP
```
Generation typically takes 2-8 hours. Use the TUI to monitor progress and receive desktop notifications:
```
# Any terminal
specflow tui
```

Once generation has completed, retrieve the results and P10Y reports from the harness:
```
Download outputs using SpecFlow MCP
```
Rule of thumb: if the P10Y score spread across the variants is low, your specification is ready.
Use the built-in prompt to compare the variants, identify their strengths and weaknesses, and produce a plan for automatically assembling the best variant.
```
use SpecFlow MCP prompt: specflow-compare-variants
```
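
To make the "low spread" rule of thumb concrete, here is a minimal Python sketch of one way to quantify spread across variants. Everything in it is an assumption made for illustration: it treats each variant as having a single numeric P10Y score, the variant names and numbers are invented, `score_spread` is a hypothetical helper, and the 0.15 threshold is arbitrary rather than a SpecFlow or P10Y default. Take the real values from the downloaded P10Y reports.

```python
from statistics import mean


def score_spread(scores):
    """Return (absolute spread, spread relative to the mean) for variant scores."""
    if len(scores) < 2:
        raise ValueError("need scores from at least two variants")
    values = list(scores.values())
    absolute = max(values) - min(values)
    average = mean(values)
    relative = absolute / average if average else float("inf")
    return absolute, relative


# Hypothetical scores for three variants; real values come from the P10Y reports.
example_scores = {"variant-a": 120.0, "variant-b": 126.0, "variant-c": 114.0}
absolute, relative = score_spread(example_scores)

# Illustrative threshold only, not a SpecFlow or P10Y default.
LOW_SPREAD_THRESHOLD = 0.15
is_low = relative <= LOW_SPREAD_THRESHOLD
print(f"spread={absolute:.1f}, relative={relative:.2%}, low={is_low}")
```

Dividing by the mean keeps the measure comparable between small and large projects. A plain max-minus-min difference works equally well if you only compare variants of the same run.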

MCP Tools

| Tool | Description |
|---|---|
| `check_specification_completeness` | Analyze specs for gaps and contradictions (local) |
| `run_planning` | Generate a phased implementation plan (local) |
| `read_document` | Extract PDF/DOCX/PPTX/XLSX/CSV to markdown (local) |
| `run_generation` | Upload and launch parallel codegen on the backend (2-8 hrs) |
| `check_status` | Poll generation progress |
| `download_outputs` | Download archived artifacts from a completed run |
| `retry_generation` | Retry a failed generation |
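
Since `read_document` covers a fixed set of formats, a quick way to see which files in a specs directory fall into that set is sketched below in Python. This is an illustration under assumptions: `split_spec_files` is a hypothetical helper, the `specs` directory name follows the Usage example, and whether your IDE agent calls `read_document` automatically for these files is not something this sketch can tell you.

```python
from pathlib import Path

# Formats the tools table lists for read_document.
CONVERTIBLE_SUFFIXES = {".pdf", ".docx", ".pptx", ".xlsx", ".csv"}


def split_spec_files(names):
    """Split file names into (needs_conversion, other), each sorted."""
    needs_conversion, other = [], []
    for name in names:
        matches = Path(name).suffix.lower() in CONVERTIBLE_SUFFIXES
        (needs_conversion if matches else other).append(name)
    return sorted(needs_conversion), sorted(other)


if __name__ == "__main__":
    specs_dir = Path("specs")  # directory name used in the Usage steps
    names = (
        [p.name for p in specs_dir.iterdir() if p.is_file()]
        if specs_dir.is_dir()
        else []
    )
    to_convert, rest = split_spec_files(names)
    print("candidates for read_document:", to_convert)
    print("other files:", rest)
```

For the example tree in the Usage section, `user-flows.pdf` would land in the first list and the two `.md` files in the second.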

If you want to go deeper

SpecFlow Detailed Overview

https://github.com/user-attachments/assets/ea1dd95d-5742-4c51-bf2c-c2cb582669c3

Full MCP config and usage: MCP_USER.md

Full MCP API reference: docs/mcp/API_REFERENCE.md

Detailed SpecFlow harness instructions: QUICKSTART.md

[!Important]
AI agents work in scratchpad repos that are reset before each run — we create them for you.
**Do not point SpecFlow at repositories with code or history you want to keep.**
The managed SpecFlow service is for Grid Dynamics employees only. Open-source users should run the local quickstart.

Documentation

| Document | Description |
|---|---|
| QUICKSTART.md | Local setup and first run |
| CONTRIBUTING.md | How to contribute: workflow and PR checklist |
| CLAUDE.md | Development protocol and STEEL commandments |
| docs/ARCHITECTURE.md | System design and data flow |
| docs/mcp/API_REFERENCE.md | MCP tool reference |
| docs/backend/DEVELOPMENT.md | Backend development guide |
| docs/backend/API_REFERENCE.md | REST API reference |
| docs/operations/TROUBLESHOOTING.md | Troubleshooting guide |
| docs/IDE-SETUP.md | IDE configuration (Cursor + Claude Code) |

License
