Build A Harness

Build complete AI agent harnesses on canvas. Compile to any orchestrator. Observe with Langfuse.

A workflow routes prompts from node to node. A harness governs what the agent believes, what it is allowed to do, how it catches its own mistakes, and what it learns. Build A Harness delivers the complete 11-layer architecture: draw it on a canvas, compile it to any of the four supported runtimes, and trace every decision.

Canvas  →  flow.json  →  LangGraph · CrewAI · Mastra · MS Agent Framework  →  Langfuse

The spec is the contract. The canvas is the editor. The adapters are the compilers.

Why a harness, not just a workflow

Simple Agent Loop	Full Harness — Implemented
Input / Caller	Caller State — constraints · clarification
↓	World Model — beliefs · contradictions · generation_id
LLM Call	Reasoning — evidence · hypotheses (4 sources) · VOI (value of information)
↓	Control ← key — 5-tier resolver · NORMAL / CAUTIOUS / BLOCKED
Tool Call ↺ loop	Planning — task graph (6-state) · parallel concurrency
↓	Execution + Verification — VOI gate · 9 layers
Output	Recovery + Memory — 6 strategies · compression
	Learning — experience store · warm start (optional)
	Output & Reviewer Pass — contract · 3-lens review
prompt in → answer out	22 nodes · 11 layers · 379 tests passing
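The control layer's job is to collapse many signals into one of three modes. As a rough illustration only, a tiered resolver can be written as an ordered list of checks where the first tier that fires decides the mode. The five tier names and the signal fields below are hypothetical placeholders, not the project's actual tiers; only the mode names NORMAL, CAUTIOUS, and BLOCKED come from the table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    NORMAL = "NORMAL"
    CAUTIOUS = "CAUTIOUS"
    BLOCKED = "BLOCKED"


@dataclass
class Signals:
    # Hypothetical resolver inputs; the real control layer may use others.
    policy_violation: bool = False
    unresolved_contradictions: int = 0
    consecutive_failures: int = 0
    budget_remaining: float = 1.0  # fraction of the run budget left
    low_confidence: bool = False


# Ordered tiers (name, predicate, mode). The first predicate that returns
# True decides the mode, so earlier tiers take precedence over later ones.
TIERS: list[tuple[str, Callable[[Signals], bool], Mode]] = [
    ("policy", lambda s: s.policy_violation, Mode.BLOCKED),
    ("budget", lambda s: s.budget_remaining <= 0.0, Mode.BLOCKED),
    ("failures", lambda s: s.consecutive_failures >= 3, Mode.CAUTIOUS),
    ("contradictions", lambda s: s.unresolved_contradictions > 0, Mode.CAUTIOUS),
    ("confidence", lambda s: s.low_confidence, Mode.CAUTIOUS),
]


def resolve(signals: Signals) -> tuple[Mode, str]:
    """Return the control mode and the name of the tier that decided it."""
    for name, predicate, mode in TIERS:
        if predicate(signals):
            return mode, name
    return Mode.NORMAL, "default"
```

Ordering the tiers makes precedence explicit: `resolve(Signals(policy_violation=True, unresolved_contradictions=2))` blocks on the policy tier even though the contradiction tier alone would only call for caution.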

What's implemented

Canvas & execution layer

✅ Canvas with 27 node types (14 core + 13 harness)
✅ 4 framework adapters — LangGraph, CrewAI, Mastra, MAF
✅ Langfuse observability — harness traces across all runtimes
✅ HITL pause/resume · REST / MCP / A2A deploy
✅ FlowSpec v0.2.0 — open, portable JSON format
✅ Process concepts — pre-seeded task graph scaffolds

Reasoning & control layer

✅ World model · typed beliefs · contradiction detection
✅ 5-tier control state resolver · deadlock detection
✅ Pre-execution review gate · 9-layer verification
✅ 6 named recovery strategies · typed failure library
✅ Experience store — cross-run structural reuse
✅ Adversarial reviewer pass · output contract validation
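Typed beliefs let contradiction detection reduce to a key comparison. A minimal sketch, assuming each belief is keyed by (subject, attribute) and stamped with the generation_id of the update that produced it; the `Belief` fields and the `WorldModel` class are hypothetical, not the project's schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Belief:
    # Hypothetical belief shape: a typed (subject, attribute) -> value claim.
    subject: str
    attribute: str
    value: object
    source: str
    generation_id: int


@dataclass
class WorldModel:
    generation_id: int = 0
    beliefs: dict[tuple[str, str], Belief] = field(default_factory=dict)
    contradictions: list[tuple[Belief, Belief]] = field(default_factory=list)

    def assert_belief(self, subject: str, attribute: str,
                      value: object, source: str) -> Belief:
        """Record a belief and log a contradiction if it disagrees with the current one."""
        self.generation_id += 1
        new = Belief(subject, attribute, value, source, self.generation_id)
        key = (subject, attribute)
        old = self.beliefs.get(key)
        if old is not None and old.value != value:
            # The newest belief becomes current, but both sides are kept
            # so a later step can decide which one to trust.
            self.contradictions.append((old, new))
        self.beliefs[key] = new
        return new
```

Agreeing reports (two sources saying an API is "up") add nothing to `contradictions`; a third source saying "down" adds one pair, and the generation_id shows which belief is newer.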

Node palette

Harnesses are built from 14 core nodes and 13 harness-layer nodes — every node compiles to all four runtimes.

Core nodes
⤵ `input`	⤴ `output`	✨ `llm_call`	🔧 `tool_invoke`
⎇ `condition`	⑂ `parallel_fork`	⊖ `parallel_join`	⏸ `hitl_breakpoint`
📖 `memory_read`	🔖 `memory_write`	📦 `subgraph`	⇌ `transform`
🤖 `agent_role`	👥 `agent_debate`

Harness nodes — implement the 11-layer control architecture
🧠 `world_model`	💡 `hypothesis_set`	🗄️ `gather_evidence`	⚙️ `apply_tool_rel`
🔄 `update_wm`	🛡️ `control_state`	🕸️ `task_graph`	✅ `verify_gate`
♻️ `recovery`	📋 `evidence_store`	📊 `exp_store`	👁️ `reviewer_pass`
🧭 `process_concept`

Full architecture, pseudo-code, and state model: plan/harness_architecture.html
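Because every canvas node is one of the 27 types in the palette, a flow can be sanity-checked before it is sent to an adapter. The sketch below assumes a flow.json shape with `nodes` (each with `id` and `type`) and `edges` (each with `source` and `target`); those field names are illustrative guesses, and the authoritative schema is the one described in docs/flowspec.md.

```python
import json

CORE_NODES = {
    "input", "output", "llm_call", "tool_invoke", "condition", "parallel_fork",
    "parallel_join", "hitl_breakpoint", "memory_read", "memory_write",
    "subgraph", "transform", "agent_role", "agent_debate",
}
HARNESS_NODES = {
    "world_model", "hypothesis_set", "gather_evidence", "apply_tool_rel",
    "update_wm", "control_state", "task_graph", "verify_gate", "recovery",
    "evidence_store", "exp_store", "reviewer_pass", "process_concept",
}
KNOWN_TYPES = CORE_NODES | HARNESS_NODES


def check_flow(flow: dict) -> list[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    ids = set()
    for node in flow.get("nodes", []):
        if node.get("type") not in KNOWN_TYPES:
            problems.append(f"unknown node type {node.get('type')!r} on {node.get('id')!r}")
        ids.add(node.get("id"))
    for edge in flow.get("edges", []):
        for end in ("source", "target"):
            if edge.get(end) not in ids:
                problems.append(f"edge {end} {edge.get(end)!r} is not a node id")
    return problems


# A hypothetical three-node flow: input -> llm_call -> output.
flow = json.loads("""
{
  "nodes": [
    {"id": "in", "type": "input"},
    {"id": "think", "type": "llm_call"},
    {"id": "out", "type": "output"}
  ],
  "edges": [
    {"source": "in", "target": "think"},
    {"source": "think", "target": "out"}
  ]
}
""")
```

A typo such as `"llm_cal"` or an edge pointing at a missing node id shows up in the returned list instead of failing later inside a runtime.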

Frameworks

All four runtimes compile from the same flow.json — no rewriting.

Runtime	Language	HITL	Key integration
LangGraph	Python	`interrupt()`	`@observe` · harness child spans
CrewAI	Python	—	`context_from → Task.context` · tier-aware memory
Mastra	TypeScript	`suspend()/resume()`	Node.js sidecar
MS Agent Framework	Python	`_HitlPause`	`AgentGroupChat` native · OTel → Langfuse
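Each runtime exposes its own pause mechanism (`interrupt()`, `suspend()/resume()`, `_HitlPause`), but a `hitl_breakpoint` has the same shape everywhere: stop, hand a payload to a human, continue with their answer. As a runtime-neutral illustration only, and not any of these frameworks' APIs, a Python generator captures that contract; `review_step` and `run_with_human` are hypothetical names.

```python
from typing import Generator


def review_step(draft: str) -> Generator[dict, str, str]:
    """One hitl_breakpoint: yield a pause payload, then finish with the reply."""
    # Pause here; execution resumes when the caller sends the human's reply.
    reply = yield {"type": "hitl_breakpoint", "draft": draft}
    # Treating any reply other than "approve" as replacement text is an
    # assumption of this sketch, not documented behaviour.
    return draft if reply == "approve" else reply


def run_with_human(draft: str, human_reply: str) -> str:
    """Drive review_step to its breakpoint, then resume it with human_reply."""
    step = review_step(draft)
    payload = next(step)  # runs up to the breakpoint
    if payload.get("type") != "hitl_breakpoint":
        raise ValueError(f"unexpected payload: {payload!r}")
    try:
        step.send(human_reply)  # resume; a single-pause step finishes here
    except StopIteration as done:
        return done.value
    raise RuntimeError("step paused more than once")
```

In a deployed harness the pause would span an HTTP round trip (docs/api.md covers HITL resume), so run state would need to be persisted between the two halves rather than held in a live generator.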

Compile: POST /compile?runtime=langgraph — same spec, any runtime.
Deploy as a REST endpoint, MCP tool, or A2A agent in one step.
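Calling the compile endpoint from a script needs only the standard library. This sketch assumes the request body is the raw flow.json document sent as JSON and that the adapter listens on localhost:8000 (the Adapter API address listed under Quick start); the response format is not described here, so check docs/api.md before relying on it.

```python
import json
import urllib.parse
import urllib.request


def build_compile_request(spec: dict, runtime: str,
                          base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST /compile request for one runtime."""
    query = urllib.parse.urlencode({"runtime": runtime})
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/compile?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compile_flow(spec: dict, runtime: str) -> bytes:
    """Send the request and return the raw response body."""
    # urlopen raises urllib.error.HTTPError on a non-2xx status.
    with urllib.request.urlopen(build_compile_request(spec, runtime)) as resp:
        return resp.read()
```

Looping over runtime values produces one artifact per framework from the same spec; apart from `langgraph`, the exact runtime identifiers are not listed here, so take them from docs/api.md.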

Observability

Self-hosted Langfuse starts with docker compose up — no extra configuration needed.

Per-node child spans across all four runtimes (world model, control state, verification, recovery)
Token counts, latency, and cost per node via LiteLLM
Live "View trace →" link in the canvas after each run
Managed prompts via Langfuse prompt API (prompt_ref on any llm_call node)
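Per-node numbers are most useful once rolled up. As an illustration, assuming span records have already been exported with a node id, token count, latency, and cost (the `SpanRecord` fields are hypothetical and not Langfuse's export schema), a per-node roll-up looks like this:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpanRecord:
    # Hypothetical flattened span; not the Langfuse API schema.
    node_id: str
    tokens: int
    latency_ms: float
    cost_usd: float


def per_node_totals(spans: list[SpanRecord]) -> dict[str, dict[str, float]]:
    """Count calls, sum tokens and cost, and keep the max latency for each node id."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"calls": 0, "tokens": 0, "cost_usd": 0.0, "max_latency_ms": 0.0}
    )
    for s in spans:
        t = totals[s.node_id]
        t["calls"] += 1
        t["tokens"] += s.tokens
        t["cost_usd"] += s.cost_usd
        t["max_latency_ms"] = max(t["max_latency_ms"], s.latency_ms)
    return dict(totals)
```

Keeping the maximum rather than the mean latency surfaces the slow outlier calls that a single run's average would hide.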

Quick start

./scripts/setup-env.sh   # generate secrets, write .env
docker compose up        # start all 9 services

Service	URL
Canvas	http://localhost:3000
Adapter API	http://localhost:8000/health
Langfuse	http://localhost:3001
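Containers can report as started before the services inside them accept requests. A small readiness loop against the adapter's /health URL from the table avoids racing it; the sketch only checks for HTTP 200 and assumes nothing about the response body. The `fetch` parameter is a hypothetical hook so the loop can be tested without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status for GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        return err.code
    except (urllib.error.URLError, OSError):  # refused, DNS failure, timeout
        return 0


def wait_until_healthy(url: str = "http://localhost:8000/health",
                       timeout_s: float = 60.0, interval_s: float = 1.0,
                       fetch: Callable[[str], int] = http_status) -> bool:
    """Poll url until it returns 200 (True) or timeout_s elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```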

Without Docker

./scripts/setup-env.sh && source adapter/.venv/bin/activate
npm install && npm run dev        # canvas → localhost:3000
cd adapter && python main.py      # in a second terminal: adapter → localhost:8000

Running tests

npm test                                         # Vitest — validates 5 reference flows
pytest adapter/tests/ -v                         # adapter unit + integration
pytest adapter/tests/test_maf_adapter.py -v     # MAF suite (742 tests)

New here? Start with docs/getting-started.md · Startup errors? docs/troubleshooting.md · Real-time collaboration: docs/collab.md · On-prem / Kubernetes: docs/deployment.md

LLM providers

All LLM calls route through LiteLLM; add the provider's key to .env.

Provider	Env var	Example models
OpenAI	`OPENAI_API_KEY`	`gpt-4o`, `gpt-4o-mini`
Anthropic	`ANTHROPIC_API_KEY`	`claude-sonnet`, `claude-opus`
Ollama (local)	—	`mistral`, `qwen3`, `qwen2.5-coder`

No API key? Install Ollama, run ollama pull mistral, then run ./scripts/setup-ollama.sh, which tests all four frameworks with no paid account.
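Since every call goes through LiteLLM, switching providers comes down to which key is present and which model string is passed. The sketch below picks a model from the environment; the preference order and defaults are assumptions, the `ollama/` prefix follows LiteLLM's naming convention for Ollama models, and the Anthropic string is a placeholder because the table above lists short names rather than full model IDs.

```python
import os
from typing import Mapping

# Hypothetical defaults; replace them with the exact model IDs you use.
OPENAI_DEFAULT = "gpt-4o-mini"
ANTHROPIC_DEFAULT = "anthropic/claude-sonnet"  # placeholder, not a full model ID
OLLAMA_DEFAULT = "ollama/mistral"  # LiteLLM's "ollama/<model>" naming


def pick_model(env: Mapping[str, str] = os.environ) -> str:
    """Prefer a hosted provider whose key is set; otherwise fall back to local Ollama."""
    if env.get("OPENAI_API_KEY"):
        return OPENAI_DEFAULT
    if env.get("ANTHROPIC_API_KEY"):
        return ANTHROPIC_DEFAULT
    return OLLAMA_DEFAULT
```

The returned string would then be passed as `model=` to `litellm.completion(...)`; an empty key counts as unset, so a blank line in .env does not accidentally select a paid provider.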

Full setup: docs/llm-setup.md

Embed the canvas

npm install @buildaharness/canvas

import { BuildAHarnessCanvas } from '@buildaharness/canvas'
import '@buildaharness/canvas/styles.css'

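// HarnessEditor and its props are illustrative: mySpec, save and runState
// stand in for your app's own spec, persistence callback and run state.
export function HarnessEditor({ mySpec, save, runState }) {
  return (
    <BuildAHarnessCanvas
      initialSpec={mySpec}
      onSpecChange={(updated) => save(updated)}  // persist every edit
      execStats={runState.nodeStats}            // per-node execution stats
      theme="dark"
    />
  )
}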

Full props reference: packages/canvas/README.md

Documentation


Document	Covers
docs/getting-started.md	Step-by-step: clone → secrets → LLM → first run
docs/flowspec.md	FlowSpec v1.0.0 — all 27 node types, edges, fields
docs/architecture.md	System design, service interactions, data flows
docs/api.md	REST API reference — compile, execute, deploy, HITL resume
docs/llm-setup.md	LLM provider setup — OpenAI, Anthropic, Ollama, custom
docs/qdrant.md	Qdrant vector store — seeding, collections, production
docs/env-vars.md	All environment variables across all services
docs/collab.md	Real-time collaboration — Yjs setup and internals
docs/deployment.md	Docker, Helm, SSO/OIDC
docs/troubleshooting.md	Common startup errors
CONTRIBUTING.md	How to contribute

Apache 2.0 — see LICENSE.