Best of Agent Harnesses and Harness Techniques

🏆 Curated list of AI agent harnesses, orchestration frameworks, and harness techniques for reliable agentic systems.

What is an agent harness?

An agent harness is the runtime that closes the loop between a stateless model and the outside world. It manages perception, action, memory, and constraint enforcement, which makes it the de facto operating system of machine agency and, consequently, the layer where nearly all meaningful questions about AI autonomy, reliability, and control are actually resolved.

Every prior wave of automation was constrained by brittleness: you scripted exact behavior, and when the world deviated, the system broke. Foundation models inverted that problem: they are flexible but directionless, stateless, and disconnected from anything real. The agent harness exists to bridge that gap. It is the orchestration infrastructure that converts a model's per-turn reasoning into sustained, tool-using, error-recovering, goal-directed behavior across time.

Architecturally, it plays the role the kernel played in operating systems or the controller played in industrial robotics, mediating between raw capability and a messy environment. The critical difference is that the "capability" it governs is general-purpose cognition, so the harness is simultaneously a scheduler, a permission system, a memory manager, and a policy enforcement layer, all of them under-specified and still evolving. The term itself barely exists in formal literature yet, which should concern anyone who cares about AI governance, because the harness is where abstract alignment goals either get operationalized into concrete constraints or quietly don't.

Why harnesses matter

Better models make harnesses more important: more capabilities mean more failure modes, and production needs retry logic, fallbacks, and validation. Harness quality—not just model quality—determines whether agents actually ship. This list ranks projects by relevance to harness concerns (environment, orchestration, lifecycle, guardrails) and by stars/activity.
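
To make "retry logic, fallbacks, and validation" concrete, here is a minimal sketch of those three concerns wrapped around a model call. It is an illustration under stated assumptions, not any listed project's API: `ModelFn`, `call_with_harness`, `flaky_model`, and `stable_model` are hypothetical names, and the "models" are plain functions standing in for real provider clients.

```python
import json
import time
from typing import Callable, Optional, Sequence

# Hypothetical type: any function that takes a prompt and returns raw model text.
ModelFn = Callable[[str], str]


def validate_json_object(raw: str) -> dict:
    """Validation: accept only output that parses as a JSON object."""
    parsed = json.loads(raw)  # raises ValueError on malformed output
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed


def call_with_harness(
    prompt: str,
    models: Sequence[ModelFn],
    max_retries: int = 2,
    backoff_seconds: float = 0.0,
) -> dict:
    """Try each model in order, retrying each one before falling back to the next."""
    last_error: Optional[Exception] = None
    for model in models:  # fallback chain
        for attempt in range(max_retries + 1):  # retry loop
            try:
                return validate_json_object(model(prompt))
            except Exception as err:  # broad on purpose: any failure triggers a retry
                last_error = err
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all models failed") from last_error


def flaky_model(prompt: str) -> str:
    """Stand-in for a model that never returns valid JSON."""
    return "not json"


def stable_model(prompt: str) -> str:
    """Stand-in for a model that returns a well-formed JSON object."""
    return json.dumps({"answer": prompt.upper()})


result = call_with_harness("ping", [flaky_model, stable_model])
print(result)  # {'answer': 'PING'}
```

The first model exhausts its retries on invalid output, so the harness falls back to the second. A production harness would likely narrow the caught exceptions and log each failure.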

Progressive disclosure harnesses 7 projects
Coding agent products (IDEs, CLIs, full suites) 9 projects
Coding harness configs and SDKs 10 projects
Frameworks 24 projects
Multi-agent and orchestration 5 projects
Plugins, MCPs, CLI tools 12 projects
Evaluation and benchmarking harnesses 17 projects
Research and task-specific harnesses 3 projects
Libraries and SDKs 15 projects

Explanation

Simplicity ↔ capability: Where each project sits on the axis from minimal/simple (lean API, format only, thin layer) to high capability (full platform, many features, kitchen-sink).
OSS: ✅ = standard open-source license (MIT/Apache/BSD/GPL/MPL/AGPL/CC0). ⚠️ = source-available or restricted (e.g. n8n Fair-code, Elastic-2.0, Polyform). ❓ = no license file or unclear terms.
🥇🥈🥉 Combined project-quality score
⭐️ Star count from GitHub
🐣 New project (less than 6 months old)
💤 Inactive project (no activity for 6 months)
💀 Dead project (no activity for 12 months)
📈📉 Project is trending up or down
👨‍💻 Contributors count from GitHub
🔀 Fork count from GitHub
📋 Issue count from GitHub
⏱️ Last update timestamp on package manager

Progressive disclosure harnesses

Formats, runtimes, and patterns that reveal context, tools, or instructions in layers—index first, details on demand—to control tokens and improve agent focus (the "map, not encyclopedia" principle).

#	Project	Description	OSS	Simplicity ↔ capability
1	agents.md	Open format for repo-scoped agent briefings; v1.1 adds hierarchical scope and progressive disclosure so agents get a map of what exists, then load only what's relevant.	✅	Simple (format only)
2	awesome-cursorrules	Curated .cursorrules and skills that leverage Cursor's index-then-load model; the canonical collection for rules-as-progressive-disclosure in the IDE.	✅	Simple (content bundle)
3	MCP-Zero	Active tool discovery for autonomous agents: model requests tools by requirement; hierarchical semantic routing over 308 servers / 2,797 tools with ~98% token reduction (APIBank).	✅	Capability (3k tools, full routing)
4	langgraph-bigtool	Build LangGraph agents with large tool sets; retrieval and on-demand tool loading so agents scale beyond context without stuffing every schema upfront.	✅	Capability (large tool sets)
5	spring-ai-tool-search-tool	Dynamic tool discovery for Spring AI: model gets a search tool first, then pulls definitions for relevant tools; 34–64% token reduction across providers.	✅	Mid (search-then-load)
6	ToolGen	ICLR 2025: unified tool retrieval and calling via generation; 47k+ tools without context stuffing—retrieval and invocation in one generative step.	❓	Capability (47k+ tools)
7	ToolRAG	Semantic tool retrieval for LLMs; serves only the tools the user query demands (MCP-compatible), unlimited tool sets with zero context penalty.	✅	Mid (query-driven retrieval)
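
The "index first, details on demand" pattern described above can be sketched in a few lines. This is a toy illustration, not how any project in the table is implemented: `Tool`, `ToolCatalog`, and the keyword-overlap search are assumptions made for the example, and a real system might use embeddings or hierarchical routing instead.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str  # one-line entry shown in the index
    schema: dict = field(default_factory=dict)  # full definition, loaded on demand


class ToolCatalog:
    def __init__(self, tools):
        self._tools = {t.name: t for t in tools}

    def index(self) -> str:
        """Layer 1: a compact map of what exists (names and summaries only)."""
        return "\n".join(f"{t.name}: {t.summary}" for t in self._tools.values())

    def search(self, query: str, limit: int = 3) -> list:
        """Layer 2: naive keyword overlap between the query and each summary."""
        words = set(query.lower().split())
        scored = []
        for t in self._tools.values():
            overlap = len(words & set(t.summary.lower().split()))
            if overlap:
                scored.append((overlap, t.name))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored[:limit]]

    def load(self, name: str) -> dict:
        """Layer 3: the full schema, fetched only for tools the model asked for."""
        return self._tools[name].schema


catalog = ToolCatalog([
    Tool("read_file", "read a file from the repo",
         {"type": "object", "properties": {"path": {"type": "string"}}}),
    Tool("run_tests", "run the test suite", {"type": "object", "properties": {}}),
    Tool("send_email", "send an email message", {"type": "object", "properties": {}}),
])

hits = catalog.search("read config file")   # only summaries are consulted here
schemas = {name: catalog.load(name) for name in hits}
print(hits)  # ['read_file']
```

Only the index and the schemas for matched tools would enter the model's context; the other definitions stay out of the prompt until requested.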

Coding agent products (IDEs, CLIs, full suites)

Turnkey coding agents you install and run: IDE extensions, terminal CLIs, Dockerized workspaces. Each entry notes which part is the harness (the agent loop, tool wiring, approval model) versus the UI shell (VS Code extension, TUI, browser client).

#	Project	Description	OSS	Simplicity ↔ capability
1	Cline	VS Code extension whose harness is a plan-then-act loop with per-step human approval and cost transparency; the VS Code integration is the UI shell. Open-source counterweight to Cursor.	✅	Mid (plan-then-act, approval gates)
2	Roo Code	VS Code/Cursor extension in the Cline lineage. The harness is the approval-gated agent with custom modes and a strong MCP story; the IDE is the UI. Popular community fork when you want that workflow without the upstream extension.	✅	Mid (IDE extension, MCP-first)
3	Codex	OpenAI's terminal coding agent. The harness is the sandboxed tool-call loop with multi-provider support; the CLI is the shell. Reference implementation for "official CLI that ships code."	✅	Mid (reference CLI, sandboxed)
4	Gemini CLI	Google's first-party terminal agent for Gemini. The harness is the plugin/MCP tool-call loop; the terminal is the shell—Google's parallel to Claude Code / Codex, not just an API.	✅	Mid (official CLI, plugins, MCP)
5	crush	Charm's terminal coding agent (successor to OpenCode). The harness is the tool-calling loop with session persistence; the Bubble Tea TUI is the shell.	⚠️ FSL-1.1-MIT	Mid (terminal agent, TUI)
6	OpenHands	Dockerized software-engineering agent. The harness is the bash/editor/browser toolset with micro-agents and event-stream session bridging; Docker is the sandbox. Main OSS choice for teams self-hosting autonomous repo work.	⚠️ (multi-license)	Capability (Docker runtime, multi-surface agent)
7	goose	Block's extensible Rust agent. The harness is the MCP/ACP extension model with recipes and provider choice; there's no fixed UI slot—you bolt it into whatever shell you use.	✅	Mid (extensions, MCP/ACP)
8	claw-code-agent	Python reimplementation of the Claude Code agent architecture with zero external dependencies; interactive chat, streaming, plugin runtime, nested agent delegation, cost tracking, MCP transport—portable harness without the Rust/TS toolchain.	❓	Capability (pure Python, plugin runtime)
9	coderClaw	Self-hosted multi-role coding system (Creator, Reviewer, Test, Refactor, etc.) with AST and semantic maps; IDE-agnostic, chat-channel triggers.	❓	Capability (multi-role, AST/semantic)
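
The harness-versus-shell split can be illustrated with a small approval-gated loop. This is a sketch under assumptions, not the code of any product listed above: `run_plan` plays the harness, while the `approve` callback is the only thing the UI shell (an IDE dialog, a TUI prompt) would need to supply. All names are hypothetical.

```python
from typing import Callable

# Hypothetical shapes: a step is (tool_name, argument); a tool maps str -> str.
Step = tuple[str, str]
Tool = Callable[[str], str]
Approver = Callable[[Step], bool]  # supplied by the UI shell


def run_plan(plan: list[Step], tools: dict[str, Tool], approve: Approver) -> list[str]:
    """Harness: execute each planned step only after the approver says yes."""
    log = []
    for step in plan:
        name, arg = step
        if name not in tools:
            log.append(f"skipped {name}: unknown tool")
            continue
        if not approve(step):
            log.append(f"denied {name}")
            continue
        log.append(f"{name} -> {tools[name](arg)}")
    return log


tools = {"echo": lambda s: s, "shout": lambda s: s.upper()}
plan = [("echo", "hi"), ("shout", "hi"), ("rm", "/")]


def approve_echo_only(step: Step) -> bool:
    """Stand-in for a human clicking approve or deny."""
    return step[0] == "echo"


log = run_plan(plan, tools, approve_echo_only)
print(log)  # ['echo -> hi', 'denied shout', 'skipped rm: unknown tool']
```

Swapping `approve_echo_only` for an interactive prompt changes the shell without touching the loop, which is the separation the entries above try to call out.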

Coding harness configs and SDKs

Skill packs, slash-command libraries, meta-prompting frameworks, and official SDKs that give you the harness (the agent loop, planning, memory, hooks) without bundling a specific IDE or CLI shell.

#	Project	Description	OSS	Simplicity ↔ capability
1	get-shit-done	Goal-backward planning and wave-based execution over fresh context windows; avoids context rot by design. Python/JS meta-prompting for Claude Code, OpenCode, Gemini CLI.	✅	Mid (meta-prompting, you own stack)
2	GStack	Garry Tan's Claude Code skill stack: 23 slash-command modes (CEO/eng/design review, QA, ship, browse, retro, …) that structure one assistant as a virtual engineering team. Daily driver while running YC.	✅	Capability (multi-role slash-command harness)
3	everything-claude-code	The breakout 2026 harness pack for Claude Code (approaching 160k stars): 28 specialized subagents, 119 reusable skills, 60 slash commands, 34 rules, 20+ automated hooks. Ships a full "AI engineering team" as config.	✅	Capability (subagents + skills + hooks)
4	superpowers	Performance-oriented harness pack for Claude Code, Codex, OpenCode, Cursor: skills, instincts, memory, security, research-first workflows. Treats harness engineering itself as the performance lever.	✅	Capability (multi-IDE skill stack)
5	pmstack	Claude Code config for AI product managers: CLAUDE.md plus skills for competitive analysis, PRD-from-signal, metric frameworks, stakeholder briefs, and agent eval design. "GStack for PMs."	✅	Simple (skills bundle, PM-focused)
6	Claude Agent SDK	Official Anthropic SDK (Python + TypeScript, demos, quickstarts): built-in tools, MCP, long-running coding agents with session bridging.	✅	Capability (full SDK, session bridging)
7	AutoHarness	Lightweight governance harness: wraps any LLM client in ~2 lines for automated harness engineering—6–14 step pipeline, YAML constitution, risk-pattern matching, session persistence with cost tracking, multi-agent profiles.	✅	Simple (2-line wrapper, YAML gov)
8	RepoMaster	Repo-scoped research harness: builds function-call and module-dependency graphs to explore only what's needed; large relative gains on MLE-bench and GitTaskBench with lower token use.	❓	Capability (graph-based exploration)
9	SWE-agent	LM-driven harness built for SWE-bench: edit state, command execution, and issue-focused loop—the reference agent stack next to the benchmark itself.	✅	Capability (SWE-bench pairing, stateful edits)
10	OpenHarness (HKUDS)	Open agent harness with a built-in personal agent ("Ohmo") that runs across Feishu, Slack, Telegram, and Discord; core tool-use, skills, memory, multi-agent coordination with auto-compaction for multi-day sessions.	✅	Capability (personal agent + multi-channel)

Frameworks

General-purpose agent and LLM application frameworks (the app layer, not harnesses per se).

#	Project	Description	OSS	Simplicity ↔ capability
1	langgraph	State-machine graphs over LLM steps; checkpointing, human-in-the-loop, and durable execution so workflows survive restarts.	✅	Capability (graphs, checkpointing, durable exec)
2	langchain	Chains, tools, retrievers, and agents; the usual entry point for "add tools to an LLM" in Python/JS.	✅	Capability (kitchen-sink ecosystem)
3	llama-index	Data-centric: indexing, RAG, and query engines; agent abstractions sit on top of your data pipelines.	✅	Capability (RAG + agents)
4	semantic-kernel	Microsoft's plugin and planner layer for LLMs; C#, Python, Java; strong on enterprise auth and orchestration.	✅	Capability (enterprise, multi-language)
5	mastra	TypeScript-first; agents, tools, and workflows with a single runtime and minimal boilerplate.	⚠️ Elastic-2.0	Mid (TS-first, minimal boilerplate)
6	agno	Python agents with memory, knowledge bases, tools, and structured outputs; continues the PhiData-era product line under the Agno name—production apps, evals, and pipelines.	✅	Capability (memory, KB, observability)
7	letta	Python agent runtime with tool use and control flow; lean API; stateful agents with long-horizon memory.	✅	Simple (lean API)
8	langflow	Low-code UI to build and deploy LangChain/LangGraph flows; visual DAG editor and one-click run.	✅	Capability (low-code, visual)
9	rasa	Conversational AI stack (NLU, dialogue, actions); long-standing OSS choice for chat and voice bots.	✅	Capability (full stack)
10	botpress	Visual bot builder and runtime; multi-channel, open-source alternative to commercial bot platforms.	✅	Capability (visual builder, multi-channel)
11	Dify	One-stop LLM app platform: visual workflows, RAG pipeline, 50+ tools, model management; "ship from prototype to prod" in a single UI.	⚠️ Fair-code	Capability (one-stop platform)
12	n8n	Fair-code workflow engine with 400+ nodes and native AI nodes; the self-hosted Zapier that actually does agents and LangChain.	⚠️ Fair-code	Capability (400+ nodes, workflow engine)
13	AutoGPT	The original autonomous loop: goal in, agent iterates with tools and memory; Forge is the dev framework, Benchmark the eval harness.	⚠️ Polyform-SU	Capability (autonomous loop, tools, memory)
14	AIlice	Fully autonomous general-purpose agent; one binary, Docker-ready, for when you want "set goal and walk away" without a framework.	✅	Capability (autonomous, one binary)
15	Bee Agent Framework	Python + TypeScript, LF AI–backed; MCP/ACP, workflows, Requirement Agent; the one that pushes "production multi-agent" without LangChain.	✅	Capability (production multi-agent)
16	agent-squad	AWS-originated orchestrator (now under 2FastLabs): intent classification, streaming, SupervisorAgent; "agent-as-tools" so one agent delegates to a squad.	✅	Capability (squad orchestration)
17	SuperAgentX	Lightweight multi-agent orchestrator with an AGI angle; minimal surface, docs-first, for teams that want orchestration without the kitchen sink.	✅	Simple (minimal surface)
18	AgentVerse	Task-solving and simulation envs for multi-LLM agents; deploy many agents in custom environments without building infra from scratch.	✅	Capability (simulation envs, multi-agent)
19	R2R	RAG-first: hybrid search, knowledge graphs, multimodal; the framework for "production RAG" when you care more about retrieval than chat UI.	✅	Capability (production RAG)
20	LiteSwarm	Async-only, LiteLLM-backed Python; dynamic agent switching and type-safe context—for devs who want 100+ models without LangGraph's weight.	✅	Mid (100+ models, dynamic switching)
21	AgentStack	Scaffolds full agent projects; plugs in CrewAI, LangGraph, OpenAI Swarm, LlamaStack and wires AgentOps observability from day one.	✅	Capability (scaffold, multi-backend)
22	AgentSilex	~300 lines of readable agent code on top of LiteLLM; the "I want to see the whole loop" option for learning or minimal production.	✅	Simple (~300 LOC)
23	Flowise	Drag-and-drop LangChain UI; deploy flows without code. The low-code sibling to Langflow, with a different component and hosting story.	⚠️ Apache+CLA	Capability (low-code, drag-drop)
24	browser-use	Python layer over Playwright: natural-language goals become browser actions—web-agent loop without hand-rolling MCP or a custom driver for every site.	✅	Mid (LLM + browser, Playwright)
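
Several entries above mention checkpointing and durable execution. As a rough sketch of the idea only (it does not use LangGraph or any listed framework's API), the example below runs a three-node workflow and writes a JSON checkpoint after every node so that a second call resumes where the first stopped. The node functions, `NODES`, and `run` are hypothetical names.

```python
import json
import os
import tempfile


# Hypothetical workflow: each node returns (updated_state, next_node_or_None).
def fetch(state):
    return {**state, "doc": "raw text"}, "summarize"


def summarize(state):
    return {**state, "summary": state["doc"][:3]}, "done"


def done(state):
    return state, None


NODES = {"fetch": fetch, "summarize": summarize, "done": done}


def run(checkpoint_path, start="fetch", stop_after=None):
    """Run nodes in order, checkpointing after each so a restart can resume."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        node, state = saved["node"], saved["state"]
    else:
        node, state = start, {}
    steps = 0
    while node is not None:
        state, node = NODES[node](state)
        with open(checkpoint_path, "w") as f:
            json.dump({"node": node, "state": state}, f)
        steps += 1
        if stop_after is not None and steps >= stop_after:
            break  # simulate a crash or pause after this many nodes
    return state


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
partial = run(path, stop_after=1)  # stops after "fetch"
final = run(path)                  # resumes at "summarize" from the checkpoint
print(final)  # {'doc': 'raw text', 'summary': 'raw'}
```

Real frameworks add concurrency control, versioned state, and pluggable stores; the sketch only shows why persisting the next node together with the state is enough to survive a restart.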

Multi-agent and orchestration

Harnesses and patterns for multi-agent coordination and handoffs.

#	Project	Description	OSS	Simplicity ↔ capability
1	openai-agents-python	Handoffs, guardrails, and multi-LLM routing; minimal surface so you own the loop.	✅	Simple (minimal surface)
2	crewAI	Role-based agents (roles, goals, backstories) in Crews; Flows add event-driven and hierarchical control for production.	✅	Capability (roles, Flows, production)
3	autogen	Conversable agents and group chats; code execution and human-in-the-loop; Microsoft origin, AG2 ecosystem.	✅ CC-BY	Capability (group chat, code exec, AG2)
4	PraisonAI	Autonomous multi-agent teams with a single entry point; emphasis on minimal config.	✅	Mid (single entry, minimal config)
5	AgentRL	Multitask, multiturn RL for LLM agents; Ray-based scaling, rollout/actor workers—for teams that want to train agents, not just run them.	✅	Capability (RL, Ray, train agents)

Plugins, MCPs, CLI tools

IDE plugins, concrete MCP servers, and CLI tools that give agents tools and context.

#	Project	Description	OSS	Simplicity ↔ capability
1	aider	Git-aware CLI pair programmer; edits in-repo, supports multiple models and MCP so agents see version control and tools.	✅	Mid (CLI, git-aware, MCP)
2	agentlog	Persistent decision memory for any project: `remember`, `recall`, `reflect`. Single-file Python CLI that stores decisions as JSONL and uses Claude or Gemini to retrieve and synthesize patterns—Karpathy's LLM Wiki concept as a CLI.	✅	Simple (one file, three commands)
3	claude-mem	Claude Code plugin that captures everything an agent does during a session, AI-compresses it (via claude-agent-sdk), and injects the relevant context into future sessions—session-to-session memory as a drop-in.	✅	Capability (session capture + compression)
4	Better-OpenCodeMCP	MCP server for OpenCode/Crush: async task execution, model bridging (e.g. Claude→Gemini), process pooling.	✅	Mid (MCP server, model bridging)
5	MCP Python SDK	Official SDK to build and consume MCP servers/clients in Python; stdio and SSE transports.	✅	Simple (SDK only)
6	MCP TypeScript SDK	Official MCP implementation for Node/TS; reference for the protocol.	✅	Simple (protocol reference)
7	continue	Open-source IDE extension (VS Code, JetBrains); in-editor completion and chat with local or API models.	✅	Capability (IDE extension, multi-editor)
8	MCP Inspector	GUI to test and debug MCP servers; inspect tools, resources, and prompts.	✅	Simple (debug GUI)
9	github-mcp-server	MCP server for GitHub: repos, issues, PRs, code search; so your agent can "use GitHub" without hand-rolled API glue.	✅	Mid (GitHub API surface)
10	Docker MCP Gateway	Docker's official MCP CLI plugin / gateway; container-aware MCP tooling from Docker (replaces deprecated `docker/mcp-servers` path).	✅	Mid (Docker-aware MCPs)
11	puppeteer-mcp-server	Browser automation via MCP: tabs, screenshots, forms, JS execution; the one that connects to existing Chrome for dev/debug.	✅	Mid (browser automation)
12	puppeteer-real-browser-mcp	Puppeteer MCP with real-browser and anti-detection; for agents that need to drive sites that block headless.	❓	Mid (real browser, anti-detect)
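
For readers new to MCP, the exchange between an agent and a server is a series of JSON-RPC 2.0 messages. The sketch below hand-builds `tools/list` and `tools/call` requests against an in-process stand-in, purely to show the message shapes as I understand them from the MCP specification. It does not use the official SDKs, `fake_server` is hypothetical, and the field layout should be checked against the current spec before relying on it.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def make_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages travel in."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    )


def fake_server(raw: str) -> str:
    """Hypothetical in-process stand-in for an MCP server exposing one tool."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": "add", "description": "Add two integers"}]}
    elif req["method"] == "tools/call" and req["params"]["name"] == "add":
        args = req["params"]["arguments"]
        result = {"content": [{"type": "text", "text": str(args["a"] + args["b"])}]}
    else:
        # Simplification: anything unrecognized gets JSON-RPC "Method not found".
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


listing = json.loads(fake_server(make_request("tools/list", {})))
reply = json.loads(
    fake_server(make_request("tools/call", {"name": "add", "arguments": {"a": 2, "b": 3}}))
)
print(reply["result"]["content"][0]["text"])  # 5
```

In practice the SDKs listed above handle transport, initialization, and schema validation, so this framing is mostly useful when debugging traffic in a tool such as MCP Inspector.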

Evaluation and benchmarking harnesses

Agentic eval systems, reasoning benchmarks, and open agent benchmarks.

#	Project	Description	OSS	Simplicity ↔ capability
1	ARC-AGI-2	ARC Prize task set: grid-based abstraction/reasoning; public and private splits for generalization.	✅	Simple (task set)
2	arc-agi-benchmarking	Runner for ARC-AGI: multi-provider (OpenAI, Anthropic, Gemini, etc.), rate limits, retries, and scoring.	✅	Mid (runner, multi-provider)
3	AgencyBench	Long-horizon agent benchmark: 32 scenarios, 138 tasks, ~1M tokens and ~90 tool calls; Docker sandbox and rubric-based + LLM judges.	✅	Capability (32 scenarios, Docker, judges)
4	TRAIL	Trace reasoning and agentic issue localization; 148 long-context traces, 841 errors, 20+ error types; Hugging Face dataset.	✅	Mid (traces, Hugging Face)
5	AgentBench	ICLR'24 benchmark: agents across AlfWorld, DB, knowledge graphs, OS, webshop; Docker Compose, function-calling interface.	✅	Capability (multi-env, Docker Compose)
6	WebArena	Realistic web env (e.g. e‑commerce, CMS, dev tools); 812 tasks; measures end-to-end web agent success.	✅	Capability (812 tasks, web env)
7	SWE-bench	LMs resolve real GitHub issues; Docker harness, instance IDs; standard for code-agent evals.	✅	Capability (real GitHub issues, standard)
8	SWE-Gym	Training and evaluation for SWE agents and verifiers (ICML 2025).	✅	Capability (training + eval, ICML)
9	swe-smith	Data generation for SWE agents; 50k+ instances across 128 repos; used for SWE-agent-LM training.	✅	Capability (50k+ instances, data gen)
10	SUPER	Agents that set up and run ML/NLP from GitHub repos; 45 expert problems, 152 masked tasks, 602 automatically generated tasks; Docker-based.	✅	Capability (ML/NLP repos, Docker)
11	VitaBench	ICLR'26: 66 tools, real-world apps (delivery, travel, retail); 100 cross-scenario + 300 single-scenario tasks; adopted by Qwen/Seed.	✅	Capability (66 tools, cross-scenario)
12	letta-evals	Eval harness for stateful Letta agents; configurable suites and grading (LLM or rule-based) so you can measure what you ship.	✅	Mid (Letta-specific harness)
13	gaia-agent	Modular runner for the GAIA benchmark (450+ real-world assistant questions); multi-agent evaluation without the Inspect AI lock-in.	✅	Mid (GAIA runner, modular)
14	WebVoyager	End-to-end web agent with LMMs: screenshots + actions on real sites; benchmark on 15 sites, GPT-4V for automatic eval.	✅	Capability (LMMs, screenshots, 15 sites)
15	inspect_evals	UK AISI/Arcadia/Vector: GAIA and other evals in Inspect AI; level 1–3, sandboxed, tool-calling solvers.	✅	Mid (Inspect AI, UK gov)
16	inspect_ai	Inspect AI core: composable eval tasks, sandboxes, scorers, and multi-model runs; the framework behind inspect_evals, not just the task bundle.	✅	Capability (eval framework, AISI stack)
17	Agent Lightning	Microsoft's training-oriented harness: optimization loops for agent behavior—when you need to improve policies over rollouts, not only score a fixed prompt.	✅	Capability (agent training, Microsoft stack)
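
Most of the harnesses above share one core shape: run an agent over a task set, grade each output, and aggregate. A minimal sketch of that shape follows, assuming a rule-based exact-match grader; `Task`, `evaluate`, and `toy_agent` are hypothetical names, and none of this mirrors a specific benchmark's runner.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    prompt: str
    expected: str


def exact_match(output: str, task: Task) -> bool:
    """Rule-based grader: case-insensitive, whitespace-trimmed comparison."""
    return output.strip().lower() == task.expected.strip().lower()


def evaluate(agent, tasks, grader=exact_match) -> dict:
    """Run the agent on every task and report per-task results plus pass rate."""
    results = {}
    for task in tasks:
        try:
            results[task.task_id] = grader(agent(task.prompt), task)
        except Exception:
            results[task.task_id] = False  # a crash counts as a failure
    pass_rate = sum(results.values()) / len(results) if results else 0.0
    return {"pass_rate": pass_rate, "results": results}


def toy_agent(prompt: str) -> str:
    """Stand-in agent: answers two prompts and crashes on anything else."""
    if prompt == "2+2":
        return "4"
    if "capital" in prompt:
        return "Paris "
    raise ValueError("no idea")


tasks = [
    Task("t1", "2+2", "4"),
    Task("t2", "capital of France", "paris"),
    Task("t3", "unknown question", "x"),
]
report = evaluate(toy_agent, tasks)
print(report["results"])  # {'t1': True, 't2': True, 't3': False}
```

Swapping `exact_match` for an LLM judge or a rubric scorer changes the grader, not the loop, which is roughly how the configurable suites above are organized.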

Research and task-specific harnesses

Deep research, document QA, and domain-specific agent loops.

#	Project	Description	OSS	Simplicity ↔ capability
1	openagents	Platform for autonomous agents and autopilot-style workflows; decentralized/Nostr-oriented.	✅	Capability (platform, decentralized)
2	multi-scale-agentic-rag-playbook	NVIDIA's playbook: RAG at different scales with LangGraph agents, abstract search, and query routing—reference architecture, not a product.	✅	Mid (playbook, reference arch)
3	Agentic_RAG_System	Ollama + LangChain, FAISS/BM25/RRF retrieval and an agentic reasoning loop; one concrete stack for "RAG that corrects itself."	❓	Mid (Ollama + LangChain stack)
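
The RRF step mentioned in the last row, reciprocal rank fusion, merges ranked lists from different retrievers (for example BM25 and a vector index) by summing 1 / (k + rank) for each document. A minimal sketch, assuming the commonly used constant k = 60 and 1-based ranks; the function name and the sample rankings are illustrative and not taken from the listed repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["a", "b", "c"]    # hypothetical keyword-retriever output
dense_ranking = ["b", "c", "d"]   # hypothetical vector-retriever output

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b" and "c") outrank those that appear in only one, even when the single-list document was ranked first by its retriever.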

Libraries and SDKs

Lightweight runtimes, tool loops, and provider-agnostic harness primitives.

#	Project	Description	OSS	Simplicity ↔ capability
1	pydantic-ai	Type-safe Python agents with Pydantic I/O; multi-provider, MCP, Logfire observability, and human-in-the-loop.	✅	Capability (type-safe, MCP, Logfire)
2	open-harness	TypeScript Agent class on Vercel AI SDK; streaming events, filesystem/bash tools, MCP, and subagent delegation.	✅	Capability (streaming, tools, subagents)
3	vercel/ai	React and Node SDK for streaming, tool calls, and agent-style UIs; provider-agnostic.	✅	Mid (React/Node SDK, provider-agnostic)
4	agent-harness	Thin Python shim to swap OpenAI vs Anthropic agent SDKs behind one interface.	✅	Simple (thin shim)
5	smolagents	Code-as-action agents: model outputs Python executed in sandbox (E2B, Modal, etc.); ~1k LOC core.	✅	Mid (code-as-action, ~1k LOC)
6	Community-curated agent lists	Broader directories: e.g. brandonhimpfen/awesome-ai-agents, axioma-ai-labs/awesome-ai-agent-frameworks, mb-mal/awesome-ai-agents-frameworks—differ by scope and update cadence.	❓	Simple (curated lists)
7	agentic	TypeScript agent stdlib: works with any LLM and the TS AI SDK; few abstractions, so you own the loop and the UI. (archived Feb 2026.)	✅	Simple (stdlib, you own loop)
8	strands-agents	Model-driven Python SDK; decorators for tools, native MCP, multi-agent; "minimal code" without sacrificing provider choice.	✅	Mid (decorators, MCP, minimal code)
9	LiteLLM	One interface to 100+ LLMs; routing, caching, budgets. Not an agent framework—the pipe every agent framework uses.	✅	Simple (LLM pipe only)
10	litellm2	LiteLLM plus structured Pydantic outputs, budget controls, and agent-style tool loops; OpenRouter-default option.	✅	Mid (LiteLLM + tool loops)
11	openai-agents-js	Official OpenAI Agents SDK for Node/TS: handoffs, guardrails, voice; the JS counterpart to openai-agents-python.	✅	Capability (handoffs, guardrails, voice)
12	agent-framework	LiteLLM-backed Python with dynamic tool registry, query routing, memory, and Streamlit UI; "full-stack agent app" in one repo.	✅	Capability (tool registry, routing, Streamlit)
13	agentic-ai	Agentic AI stdlib for TypeScript; any LLM, any TS AI SDK; another "thin layer so you own the rest" option.	✅	Simple (thin layer)
14	E2B	Firecracker sandboxes for executing agent-generated code; the hosted isolation layer many tool-calling demos use instead of running arbitrary LLM output on your laptop.	✅	Mid (sandbox API, code execution)
15	Daytona	Elastic dev environments for AI-generated code: workspaces, Git, previews—infra harness between "the model wrote a patch" and "it ran in a real machine."	✅	Mid (dev env API, isolation)
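
A "provider-agnostic primitive" usually comes down to one small interface that every backend satisfies. The sketch below shows that idea with Python's structural typing. `ChatBackend`, `EchoBackend`, `ReverseBackend`, and `Agent` are hypothetical names, and the backends are stand-ins for real provider SDK clients rather than any listed library's classes.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """The only contract the agent depends on."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Hypothetical stand-in for one provider's SDK client."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ReverseBackend:
    """Hypothetical stand-in for a second provider."""

    def complete(self, prompt: str) -> str:
        return prompt[::-1]


class Agent:
    """Thin shim: holds a backend and forwards prompts to it."""

    def __init__(self, backend: ChatBackend):
        self.backend = backend

    def ask(self, prompt: str) -> str:
        return self.backend.complete(prompt)


agent = Agent(EchoBackend())
first = agent.ask("hello")
agent.backend = ReverseBackend()  # swap providers without touching Agent
second = agent.ask("hello")
print(first, "|", second)  # echo: hello | olleh
```

Because `Protocol` checks structure rather than inheritance, a new backend only needs a matching `complete` method to plug in.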

Related Resources

Awesome: Awesome lists on many topics
OpenAI – Harness engineering: Environment design, intent, feedback loops, repo-as-system-of-record
Anthropic – Effective harnesses for long-running agents: Session bridging, feature lists, incremental progress, testing
Aakash Gupta (Medium) – 2026 is agent harnesses: Harness as moat, minimal intervention, progressive disclosure
LangChain, Anthropic, OpenAI: Official docs for major agent platforms

Contribution

Contributions are welcome. To add or suggest projects:

Open an issue with the repo URL, category, and a short description.
Or submit a pull request editing projects.yaml (and optionally README.md).

For contribution guidelines, see CONTRIBUTING.md and the Code of Conduct.
