best-of-Agent-Harnesses
Health Warning
- License — CC-BY-SA-4.0
- Description — Repository has a description
- Active repo — Last push today
- Low visibility — Only 9 GitHub stars
Code Failed
- rm -rf — Recursive force deletion command in .github/workflows/setup-best-of-list.yml
Permissions Passed
- Permissions — No dangerous permissions requested
This project is a curated, ranked list of over 100 AI agent harnesses, orchestration frameworks, and engineering techniques. It serves as an informational directory rather than a functional software library or executable MCP server.
Security Assessment
Overall Risk: Low. The repository does not request dangerous permissions, access sensitive data, or contain hardcoded secrets. The only flagged issue is an `rm -rf` command located inside a GitHub Actions workflow file (`.github/workflows/setup-best-of-list.yml`). Because this command is strictly used for automated repository maintenance and updating the list, it does not pose a threat to developers who are simply reading or cloning the directory. No malicious execution or network exploits were found.
Quality Assessment
The repository is actively maintained, with the most recent push occurring today. It uses a standard Creative Commons license (CC-BY-SA-4.0), which is appropriate for curated documentation and lists. However, community visibility and trust are currently minimal. The project only has 9 GitHub stars, indicating it is either very new or has not yet been widely adopted by the broader developer community.
Verdict
Safe to use.
Best of Agent Harnesses and Harness Techniques
🏆 Ranked list of 100+ AI agent harnesses, orchestration frameworks, and harness engineering techniques for reliable agentic systems. Scored and updated weekly.
What is an agent harness?
An agent harness is the runtime that closes the loop between a stateless model and the outside world, managing perception, action, memory, and constraint enforcement. That makes it the de facto operating system of machine agency, and consequently the layer where nearly all meaningful questions about AI autonomy, reliability, and control are actually resolved.
Every prior wave of automation was constrained by brittleness: you scripted exact behavior, and when the world deviated, the system broke. Foundation models inverted that problem—they're flexible but directionless, stateless, and disconnected from anything real. The agent harness exists to bridge that gap: it is the orchestration infrastructure that converts a model's per-turn reasoning into sustained, tool-using, error-recovering, goal-directed behavior across time. Architecturally, it plays the role the kernel played in operating systems or the controller played in industrial robotics—mediating between raw capability and a messy environment—but with a critical difference: the "capability" it governs is general-purpose cognition, which means the harness is simultaneously a scheduler, a permission system, a memory manager, and a policy enforcement layer, all under-specified and evolving in real time. The term itself barely exists in formal literature yet, which should concern anyone who cares about AI governance, because the harness is where abstract alignment goals either get operationalized into concrete constraints or quietly don't.
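Stripped to its essentials, the loop a harness closes can be sketched in a few lines of Python. This is an illustrative skeleton, not any listed project's implementation; `model` and `tools` are stand-ins for a chat-completion call and a tool registry:

```python
def run_agent(model, tools, goal, max_steps=10):
    """One possible shape of the loop a harness closes: the model proposes an
    action, the harness executes it, and the observation is fed back in."""
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = model(history)                    # per-turn reasoning
        if reply.get("tool") is None:
            return reply["content"]               # final answer: loop is done
        name, args = reply["tool"], reply.get("args", {})
        if name not in tools:                     # constraint enforcement
            observation = f"error: unknown tool {name!r}"
        else:
            observation = tools[name](**args)     # action against the world
        history.append({"role": "tool", "content": str(observation)})  # memory
    return "step budget exhausted"
```

Everything the list below ranks (scheduling, permissions, memory management, policy enforcement) is some elaboration of this skeleton.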
Why harnesses matter
Better models make harnesses more important: more capabilities mean more failure modes, and production needs retry logic, fallbacks, and validation. Harness quality—not just model quality—determines whether agents actually ship. This list ranks projects by relevance to harness concerns (environment, orchestration, lifecycle, guardrails) and by stars/activity.
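The retry, fallback, and validation plumbing mentioned above can be sketched as a small provider-agnostic wrapper. Names here (`providers`, `validate`) are placeholders for your own stack, not an API from any listed project:

```python
import time

def call_with_fallback(providers, prompt, validate, retries=2, backoff=0.5):
    """Try each provider in order; retry transient failures with exponential
    backoff; reject outputs that fail validation, not just transport errors."""
    last_error = None
    for call in providers:                 # ordered: primary first, fallbacks after
        for attempt in range(retries + 1):
            try:
                result = call(prompt)
                if validate(result):       # output validation is part of the harness
                    return result
                raise ValueError(f"invalid output: {result!r}")
            except Exception as err:
                last_error = err
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"all providers failed: {last_error}")
```

Whether a team hand-rolls this or adopts one of the frameworks below, some version of it sits between every model call and production.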
Contents
- Progressive disclosure harnesses: 7 projects
- Coding agent products (IDEs, CLIs, full suites): 9 projects
- Coding harness configs and SDKs: 10 projects
- Frameworks: 24 projects
- Multi-agent and orchestration: 5 projects
- Plugins, MCPs, CLI tools: 12 projects
- Evaluation and benchmarking harnesses: 17 projects
- Research and task-specific harnesses: 3 projects
- Libraries and SDKs: 15 projects
Explanation
- Simplicity ↔ capability: Where each project sits on the axis from minimal/simple (lean API, format only, thin layer) to high capability (full platform, many features, kitchen-sink).
- OSS: ✅ = standard open-source license (MIT/Apache/BSD/GPL/MPL/AGPL/CC0). ⚠️ = source-available or restricted (e.g. n8n Fair-code, Elastic-2.0, Polyform). ❓ = no license file or unclear terms.
- 🥇🥈🥉 Combined project-quality score
- ⭐️ Star count from GitHub
- 🐣 New project (less than 6 months old)
- 💤 Inactive project (6 months no activity)
- 💀 Dead project (12 months no activity)
- 📈📉 Project is trending up or down
- 👨‍💻 Contributors count from GitHub
- 🔀 Fork count from GitHub
- 📋 Issue count from GitHub
- ⏱️ Last update timestamp on package manager
Progressive disclosure harnesses
Formats, runtimes, and patterns that reveal context, tools, or instructions in layers—index first, details on demand—to control tokens and improve agent focus (the "map, not encyclopedia" principle).
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | agents.md | Open format for repo-scoped agent briefings; v1.1 adds hierarchical scope and progressive disclosure so agents get a map of what exists, then load only what's relevant. | ✅ | Simple (format only) |
| 2 | awesome-cursorrules | Curated .cursorrules and skills that leverage Cursor's index-then-load model; the canonical collection for rules-as-progressive-disclosure in the IDE. | ✅ | Simple (content bundle) |
| 3 | MCP-Zero | Active tool discovery for autonomous agents: model requests tools by requirement; hierarchical semantic routing over 308 servers / 2,797 tools with ~98% token reduction (APIBank). | ✅ | Capability (3k tools, full routing) |
| 4 | langgraph-bigtool | Build LangGraph agents with large tool sets; retrieval and on-demand tool loading so agents scale beyond context without stuffing every schema upfront. | ✅ | Capability (large tool sets) |
| 5 | spring-ai-tool-search-tool | Dynamic tool discovery for Spring AI: model gets a search tool first, then pulls definitions for relevant tools; 34–64% token reduction across providers. | ✅ | Mid (search-then-load) |
| 6 | ToolGen | ICLR 2025: unified tool retrieval and calling via generation; 47k+ tools without context stuffing—retrieval and invocation in one generative step. | ❓ | Capability (47k+ tools) |
| 7 | ToolRAG | Semantic tool retrieval for LLMs; serves only the tools the user query demands (MCP-compatible), unlimited tool sets with zero context penalty. | ✅ | Mid (query-driven retrieval) |
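The "index first, details on demand" pattern shared by these projects can be sketched with two hypothetical helpers: the agent's first tool is search over names and summaries, and a full schema is loaded only for the tool it actually selects. The registry shape is invented for this sketch:

```python
def search_tools(registry, query):
    """Index first: return only names and one-line summaries that match,
    so the upfront context cost stays near-constant as the registry grows."""
    q = query.lower()
    return [
        {"name": name, "summary": tool["summary"]}
        for name, tool in registry.items()
        if q in name.lower() or q in tool["summary"].lower()
    ]

def load_tool(registry, name):
    """Details on demand: the full schema is pulled only for the chosen tool."""
    return registry[name]["schema"]
```

The projects in this section differ mainly in how the index is built (hierarchical routing, semantic retrieval, generation) and where it lives (format, MCP server, framework extension).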
Coding agent products (IDEs, CLIs, full suites)
Turnkey coding agents you install and run: IDE extensions, terminal CLIs, Dockerized workspaces. Each entry notes which part is the harness (the agent loop, tool wiring, approval model) versus the UI shell (VS Code extension, TUI, browser client).
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | Cline | VS Code extension whose harness is a plan-then-act loop with per-step human approval and cost transparency; the VS Code integration is the UI shell. Open-source counterweight to Cursor. | ✅ | Mid (plan-then-act, approval gates) |
| 2 | Roo Code | VS Code/Cursor extension in the Cline lineage. The harness is the approval-gated agent with custom modes and a strong MCP story; the IDE is the UI. Popular community fork when you want that workflow without the upstream extension. | ✅ | Mid (IDE extension, MCP-first) |
| 3 | Codex | OpenAI's terminal coding agent. The harness is the sandboxed tool-call loop with multi-provider support; the CLI is the shell. Reference implementation for "official CLI that ships code." | ✅ | Mid (reference CLI, sandboxed) |
| 4 | Gemini CLI | Google's first-party terminal agent for Gemini. The harness is the plugin/MCP tool-call loop; the terminal is the shell—Google's parallel to Claude Code / Codex, not just an API. | ✅ | Mid (official CLI, plugins, MCP) |
| 5 | crush | Charm's terminal coding agent (successor to OpenCode). The harness is the tool-calling loop with session persistence; the Bubble Tea TUI is the shell. | ⚠️ FSL-1.1-MIT | Mid (terminal agent, TUI) |
| 6 | OpenHands | Dockerized software-engineering agent. The harness is the bash/editor/browser toolset with micro-agents and event-stream session bridging; Docker is the sandbox. Main OSS choice for teams self-hosting autonomous repo work. | ⚠️ (multi-license) | Capability (Docker runtime, multi-surface agent) |
| 7 | goose | Block's extensible Rust agent. The harness is the MCP/ACP extension model with recipes and provider choice; there's no fixed UI slot—you bolt it into whatever shell you use. | ✅ | Mid (extensions, MCP/ACP) |
| 8 | claw-code-agent | Python reimplementation of the Claude Code agent architecture with zero external dependencies; interactive chat, streaming, plugin runtime, nested agent delegation, cost tracking, MCP transport—portable harness without the Rust/TS toolchain. | ❓ | Capability (pure Python, plugin runtime) |
| 9 | coderClaw | Self-hosted multi-role coding system (Creator, Reviewer, Test, Refactor, etc.) with AST and semantic maps; IDE-agnostic, chat-channel triggers. | ❓ | Capability (multi-role, AST/semantic) |
Coding harness configs and SDKs
Skill packs, slash-command libraries, meta-prompting frameworks, and official SDKs that give you the harness (the agent loop, planning, memory, hooks) without bundling a specific IDE or CLI shell.
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | get-shit-done | Goal-backward planning and wave-based execution over fresh context windows; avoids context rot by design. Python/JS meta-prompting for Claude Code, OpenCode, Gemini CLI. | ✅ | Mid (meta-prompting, you own stack) |
| 2 | GStack | Garry Tan's Claude Code skill stack: 23 slash-command modes (CEO/eng/design review, QA, ship, browse, retro, …) that structure one assistant as a virtual engineering team. Daily driver while running YC. | ✅ | Capability (multi-role slash-command harness) |
| 3 | everything-claude-code | The breakout 2026 harness pack for Claude Code (approaching 160k stars): 28 specialized subagents, 119 reusable skills, 60 slash commands, 34 rules, 20+ automated hooks. Ships a full "AI engineering team" as config. | ✅ | Capability (subagents + skills + hooks) |
| 4 | superpowers | Performance-oriented harness pack for Claude Code, Codex, OpenCode, Cursor: skills, instincts, memory, security, research-first workflows. Treats harness engineering itself as the performance lever. | ✅ | Capability (multi-IDE skill stack) |
| 5 | pmstack | Claude Code config for AI product managers: CLAUDE.md plus skills for competitive analysis, PRD-from-signal, metric frameworks, stakeholder briefs, and agent eval design. "GStack for PMs." | ✅ | Simple (skills bundle, PM-focused) |
| 6 | Claude Agent SDK | Official Anthropic SDK (Python + TypeScript, demos, quickstarts): built-in tools, MCP, long-running coding agents with session bridging. | ✅ | Capability (full SDK, session bridging) |
| 7 | AutoHarness | Lightweight governance harness: wraps any LLM client in ~2 lines for automated harness engineering—6–14 step pipeline, YAML constitution, risk-pattern matching, session persistence with cost tracking, multi-agent profiles. | ✅ | Simple (2-line wrapper, YAML gov) |
| 8 | RepoMaster | Repo-scoped research harness: builds function-call and module-dependency graphs to explore only what's needed; large relative gains on MLE-bench and GitTaskBench with lower token use. | ❓ | Capability (graph-based exploration) |
| 9 | SWE-agent | LM-driven harness built for SWE-bench: edit state, command execution, and issue-focused loop—the reference agent stack next to the benchmark itself. | ✅ | Capability (SWE-bench pairing, stateful edits) |
| 10 | OpenHarness (HKUDS) | Open agent harness with a built-in personal agent ("Ohmo") that runs across Feishu, Slack, Telegram, and Discord; core tool-use, skills, memory, multi-agent coordination with auto-compaction for multi-day sessions. | ✅ | Capability (personal agent + multi-channel) |
Frameworks
General-purpose agent and LLM application frameworks (the app layer, not harnesses per se).
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | langgraph | State-machine graphs over LLM steps; checkpointing, human-in-the-loop, and durable execution so workflows survive restarts. | ✅ | Capability (graphs, checkpointing, durable exec) |
| 2 | langchain | Chains, tools, retrievers, and agents; the usual entry point for "add tools to an LLM" in Python/JS. | ✅ | Capability (kitchen-sink ecosystem) |
| 3 | llama-index | Data-centric: indexing, RAG, and query engines; agent abstractions sit on top of your data pipelines. | ✅ | Capability (RAG + agents) |
| 4 | semantic-kernel | Microsoft's plugin and planner layer for LLMs; C#, Python, Java; strong on enterprise auth and orchestration. | ✅ | Capability (enterprise, multi-language) |
| 5 | mastra | TypeScript-first; agents, tools, and workflows with a single runtime and minimal boilerplate. | ⚠️ Elastic-2.0 | Mid (TS-first, minimal boilerplate) |
| 6 | agno | Python agents with memory, knowledge bases, tools, and structured outputs; continues the PhiData-era product line under the Agno name—production apps, evals, and pipelines. | ✅ | Capability (memory, KB, observability) |
| 7 | letta | Python agent runtime with tool use and control flow; lean API; stateful agents with long-horizon memory. | ✅ | Simple (lean API) |
| 8 | langflow | Low-code UI to build and deploy LangChain/LangGraph flows; visual DAG editor and one-click run. | ✅ | Capability (low-code, visual) |
| 9 | rasa | Conversational AI stack (NLU, dialogue, actions); long-standing OSS choice for chat and voice bots. | ✅ | Capability (full stack) |
| 10 | botpress | Visual bot builder and runtime; multi-channel, open-source alternative to commercial bot platforms. | ✅ | Capability (visual builder, multi-channel) |
| 11 | Dify | One-stop LLM app platform: visual workflows, RAG pipeline, 50+ tools, model management; "ship from prototype to prod" in a single UI. | ⚠️ Fair-code | Capability (one-stop platform) |
| 12 | n8n | Fair-code workflow engine with 400+ nodes and native AI nodes; the self-hosted Zapier that actually does agents and LangChain. | ⚠️ Fair-code | Capability (400+ nodes, workflow engine) |
| 13 | AutoGPT | The original autonomous loop: goal in, agent iterates with tools and memory; Forge is the dev framework, Benchmark the eval harness. | ⚠️ Polyform-SU | Capability (autonomous loop, tools, memory) |
| 14 | AIlice | Fully autonomous general-purpose agent; one binary, Docker-ready, for when you want "set goal and walk away" without a framework. | ✅ | Capability (autonomous, one binary) |
| 15 | Bee Agent Framework | Python + TypeScript, LF AI–backed; MCP/ACP, workflows, Requirement Agent; the one that pushes "production multi-agent" without LangChain. | ✅ | Capability (production multi-agent) |
| 16 | agent-squad | AWS-originated orchestrator (now under 2FastLabs): intent classification, streaming, SupervisorAgent; "agent-as-tools" so one agent delegates to a squad. | ✅ | Capability (squad orchestration) |
| 17 | SuperAgentX | Lightweight multi-agent orchestrator with an AGI-angle; minimal surface, docs-first, for teams that want orchestration without the kitchen sink. | ✅ | Simple (minimal surface) |
| 18 | AgentVerse | Task-solving and simulation envs for multi-LLM agents; deploy many agents in custom environments without building infra from scratch. | ✅ | Capability (simulation envs, multi-agent) |
| 19 | R2R | RAG-first: hybrid search, knowledge graphs, multimodal; the framework for "production RAG" when you care more about retrieval than chat UI. | ✅ | Capability (production RAG) |
| 20 | LiteSwarm | Async-only, LiteLLM-backed Python; dynamic agent switching and type-safe context—for devs who want 100+ models without LangGraph's weight. | ✅ | Mid (100+ models, dynamic switching) |
| 21 | AgentStack | Scaffolds full agent projects; plugs in CrewAI, LangGraph, OpenAI Swarm, LlamaStack and wires AgentOps observability from day one. | ✅ | Capability (scaffold, multi-backend) |
| 22 | AgentSilex | ~300 lines of readable agent code on top of LiteLLM; the "I want to see the whole loop" option for learning or minimal production. | ✅ | Simple (~300 LOC) |
| 23 | Flowise | Drag-and-drop LangChain UI; deploy flows without code. The low-code sibling to Langflow, with a different component and hosting story. | ⚠️ Apache+CLA | Capability (low-code, drag-drop) |
| 24 | browser-use | Python layer over Playwright: natural-language goals become browser actions—web-agent loop without hand-rolling MCP or a custom driver for every site. | ✅ | Mid (LLM + browser, Playwright) |
Multi-agent and orchestration
Harnesses and patterns for multi-agent coordination and handoffs.
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | openai-agents-python | Handoffs, guardrails, and multi-LLM routing; minimal surface so you own the loop. | ✅ | Simple (minimal surface) |
| 2 | crewAI | Role-based agents (roles, goals, backstories) in Crews; Flows add event-driven and hierarchical control for production. | ✅ | Capability (roles, Flows, production) |
| 3 | autogen | Conversable agents and group chats; code execution and human-in-the-loop; Microsoft origin, AG2 ecosystem. | ✅ CC-BY | Capability (group chat, code exec, AG2) |
| 4 | PraisonAI | Autonomous multi-agent teams with a single entry point; emphasis on minimal config. | ✅ | Mid (single entry, minimal config) |
| 5 | AgentRL | Multitask, multiturn RL for LLM agents; Ray-based scaling, rollout/actor workers—for teams that want to train agents, not just run them. | ✅ | Capability (RL, Ray, train agents) |
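A minimal version of the handoff pattern these orchestrators implement: a classifier routes the task to an agent, and any agent can pass it on by name. Purely illustrative; the `handoff:` string convention is invented for this sketch, not taken from any listed SDK:

```python
def route(agents, classify, task):
    """Handoff loop: classify picks the starting agent; an agent that returns
    a 'handoff:<name>' marker delegates to the named agent; anything else is
    the final result."""
    current = classify(task)
    while True:
        result = agents[current](task)
        if isinstance(result, str) and result.startswith("handoff:"):
            current = result.split(":", 1)[1]   # delegate and loop again
            continue
        return result
```

Real frameworks add guardrails, shared state, and termination budgets around this loop, but the delegation mechanic is the same.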
Plugins, MCPs, CLI tools
IDE plugins, concrete MCP servers, and CLI tools that give agents tools and context.
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | aider | Git-aware CLI pair programmer; edits in-repo, supports multiple models and MCP so agents see version control and tools. | ✅ | Mid (CLI, git-aware, MCP) |
| 2 | agentlog | Persistent decision memory for any project: remember, recall, reflect. Single-file Python CLI that stores decisions as JSONL and uses Claude or Gemini to retrieve and synthesize patterns—Karpathy's LLM Wiki concept as a CLI. | ✅ | Simple (one file, three commands) |
| 3 | claude-mem | Claude Code plugin that captures everything an agent does during a session, AI-compresses it (via claude-agent-sdk), and injects the relevant context into future sessions—session-to-session memory as a drop-in. | ✅ | Capability (session capture + compression) |
| 4 | Better-OpenCodeMCP | MCP server for OpenCode/Crush: async task execution, model bridging (e.g. Claude→Gemini), process pooling. | ✅ | Mid (MCP server, model bridging) |
| 5 | MCP Python SDK | Official SDK to build and consume MCP servers/clients in Python; stdio and SSE transports. | ✅ | Simple (SDK only) |
| 6 | MCP TypeScript SDK | Official MCP implementation for Node/TS; reference for the protocol. | ✅ | Simple (protocol reference) |
| 7 | continue | Open-source IDE extension (VS Code, JetBrains); in-editor completion and chat with local or API models. | ✅ | Capability (IDE extension, multi-editor) |
| 8 | MCP Inspector | GUI to test and debug MCP servers; inspect tools, resources, and prompts. | ✅ | Simple (debug GUI) |
| 9 | github-mcp-server | MCP server for GitHub: repos, issues, PRs, code search; so your agent can "use GitHub" without hand-rolled API glue. | ✅ | Mid (GitHub API surface) |
| 10 | Docker MCP Gateway | Docker's official MCP CLI plugin / gateway; container-aware MCP tooling from Docker (replaces deprecated docker/mcp-servers path). | ✅ | Mid (Docker-aware MCPs) |
| 11 | puppeteer-mcp-server | Browser automation via MCP: tabs, screenshots, forms, JS execution; the one that connects to existing Chrome for dev/debug. | ✅ | Mid (browser automation) |
| 12 | puppeteer-real-browser-mcp | Puppeteer MCP with real-browser and anti-detection; for agents that need to drive sites that block headless. | ❓ | Mid (real browser, anti-detect) |
Evaluation and benchmarking harnesses
Agentic eval systems, reasoning benchmarks, and open agent benchmarks.
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | ARC-AGI-2 | ARC Prize task set: grid-based abstraction/reasoning; public and private splits for generalization. | ✅ | Simple (task set) |
| 2 | arc-agi-benchmarking | Runner for ARC-AGI: multi-provider (OpenAI, Anthropic, Gemini, etc.), rate limits, retries, and scoring. | ✅ | Mid (runner, multi-provider) |
| 3 | AgencyBench | Long-horizon agent benchmark: 32 scenarios, 138 tasks, ~1M tokens and ~90 tool calls; Docker sandbox and rubric-based + LLM judges. | ✅ | Capability (32 scenarios, Docker, judges) |
| 4 | TRAIL | Trace reasoning and agentic issue localization; 148 long-context traces, 841 errors, 20+ error types; Hugging Face dataset. | ✅ | Mid (traces, Hugging Face) |
| 5 | AgentBench | ICLR'24 benchmark: agents across AlfWorld, DB, knowledge graphs, OS, webshop; Docker Compose, function-calling interface. | ✅ | Capability (multi-env, Docker Compose) |
| 6 | WebArena | Realistic web env (e.g. e‑commerce, CMS, dev tools); 812 tasks; measures end-to-end web agent success. | ✅ | Capability (812 tasks, web env) |
| 7 | SWE-bench | LMs resolve real GitHub issues; Docker harness, instance IDs; standard for code-agent evals. | ✅ | Capability (real GitHub issues, standard) |
| 8 | SWE-Gym | Training and evaluation for SWE agents and verifiers (ICML 2025). | ✅ | Capability (training + eval, ICML) |
| 9 | swe-smith | Data generation for SWE agents; 50k+ instances across 128 repos; used for SWE-agent-LM training. | ✅ | Capability (50k+ instances, data gen) |
| 10 | SUPER | Agents that set up and run ML/NLP from GitHub repos; 45 expert problems, 152 masked tasks, 602 AutoGen tasks; Docker-based. | ✅ | Capability (ML/NLP repos, Docker) |
| 11 | VitaBench | ICLR'26: 66 tools, real-world apps (delivery, travel, retail); 100 cross-scenario + 300 single-scenario tasks; adopted by Qwen/Seed. | ✅ | Capability (66 tools, cross-scenario) |
| 12 | letta-evals | Eval harness for stateful Letta agents; configurable suites and grading (LLM or rule-based) so you can measure what you ship. | ✅ | Mid (Letta-specific harness) |
| 13 | gaia-agent | Modular runner for the GAIA benchmark (450 real-world assistant questions); multi-agent evaluation without the Inspect AI lock-in. | ✅ | Mid (GAIA runner, modular) |
| 14 | WebVoyager | End-to-end web agent with LMMs: screenshots + actions on real sites; benchmark on 15 sites, GPT-4V for automatic eval. | ✅ | Capability (LMMs, screenshots, 15 sites) |
| 15 | inspect_evals | UK AISI/Arcadia/Vector: GAIA and other evals in Inspect AI; level 1–3, sandboxed, tool-calling solvers. | ✅ | Mid (Inspect AI, UK gov) |
| 16 | inspect_ai | Inspect AI core: composable eval tasks, sandboxes, scorers, and multi-model runs; the framework behind inspect_evals, not just the task bundle. | ✅ | Capability (eval framework, AISI stack) |
| 17 | Agent Lightning | Microsoft's training-oriented harness: optimization loops for agent behavior—when you need to improve policies over rollouts, not only score a fixed prompt. | ✅ | Capability (agent training, Microsoft stack) |
Research and task-specific harnesses
Deep research, document QA, and domain-specific agent loops.
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | openagents | Platform for autonomous agents and autopilot-style workflows; decentralized/Nostr-oriented. | ✅ | Capability (platform, decentralized) |
| 2 | multi-scale-agentic-rag-playbook | NVIDIA's playbook: RAG at different scales with LangGraph agents, abstract search, and query routing—reference architecture, not a product. | ✅ | Mid (playbook, reference arch) |
| 3 | Agentic_RAG_System | Ollama + LangChain, FAISS/BM25/RRF retrieval and an agentic reasoning loop; one concrete stack for "RAG that corrects itself." | ❓ | Mid (Ollama + LangChain stack) |
Libraries and SDKs
Lightweight runtimes, tool loops, and provider-agnostic harness primitives.
| # | Project | Description | OSS | Simplicity ↔ capability |
|---|---|---|---|---|
| 1 | pydantic-ai | Type-safe Python agents with Pydantic I/O; multi-provider, MCP, Logfire observability, and human-in-the-loop. | ✅ | Capability (type-safe, MCP, Logfire) |
| 2 | open-harness | TypeScript Agent class on Vercel AI SDK; streaming events, filesystem/bash tools, MCP, and subagent delegation. | ✅ | Capability (streaming, tools, subagents) |
| 3 | vercel/ai | React and Node SDK for streaming, tool calls, and agent-style UIs; provider-agnostic. | ✅ | Mid (React/Node SDK, provider-agnostic) |
| 4 | agent-harness | Thin Python shim to swap OpenAI vs Anthropic agent SDKs behind one interface. | ✅ | Simple (thin shim) |
| 5 | smolagents | Code-as-action agents: model outputs Python executed in sandbox (E2B, Modal, etc.); ~1k LOC core. | ✅ | Mid (code-as-action, ~1k LOC) |
| 6 | Community-curated agent lists | Broader directories: e.g. brandonhimpfen/awesome-ai-agents, axioma-ai-labs/awesome-ai-agent-frameworks, mb-mal/awesome-ai-agents-frameworks—differ by scope and update cadence. | ❓ | Simple (curated lists) |
| 7 | agentic | TypeScript agent stdlib: works with any LLM and the TS AI SDK; few abstractions, so you own the loop and the UI. (archived Feb 2026.) | ✅ | Simple (stdlib, you own loop) |
| 8 | strands-agents | Model-driven Python SDK; decorators for tools, native MCP, multi-agent; "minimal code" without sacrificing provider choice. | ✅ | Mid (decorators, MCP, minimal code) |
| 9 | LiteLLM | One interface to 100+ LLMs; routing, caching, budgets. Not an agent framework—the pipe every agent framework uses. | ✅ | Simple (LLM pipe only) |
| 10 | litellm2 | LiteLLM plus structured Pydantic outputs, budget controls, and agent-style tool loops; OpenRouter-default option. | ✅ | Mid (LiteLLM + tool loops) |
| 11 | openai-agents-js | Official OpenAI Agents SDK for Node/TS: handoffs, guardrails, voice; the JS counterpart to openai-agents-python. | ✅ | Capability (handoffs, guardrails, voice) |
| 12 | agent-framework | LiteLLM-backed Python with dynamic tool registry, query routing, memory, and Streamlit UI; "full-stack agent app" in one repo. | ✅ | Capability (tool registry, routing, Streamlit) |
| 13 | agentic-ai | Agentic AI stdlib for TypeScript; any LLM, any TS AI SDK; another "thin layer so you own the rest" option. | ✅ | Simple (thin layer) |
| 14 | E2B | Firecracker sandboxes for executing agent-generated code; the hosted isolation layer many tool-calling demos use instead of running arbitrary LLM output on your laptop. | ✅ | Mid (sandbox API, code execution) |
| 15 | Daytona | Elastic dev environments for AI-generated code: workspaces, Git, previews—infra harness between "the model wrote a patch" and "it ran in a real machine." | ✅ | Mid (dev env API, isolation) |
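The "thin shim" idea behind entries like agent-harness and LiteLLM can be sketched as a minimal registry that hides which backend serves a call. Backends are plain callables here as an assumption; real SDK clients would be wrapped the same way, and nothing below is any SDK's actual signature:

```python
class ProviderShim:
    """One interface over interchangeable LLM backends, so the rest of the
    harness never imports a specific provider SDK directly."""

    def __init__(self, **backends):
        self._backends = backends          # name -> callable(prompt) -> str

    def complete(self, backend, prompt):
        if backend not in self._backends:
            raise ValueError(
                f"unknown backend {backend!r}; have {sorted(self._backends)}"
            )
        return self._backends[backend](prompt)
```

Swapping providers then means changing a registry entry, not rewriting the loop, which is the whole value proposition of this category.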
Related Resources
- Awesome: Awesome lists on many topics
- OpenAI – Harness engineering: Environment design, intent, feedback loops, repo-as-system-of-record
- Anthropic – Effective harnesses for long-running agents: Session bridging, feature lists, incremental progress, testing
- Aakash Gupta (Medium) – 2026 is agent harnesses: Harness as moat, minimal intervention, progressive disclosure
- LangChain, Anthropic, OpenAI: Official docs for major agent platforms
Contribution
Contributions are welcome. To add or suggest projects:
- Open an issue with the repo URL, category, and a short description.
- Or submit a pull request editing projects.yaml (and optionally README.md).
For contribution guidelines, see CONTRIBUTING.md and the Code of Conduct.
License
This list is licensed under CC-BY-SA-4.0.