Agentic Engineering Handbook

The definitive OpenAI, Anthropic, MCP, Harness, Evals, and Production Agent Systems learning roadmap.

If this repository helps you, consider giving it a ⭐

Why This Repository?

The AI industry has entered the Agentic Era. Building production-grade AI systems now requires mastering agents, tool use, MCP, memory, long-running workflows, coding agents, agent harnesses, evals, and safety — but the knowledge is scattered across OpenAI blogs, Anthropic engineering posts, SDK docs, cookbooks, and research papers.

This repository consolidates 116 official resources into one structured learning roadmap.

The goal: Become a world-class Agentic Engineer.

Learning Roadmap

Phase 1 — Agent Foundations

Build a shared vocabulary: workflow vs. agent, the tool loop, handoffs, and guardrails.

Read First

#	Title	Vendor
1	Building effective agents	Anthropic
2	New tools for building agents	OpenAI
3	Agents SDK overview	OpenAI

Then Read

Title	Vendor
Orchestrating Agents: Routines and Handoffs	OpenAI
Structured Outputs for Multi-Agent Systems	OpenAI

Build Exercise

Build a customer service/ticket triage agent: router → specialist → evaluator, with all outputs constrained by structured schemas.
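One way to start the exercise is to fix the schemas first and stub the model calls. The sketch below is a minimal illustration, not a reference implementation: every name in it (RoutingDecision, SpecialistReply, Evaluation, route, specialist, evaluate, triage) is hypothetical, and keyword rules stand in for the model calls so the pipeline runs without any SDK.

```python
from dataclasses import dataclass
from typing import Literal

Queue = Literal["billing", "technical", "general"]


@dataclass(frozen=True)
class RoutingDecision:
    queue: Queue
    reason: str


@dataclass(frozen=True)
class SpecialistReply:
    queue: Queue
    answer: str


@dataclass(frozen=True)
class Evaluation:
    approved: bool
    notes: str


def route(ticket: str) -> RoutingDecision:
    """Stand-in for the router model call (assumption: simple keyword rules)."""
    text = ticket.lower()
    if "refund" in text or "invoice" in text:
        return RoutingDecision("billing", "mentions a payment term")
    if "error" in text or "crash" in text:
        return RoutingDecision("technical", "mentions a failure term")
    return RoutingDecision("general", "no specialist keyword found")


def specialist(decision: RoutingDecision, ticket: str) -> SpecialistReply:
    """Stand-in for the specialist model call; output is always a SpecialistReply."""
    templates = {
        "billing": "We have opened a billing review for: ",
        "technical": "Please send logs so we can investigate: ",
        "general": "Thanks for reaching out about: ",
    }
    return SpecialistReply(decision.queue, templates[decision.queue] + ticket)


def evaluate(decision: RoutingDecision, reply: SpecialistReply) -> Evaluation:
    """Check that the reply is non-empty and stays in the routed queue."""
    if reply.queue != decision.queue:
        return Evaluation(False, "reply queue does not match routing decision")
    if not reply.answer.strip():
        return Evaluation(False, "empty answer")
    return Evaluation(True, "ok")


def triage(ticket: str) -> tuple[RoutingDecision, SpecialistReply, Evaluation]:
    """Run router, then specialist, then evaluator on one ticket."""
    decision = route(ticket)
    reply = specialist(decision, ticket)
    return decision, reply, evaluate(decision, reply)
```

Once the three stubs are replaced by real model calls, the dataclasses become the structured-output schemas the exercise asks for, and the evaluator keeps working unchanged.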

Phase 2 — MCP & Tool Ecosystem

Understand MCP servers and clients, remote vs. local deployment, tool loading, approvals, and connector boundaries.

Read First

#	Title	Vendor
1	Introducing the Model Context Protocol	Anthropic
2	MCP and Connectors	OpenAI
3	Building MCP servers for ChatGPT Apps and API integrations	OpenAI

Then Read

Title	Vendor
Code execution with MCP: Building more efficient agents	Anthropic
Model Context Protocol - Codex	OpenAI
OpenAI Docs MCP	OpenAI

Build Exercise

Build a read-only repo/docs MCP server, then create an eval to verify the agent correctly cites documentation.
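The eval half of this exercise can be prototyped before the server exists. The sketch below assumes the agent returns citations as (path, quote) pairs and that the docs are available as an in-memory mapping. Citation and score_citations are hypothetical names, and exact substring matching is a deliberately strict stand-in for whatever grader you end up using.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    path: str   # document the agent claims to cite
    quote: str  # text the agent claims appears in that document


def score_citations(docs: dict[str, str], citations: list[Citation]) -> float:
    """Return the fraction of citations whose quote appears verbatim in the cited doc.

    An answer with no citations scores 0.0, so uncited answers fail the eval.
    """
    if not citations:
        return 0.0
    valid = sum(
        1 for c in citations
        if c.quote and c.quote in docs.get(c.path, "")
    )
    return valid / len(citations)
```

A citation that points at a missing file or a quote that does not occur in the file counts as invalid, which catches both fabricated paths and fabricated text.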

Phase 3 — Context, Memory & Skills

Learn to control the context window, short- and long-term memory, skills/plugins, and CLAUDE.md/AGENTS.md.

Read First

#	Title	Vendor
1	Effective context engineering for AI agents	Anthropic
2	Equipping agents for the real world with Agent Skills	Anthropic
3	Building Reliable Agents with Memory and Compaction	OpenAI

Then Read

Title	Vendor
Custom instructions with AGENTS.md - Codex	OpenAI
Best practices for Claude Code	Anthropic
Agent Skills - Codex	OpenAI

Build Exercise

Implement the same task as a Skill/Plugin, then measure accuracy and token cost across three variants: no skill, long prompt, and skill-based.
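The comparison comes down to grouping runs by variant and reporting two numbers per group. A minimal sketch follows, assuming each run has already been graded and its token usage recorded. RunResult, summarize, and the variant labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    variant: str   # e.g. "no_skill", "long_prompt", "skill"
    correct: bool  # graded outcome of the run
    tokens: int    # total tokens consumed by the run


def summarize(results: list[RunResult]) -> dict[str, dict[str, float]]:
    """Return accuracy and mean token cost for each variant."""
    grouped: dict[str, list[RunResult]] = defaultdict(list)
    for r in results:
        grouped[r.variant].append(r)
    return {
        variant: {
            "accuracy": sum(r.correct for r in runs) / len(runs),
            "mean_tokens": sum(r.tokens for r in runs) / len(runs),
        }
        for variant, runs in grouped.items()
    }
```

Run the same task set against all three variants so the accuracy and token columns are directly comparable.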

Phase 4 — Harness & Long-Running Agents

Master the agent runtime: event streams, threads, tool execution, state, sandboxing, approvals, and recovery.

Read First

#	Title	Vendor
1	Unrolling the Codex agent loop	OpenAI
2	Unlocking the Codex harness: how we built the App Server	OpenAI
3	Effective harnesses for long-running agents	Anthropic

Then Read

Title	Vendor
The next evolution of the Agents SDK	OpenAI
Harness design for long-running application development	Anthropic
Scaling Managed Agents: Decoupling the brain from the hands	Anthropic

Build Exercise

Build a mini coding harness: plan file, shell tool, apply patch, test gate, event log, and resume capability.
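The event log and resume pieces can be sketched independently of the model. The example below assumes an append-only JSONL log and steps that report success as a boolean. completed_steps, run, and the event fields are hypothetical names, and the step actions stand in for real plan, patch, and test tools.

```python
import json
from pathlib import Path
from typing import Callable

# A step is a name plus an action that returns True on success.
Step = tuple[str, Callable[[], bool]]


def completed_steps(log_path: Path) -> set[str]:
    """Replay the event log and return the names of steps that finished ok."""
    if not log_path.exists():
        return set()
    done: set[str] = set()
    with log_path.open() as f:
        for line in f:
            event = json.loads(line)
            if event["status"] == "ok":
                done.add(event["step"])
    return done


def run(steps: list[Step], log_path: Path) -> list[str]:
    """Run steps in order, skipping any already logged as ok.

    Stops at the first failing step (this is how a test gate halts the run)
    and returns the names of the steps executed during this call.
    """
    done = completed_steps(log_path)
    executed: list[str] = []
    with log_path.open("a") as f:
        for name, action in steps:
            if name in done:
                continue  # resume: this step succeeded in an earlier run
            ok = action()
            f.write(json.dumps({"step": name, "status": "ok" if ok else "failed"}) + "\n")
            f.flush()  # persist the event before moving to the next step
            executed.append(name)
            if not ok:
                break
    return executed
```

Because the log is the only state, calling run again after a crash or a failed test gate picks up at the first step that has not yet succeeded.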

Phase 5 — Coding & Workspace Agents

Compare the product and SDK forms of Codex and Claude Code; learn multi-agent, IDE, and workspace collaboration.

Read First

#	Title	Vendor
1	Introducing Codex	OpenAI
2	Best practices for Claude Code	Anthropic
3	Enabling Claude Code to work more autonomously	Anthropic

Then Read

Title	Vendor
Introducing the Codex app	OpenAI
Introducing workspace agents in ChatGPT	OpenAI
Apple's Xcode now supports Claude Agent SDK	Anthropic
Building Consistent Workflows with Codex CLI & Agents SDK	OpenAI

Build Exercise

Run both OpenAI/Codex and Claude Code style workflows on the same repo: issue → plan → patch → tests → PR summary.

Phase 6 — Evals, Safety & Production

Build the pre- and post-launch eval loop, trace loop, safety boundaries, permissions, and regression monitoring.

Read First

#	Title	Vendor
1	Demystifying evals for AI agents	Anthropic
2	Testing Agent Skills Systematically with Evals	OpenAI
3	Build an Agent Improvement Loop with Traces, Evals, and Codex	OpenAI

Then Read

Title	Vendor
Running Codex safely at OpenAI	OpenAI
How we contain Claude across products	Anthropic
Evals API Use-case - MCP Evaluation	OpenAI
Measuring AI agent autonomy in practice	Anthropic

Build Exercise

Build a smoke/macro eval suite for your agent: task success rate, tool misuse, prompt injection resistance, latency, cost, and human approval count.
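The six metrics above can be aggregated from one record per eval episode. The sketch below is one possible shape, assuming each episode has already been graded. Episode, report, and all field names are hypothetical, and how you detect tool misuse or a resisted injection is left to your graders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Episode:
    succeeded: bool            # did the agent complete the task?
    tool_misuses: int          # graded count of wrong or unsafe tool calls
    injection_attempted: bool  # did this episode contain an injection?
    injection_resisted: bool   # if so, did the agent ignore it?
    latency_s: float
    cost_usd: float
    approvals: int             # human approvals requested


def report(episodes: list[Episode]) -> dict[str, Optional[float]]:
    """Aggregate per-episode results into suite-level metrics."""
    if not episodes:
        raise ValueError("no episodes to report on")
    n = len(episodes)
    attacked = [e for e in episodes if e.injection_attempted]
    return {
        "task_success_rate": sum(e.succeeded for e in episodes) / n,
        "tool_misuses_per_episode": sum(e.tool_misuses for e in episodes) / n,
        # None when the suite contains no injection cases at all.
        "injection_resistance": (
            sum(e.injection_resisted for e in attacked) / len(attacked)
            if attacked else None
        ),
        "mean_latency_s": sum(e.latency_s for e in episodes) / n,
        "total_cost_usd": sum(e.cost_usd for e in episodes),
        "approvals_per_episode": sum(e.approvals for e in episodes) / n,
    }
```

Tracking the report across releases turns the same suite into a regression monitor.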

Full Reading Table

Priority guide: P0 = must-read (architectural/conceptual), P1 = highly useful (implementation detail), P2 = optional context (background/releases).

Priority	Title	Vendor	Topic	Key Idea	Date
P0	OpenAI for Developers in 2025	OpenAI	Agents; MCP; Platform	Annual overview: systematic walkthrough of Responses API, Agents SDK, AgentKit, Codex, MCP, Apps SDK, and AGENTS.md.	2025-12-30
P0	New tools for building agents	OpenAI	Agents; Responses API; Tools	Key starting point for OpenAI's agent platform: Responses API, built-in web/file/computer tools, Agents SDK, tracing/observability.	2025-03-11
P0	Introducing AgentKit	OpenAI	Agents; Evals; AgentKit	AgentKit, expanded evals, agent RFT: the official agent toolchain from prototype to production.	2025-10-06
P0	Agents SDK overview	OpenAI	Agents; SDK	Official SDK entry point: concepts and boundaries of agent, tool, handoff, guardrail, and tracing.	Current docs
P0	Orchestrating Agents: Routines and Handoffs	OpenAI	Agents; Handoffs; Orchestration	Classic introduction: how routines, handoffs, and tool calling combine into controllable multi-flow agents.	2024-10-10
P0	Introducing the Model Context Protocol	Anthropic	MCP; Standards	The origin article for MCP: an open standard connecting AI assistants to data, tools, and systems.	2024-11-25
P0	Building effective agents	Anthropic	Agents; Patterns; Frameworks	Essential agent primer: workflow vs agent, prompt/tool/retrieval, orchestrator-worker, evaluator-optimizer patterns.	2024-12-19
P0	New tools and features in the Responses API	OpenAI	MCP; Responses API; Tools	Responses API extended to remote MCP servers, image/code/file tools; see how OpenAI integrates MCP into its runtime.	2025-05-21
P0	MCP and Connectors	OpenAI	MCP; Connectors; Responses API	Official guide to connecting remote MCP servers and connectors; includes approvals and security considerations.	Current docs
P0	Building MCP servers for ChatGPT Apps and API integrations	OpenAI	MCP; ChatGPT Apps; API	Official guide to writing MCP servers: supply tools/knowledge to ChatGPT Apps, deep research, and API integrations.	Current docs
P0	Building a Deep Research MCP Server	OpenAI	MCP; Deep research	Minimal implementation of a search/fetch MCP server for Deep Research.	2025-06-25
P0	Model Context Protocol - Codex	OpenAI	MCP; Codex	How Codex CLI/IDE connects to MCP servers, adding Figma, browser, docs, and internal tool context to agents.	Current docs
P0	Introducing Codex	OpenAI	Agents; Coding; Sandbox	Cloud-based software engineering agent: parallel tasks, repo sandbox, running tests/linters/type checkers, producing auditable evidence.	2025-05-16
P0	Unrolling the Codex agent loop	OpenAI	Harness; Agent loop; Codex	How Codex CLI chains prompt, tool schema, MCP tools, Responses API, and context management into an agent loop.	2026-01-23
P0	Unlocking the Codex harness: how we built the App Server	OpenAI	Harness; Codex App Server; JSON-RPC	Core harness article: Codex core, App Server, JSON-RPC, streaming progress, approval, diff, and thread management.	2026-02-04
P0	From model to agent: Equipping the Responses API with a computer environment	OpenAI	Harness; Responses API; Sandbox	Responses API + shell tool + hosted containers form the agent runtime; essential for understanding the model-to-agent execution environment.	2026-03-10
P0	Harness engineering: leveraging Codex in an agent-first world	OpenAI	Harness; Agent-first engineering	Design product code, tests, CI, docs, and observability to be agent-readable/executable; learn agent-first repo organization.	2026-02-11
P0	The next evolution of the Agents SDK	OpenAI	Harness; Agents SDK; MCP; Skills	Agents SDK harness becomes more complete: memory, sandbox orchestration, Codex-like filesystem tools, MCP, skills, AGENTS.md.	2026-04-15
P0	Building Consistent Workflows with Codex CLI & Agents SDK	OpenAI	MCP; Codex; Agents SDK	Codex CLI as an MCP server integrated with Agents SDK; real multi-agent dev workflow.	2025-10-01
P0	Building Reliable Agents with Memory and Compaction	OpenAI	Memory; Compaction; Reliability	Memory and compaction design for long-context/multi-turn agents.	2026-05-01
P0	Build an Agent Improvement Loop with Traces, Evals, and Codex	OpenAI	Evals; Traces; Self-improvement	Connect traces, evals, and Codex fixes into an agent improvement loop.	2026-05-12
P0	Eval Driven System Design - From Prototype to Production	OpenAI	Evals; Production	Use evals as the driving force for system design; ideal for moving agents from demo to production.	2025-06-02
P0	Testing Agent Skills Systematically with Evals	OpenAI	Evals; Skills; Agents	Systematically test agent skills with evals; establish quality gates before skill release.	2026-01-22
P0	Evals API Use-case - MCP Evaluation	OpenAI	MCP; Evals	Evaluate QA/retrieval capabilities with MCP tools; ideal for building an MCP regression suite.	2025-06-09
P0	Running Codex safely at OpenAI	OpenAI	Safety; Sandbox; Codex	How OpenAI runs Codex internally: sandbox, approvals, network policy, agent-native telemetry.	2026-05-20
P0	Building Governed AI Agents - A Practical Guide to Agentic Scaffolding	OpenAI	Governance; Guardrails; Agents	Governed agent scaffolding: permissions, guardrails, auditing, and organizational policies.	2026-02-23
P0	Macro Evals for Agentic Systems	OpenAI	Evals; Agentic systems	Evaluate agents at the end-to-end/macro level, not just individual step outputs.	2026-05-19
P0	Best practices for Claude Code	Anthropic	Coding agents; Claude Code	Claude Code methodology: verification loop, explore-plan-code, CLAUDE.md, permissions, MCP, subagents, context management.	2025-04-18
P0	How we built our multi-agent research system	Anthropic	Agents; Multi-agent; Research	Claude Research multi-agent architecture: planner + parallel research agents + synthesis; production multi-agent experience.	2025-06-13
P0	Writing effective tools for AI agents - with AI agents	Anthropic	Tools; MCP; Evals	Tool quality determines agent quality: tool descriptions, context budget, eval, and letting Claude optimize its own tools.	2025-09-11
P0	Effective context engineering for AI agents	Anthropic	Context; Agents	Context is the agent's core resource: selection, compression, isolation, persistence, and context pollution control.	2025-09-29
P0	Enabling Claude Code to work more autonomously	Anthropic	Claude Code; Agent SDK; Subagents	Claude Agent SDK, subagents, hooks, background tasks, checkpoints, and other autonomous coding agent capabilities.	2025-09-29
P0	Equipping agents for the real world with Agent Skills	Anthropic	Skills; Agents	Agent Skills as modular capability packages: instructions, resources, scripts — reducing context burden and improving reliability.	2025-10-16
P0	Code execution with MCP: Building more efficient agents	Anthropic	MCP; Code execution; Context	Key article on MCP scale challenges: reduce token overhead with code execution/on-demand tools; learn progressive disclosure.	2025-11-04
P0	Introducing advanced tool use on Claude Developer Platform	Anthropic	Tools; MCP; Advanced tool use	Tool search, deferred loading, programmatic tool calling; solving context pollution from large numbers of MCP tools.	2025-11-24
P0	Effective harnesses for long-running agents	Anthropic	Harness; Long-running agents	Essential harness reading: working across multiple context windows, task logging, external state, agent self-recovery.	2025-11-26
P0	Demystifying evals for AI agents	Anthropic	Evals; Agents	Agent evals are more complex than static evals: multi-turn, tools, state changes, creative solutions, failure taxonomy.	2026-01-09
P0	Measuring AI agent autonomy in practice	Anthropic	Agents; Autonomy; Measurement	Quantify agent autonomy using metrics like task duration and supervision needs; ideal for building autonomy benchmarks.	2026-02-18
P0	Harness design for long-running application development	Anthropic	Harness; Application development	Harness design patterns for delegating long-running app development tasks to agents; compare with OpenAI Codex harness.	2026-03-24
P0	Scaling Managed Agents: Decoupling the brain from the hands	Anthropic	Managed agents; Harness	Decouple the model brain from execution hands/harness, keeping interfaces stable as the harness evolves.	2026-04-08
P0	How we contain Claude across products	Anthropic	Safety; Containment; Agents	Blast radius of powerful agent releases, human-in-the-loop, and containment strategies.	2026-05-25
P1	Structured Outputs for Multi-Agent Systems	OpenAI	Agents; Multi-agent; Structured outputs	Use strict schemas to constrain structured messages and handoffs between multiple agents.	2024-08-06
P1	Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku	Anthropic	Agents; Computer use	Claude computer use beta starting point: the model uses a computer via screenshots and actions.	2024-10-22
P1	Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet	Anthropic	Agents; Coding; Evals	SWE-bench agent scaffolding article: the same model's performance depends strongly on the harness/scaffolding.	2025-01-06
P1	Introducing Operator	OpenAI	Agents; Computer use; Safety	Early product form of browser-based agents: model clicks, types, and executes tasks on web pages, emphasizing user confirmation and safety boundaries.	2025-01-23
P1	Computer-Using Agent	OpenAI	Agents; Computer use	Understand how CUA combines vision, mouse/keyboard actions, and environment feedback into an agent loop; compare with Claude computer use.	2025-01-23
P1	Claude 3.7 Sonnet and Claude Code	Anthropic	Agents; Coding; Claude Code	Early release of Claude Code, marking Claude's entry into the agentic coding tool space.	2025-02-24
P1	The think tool: Enabling Claude to stop and think in complex tool use situations	Anthropic	Tools; Reasoning; Agents	Give the model an explicit think tool in complex tool-use chains; learn tool design for policy-heavy/multi-step decisions.	2025-03-20
P1	Evaluating Agents with Langfuse	OpenAI	Evals; Agents	Observe and evaluate Agents SDK runs with Langfuse; learn tracing/eval workflows.	2025-03-31
P1	Parallel Agents with the OpenAI Agents SDK	OpenAI	Agents; Parallelism; Agents SDK	Parallel agent patterns: decompose tasks, execute in parallel, aggregate results.	2025-05-01
P1	Multi-Agent Portfolio Collaboration with OpenAI Agents SDK	OpenAI	Agents; Multi-agent; Portfolio	Multi-agent collaboration business example: research, analysis, combined output.	2025-05-28
P1	MCP-Powered Agentic Voice Framework	OpenAI	MCP; Voice; Agents	Voice agent + MCP paradigm: real-time interaction, tool extension, task execution.	2025-06-17
P1	Deep Research API with the Agents SDK	OpenAI	Agents; Deep research; Agents SDK	Integrate Deep Research API into Agents SDK workflows.	2025-06-25
P1	Desktop Extensions: One-click MCP server installation for Claude Desktop	Anthropic	MCP; Claude Desktop; Packaging	Package local MCP servers as one-click install extensions; learn MCP distribution/installation/local permission issues.	2025-06-26
P1	Building a Supply-Chain Copilot with OpenAI Agent SDK and Databricks MCP Servers	OpenAI	MCP; Agents; Databricks	Enterprise data platform MCP + Agent SDK business agent example.	2025-07-08
P1	Introducing ChatGPT agent: bridging research and action	OpenAI	Agents; ChatGPT; Computer use	End-user-facing ChatGPT agent: combining research, browser, computer use, file/slide capabilities.	2025-07-17
P1	ChatGPT agent System Card	OpenAI	Agents; Safety; Evals	Learn pre-launch risk classification, evaluation, permissions, human confirmation, and abuse prevention for agent products.	2025-07-17
P1	Context Engineering - Short-Term Memory Management with Sessions	OpenAI	Context; Sessions; Agents	How short-term memory/session state affects agent reliability.	2025-09-09
P1	Introducing upgrades to Codex	OpenAI	Agents; Coding; IDE	Codex evolves from research preview to daily dev tool: CLI, IDE, web/mobile collaboration, and more independent task execution.	2025-09-15
P1	Introducing Claude Sonnet 4.5	Anthropic	Agents; Claude Agent SDK; Computer use	Sonnet 4.5 emphasizes coding, complex agents, computer use, with simultaneous Agent SDK launch.	2025-09-29
P1	Introducing apps in ChatGPT and the new Apps SDK	OpenAI	MCP; Apps; ChatGPT	Apps SDK extends UI and tool server via MCP; entry point for understanding the ChatGPT app / MCP app ecosystem.	2025-10-06
P1	Codex is now generally available	OpenAI	Agents; Coding; Codex SDK	Codex GA, Slack integration, Codex SDK, admin tools; see how coding agents enter enterprise management.	2025-10-06
P1	Using PLANS.md for multi-hour problem solving	OpenAI	Codex; Long-running; Planning	Plan files and cross-context task management for long-running coding agents.	2025-10-07
P1	Beyond permission prompts: making Claude Code more secure and autonomous	Anthropic	Safety; Permissions; Claude Code	From simple permission prompts to fine-grained security policies, reducing autonomous mode risk and interruptions.	2025-10-20
P1	Introducing Aardvark: OpenAI's agentic security researcher	OpenAI	Agents; Security	Security-domain agent form: continuous scanning, issue verification, fix suggestions; later integrated as Codex Security.	2025-10-30
P1	Build a coding agent with GPT 5.1	OpenAI	Agents; Coding	Build a coding agent from scratch: understand file editing, command execution, loops, and verification.	2025-11-13
P1	OpenAI co-founds Agentic AI Foundation	OpenAI	MCP; Standards; AGENTS.md	MCP, AGENTS.md, and agent standards enter the Linux Foundation/AAIF context; understand ecosystem standardization.	2025-12-09
P1	Donating MCP and establishing the Agentic AI Foundation	Anthropic	MCP; Standards; AAIF	Anthropic donates MCP to Linux Foundation/AAIF; read alongside OpenAI's AAIF article.	2025-12-09
P1	Context Engineering for Personalization - Long-Term Memory Notes	OpenAI	Context; Long-term memory; Agents	How long-term memory serves as agent personalization/state management.	2026-01-05
P1	Supercharging Codex with JetBrains MCP at Skyscanner	OpenAI	MCP; Codex; IDE	Real IDE/MCP case study: how Codex CLI accesses IDE context and dev tools via JetBrains MCP.	2026-01-11
P1	Designing AI-resistant technical evaluations	Anthropic	Evals; Technical hiring	How strong agents continuously break technical evaluations; relevant to benchmark contamination prevention and eval design.	2026-01-21
P1	Inside OpenAI's in-house data agent	OpenAI	Agents; Data; Memory	Internal data agent case study: memory, Codex, data context, reliability; learn enterprise knowledge/data agents.	2026-01-29
P1	Introducing the Codex app	OpenAI	Agents; Coding; Multi-agent	Desktop command center for agents: multi-threaded/parallel long tasks, project-level agent workflows.	2026-02-02
P1	Apple's Xcode now supports Claude Agent SDK	Anthropic	Claude Agent SDK; Xcode; MCP	Embed Claude Agent SDK in Xcode: harness, subagents, background tasks, plugins, MCP.	2026-02-03
P1	Quantifying infrastructure noise in agentic coding evals	Anthropic	Evals; Coding agents; Infrastructure	Environment configuration significantly impacts scores in agentic coding evals; control infrastructure noise in both production and benchmarks.	2026-02-05
P1	Building a C compiler with a team of parallel Claudes	Anthropic	Multi-agent; Coding; Long-running	Parallel Claude teams completing large engineering tasks; learn multi-agent division of labor, coordination, and long-running execution.	2026-02-05
P1	Codex Security: now in research preview	OpenAI	Agents; Security; Codex	Productization of an agentic security researcher: vulnerability discovery, verification, fix suggestions, reducing triage noise.	2026-03-06
P1	Eval awareness in Claude Opus 4.6's BrowseComp performance	Anthropic	Evals; Agent awareness	Risk of models recognizing/adapting to evaluations; relevant to agent benchmark credibility discussions.	2026-03-06
P1	How we built Claude Code auto mode: a safer way to skip permissions	Anthropic	Safety; Permissions; Autonomy	Claude Code auto mode risk classification, allow/block rules, exception handling, and security testing.	2026-03-25
P1	Migrate a Legacy Codebase with Sandbox Agents	OpenAI	Agents; Sandbox; Evals	Sandbox agent evaluation and execution patterns in large legacy code migrations.	2026-04-07
P1	Codex for (almost) everything	OpenAI	Agents; Codex; MCP; Plugins	Codex app expanded to Windows/macOS, computer use, in-app browser, memory, plugins, MCP servers.	2026-04-16
P1	Computer Use Agents in Daytona Sandboxes	OpenAI	Computer use; Sandbox; Agents	Computer-use agents and sandbox runtimes; compare with Operator/CUA/Claude computer use.	2026-04-19
P1	Introducing workspace agents in ChatGPT	OpenAI	Agents; Workspace; Governance	Workspace agents: shared agents, permissions, tools, memory, safeguards; ideal for team collaboration agent design.	2026-04-22
P1	Building workspace agents in ChatGPT to complete repeatable, end-to-end work	OpenAI	Workspace agents; ChatGPT	Practical workspace agents for repeatable end-to-end team workflows.	2026-04-22
P1	Speeding up agentic workflows with WebSockets in the Responses API	OpenAI	Agents; Latency; Responses API	Optimize latency by treating agentic rollouts as long-lived connections/tasks; learn production agent transport and caching.	2026-05-01
P1	Agents for financial services	Anthropic	Agents; Finance; MCP	Ten ready-to-run agent templates, Claude Code/Cowork plugins, Managed Agents cookbooks, MCP app.	2026-05-05
P1	Migrate from the Claude Agent SDK to the OpenAI Agents SDK	OpenAI	Agents SDK; Migration	Compare Claude Agent SDK and OpenAI Agents SDK from a migration perspective; ideal for dual-stack learning.	2026-05-07
P1	Building a safe, effective sandbox to enable Codex on Windows	OpenAI	Safety; Sandbox; Codex	Coding agent sandbox design on Windows: file access, network restrictions, approval tradeoffs.	2026-05-13
P1	Building self-improving tax agents with Codex	OpenAI	Agents; Evals; Self-improvement	Combine production traces, expert feedback, Codex loop, and eval infrastructure into self-improving business agents.	2026-05-27
P1	SchemaFlow: Agentic Database Change Impact Analysis, SQL Generation, and Eval Guardrails	OpenAI	Evals; SQL; Agent guardrails	Examples of guardrails and eval-based guardrails for data/SQL agents.	2026-06-05
P1	Agents SDK quickstart	OpenAI	Agents; SDK	Quickly build a minimal agent; understand the code patterns of run, tool, and handoff.	Current docs
P1	MCP Apps compatibility in ChatGPT	OpenAI	MCP; Apps SDK; UI	Understand MCP Apps UI standards, iframe/bridge, and compatibility between ChatGPT and other hosts.	Current docs
P1	Use Codex with the Agents SDK	OpenAI	MCP; Codex; Agents SDK	Use Codex as an MCP server for other agents to call; ideal for multi-agent dev workflows.	Current docs
P1	Agent approvals and security - Codex	OpenAI	Safety; Approvals; Codex	Official reference for Codex approval modes, sandbox, network access; read alongside OpenAI/Anthropic safety articles.	Current docs
P1	Agent Skills - Codex	OpenAI	Codex; Skills; Plugins	Skills/Plugins as reusable workflow packages; compare with Anthropic Agent Skills.	Current docs
P1	Custom instructions with AGENTS.md - Codex	OpenAI	AGENTS.md; Context	How AGENTS.md provides persistent project specifications for agents; establish repo-level agent contracts.	Current docs
P1	Agents SDK integrations and observability	OpenAI	Observability; MCP; Tracing	Tracing, MCP integration, provider/observability; essential for production agent debugging.	Current docs
P1	Secure MCP Tunnel	OpenAI	MCP; Security; Private tools	Securely expose private/intranet MCP servers to supported OpenAI surfaces; ideal for enterprise deployment.	Current docs
P1	How Claude Code works	Anthropic	Claude Code; Agentic loop; Harness	Under-the-hood architecture of Claude Code: the agentic loop (gather context → act → verify), built-in tool categories, context window management, and extension points.	Current docs
P2	Introducing Contextual Retrieval	Anthropic	Context; Retrieval; RAG	Not agent-specific, but important for agent RAG/context: prepend context to chunks before retrieval to improve recall.	2024-09-19
P2	Developing a computer use model	Anthropic	Computer use; Agents	More technical explanation of how the computer-use model moves the mouse, clicks, types, and reads screen feedback.	2024-10-22
P2	Introducing Claude 4	Anthropic	Agents; Coding; Long-running	Overview of Claude Opus/Sonnet 4 capabilities: coding, advanced reasoning, agent workflows.	2025-05-22
P2	Claude for Financial Services	Anthropic	Agents; Connectors; Finance	Vertical industry agent/connector productization case; understand data, permissions, and tool integration in finance.	2025-07-15
P2	Advancing Claude for Financial Services	Anthropic	Agents; Skills; Finance	Claude for Excel, real-time data connectors, pre-built Agent Skills for vertical industry productization.	2025-10-27
P2	Introducing GPT-5.3-Codex	OpenAI	Agents; Coding model; Evals	Codex-native model and long-running coding/terminal/agentic benchmarks; understand how model capabilities serve the harness.	2026-02-05
P2	Introducing OpenAI Frontier	OpenAI	Agents; Enterprise; Governance	Enterprise AI coworker/agent platform: shared context, onboarding, permissions, guardrails, governance.	2026-02-10
P2	Introducing Claude Sonnet 4.6	Anthropic	Agents; Planning; Computer use	Sonnet 4.6 emphasizes coding, computer use, long-context reasoning, agent planning.	2026-02-17
P2	Introducing Claude Opus 4.6	Anthropic	Agents; Long-running; Tool use	Model release perspective on long-running tasks, agentic harness, subagents, and tool call capabilities.	2026-02-25
P2	Introducing Claude Opus 4.7	Anthropic	Agents; Long-running; Coding	Stronger software engineering and long-running task performance; track how model capabilities impact agent workloads.	2026-04-16
P2	An update on recent Claude Code quality reports	Anthropic	Reliability; Claude Code; Agent SDK	Postmortem on Claude Code/Agent SDK quality regression; learn agent product operations and regression control.	2026-04-23
P2	Introducing Claude Opus 4.8	Anthropic	Agents; Dynamic workflows; Long-running	Dynamic workflows, hundreds of parallel subagents, long-running agentic tasks — latest model/product direction.	2026-05-28
P2	Codex for every role, tool, and workflow	OpenAI	Agents; Codex; Plugins	Codex expands from development to knowledge work: role-specific plugins, Sites, annotations, parallel workflows.	2026-06-02
P2	Codex is becoming a productivity tool for everyone	OpenAI	Agents; Knowledge work	Usage data shows how non-developers use Codex for reports, spreadsheets, research, automation, and lightweight tools.	2026-06-02
P2	OpenAI Docs MCP	OpenAI	MCP; Docs; Context	Official OpenAI docs MCP server; connect docs directly to local agents/IDEs.	Current docs
P2	Codex SDK	OpenAI	Codex SDK; Automation	Programmatically control Codex in CI/CD or internal tools; embed coding agents into existing workflows.	Current docs
P2	When AI builds itself	Anthropic	Agents; Recursive self-improvement; Safety	How AI systems accelerate their own development through recursive self-improvement; three possible futures and the need for verifiable coordination.	2026-05

Who Is This For?

AI Engineers
Agent Engineers
LLM Engineers
Platform Engineers
Research Engineers
AI Startup Founders

Contributing

Contributions are welcome. If you find:

New OpenAI resources
New Anthropic resources
MCP updates
Agent evaluation frameworks
Production engineering articles

Please open a pull request.

Vision

The goal of this project is to become the System Design Primer for Agentic Engineering.

If you're serious about building production AI agents, start here.

License

MIT
