Agent Engineering Roadmap

A hands-on roadmap for building production-ready AI Agents, MCP Servers, Memory Systems, Multi-Agent Workflows, and Agent Colonies.

繁體中文 · Website · Course · Roadmap · Examples · Showcases · Benchmarks · Labs · Teaching · Templates · Architecture · Healthcare · Finance

Agent Engineering Course Map

Quickstart

Step	Do This	Why
1	Run `python scripts/verify_examples.py`	Confirm every dependency-free example works locally
2	Study Course	Follow the roadmap in the intended learning order
3	Build Capstone Starter	Turn the lessons into a runnable agent colony project

flowchart LR
    User((User)) --> Agent[AI Agent]
    Agent --> Tools[Tool Use]
    Tools --> MCP[MCP Layer]
    MCP --> Memory[Memory System]
    Memory --> Workflow[Agent Workflow]
    Workflow --> MultiAgent[Multi-Agent Team]
    MultiAgent --> Colony[Agent Colony]
    Colony --> Production[Production AI App]

Why this roadmap exists

Most AI tutorials stop at prompts, RAG, or simple tool calling.

Real agentic products require more than that:

agents that can use tools safely
MCP servers that connect agents to real systems
memory layers that persist useful context
workflows that are observable and controllable
multi-agent teams that can specialize and collaborate
evaluation, security, and production guardrails

This repository is a practical learning path for builders who want to move from chatbot demos to real agent engineering.

Teaching approach

This roadmap teaches agents like an engineering course, not a tool catalog.

Each major topic follows the same pattern:

Start with the problem: what breaks if you only use a chatbot?
Build the intuition: what is the simplest mental model?
Open the box: what components are actually involved?
Run a minimal example: what can you inspect locally?
Add production judgment: what needs evaluation, observability, approval, or safety gates?

In one sentence: an agent is not magic. It is context, tools, memory, workflow, evaluation, and human judgment arranged around a useful task.

What you will learn

Level	Topic	Outcome
0	AI & LLM Fundamentals	Understand LLM apps, embeddings, RAG, and structured output
1	Single Agent	Build a task-focused agent with a clear role and output format
2	Tool Use	Connect agents to external tools and APIs
3	MCP	Build and use MCP clients, servers, tools, resources, and prompts
4	Agent Memory	Design short-term, episodic, semantic, user, and shared memory
5	Agent Workflow	Build reliable planning, execution, review, retry, and approval flows
6	Multi-Agent Systems	Coordinate specialized agents using supervisor, debate, and reflection patterns
7	Agent Colony	Build shared-memory colonies with domain agents and evaluation loops
8	Production & Safety	Deploy agents with observability, evaluation, security, and cost control

Course materials

Section	Purpose
Course	Complete syllabus and graduation criteria
Curriculum	Concept chapters from foundations to production
Visual Assets	SVG diagrams for teaching and slides
Roadmap	Level-by-level learning milestones
Examples	Runnable minimal implementations
Benchmarks	Lightweight checks for tool use, RAG, workflow, security, and observability
Showcases	Dependency-free demos for healthcare, finance, and enterprise workflows
Domain Casebooks	Healthcare, finance, and enterprise case studies with eval cases
Labs	Guided exercises for each stage
Teaching Layer	Teaching audit, misconceptions, deliverables, and module blueprint
Lab Solution Guides	Solution shapes and grading direction for hands-on labs
Lesson Plans	Instructor-ready teaching plans for each module
Study Group Kit	4-week, 8-week, and workshop formats for cohorts
Patterns	Reusable agent architecture patterns
Templates	Agent specs, memory policies, evals, and safety gates
Papers	Research papers, reading roadmap, and engineering notes
Open Source Projects	Curated ecosystem map for frameworks, MCP, RAG, evals, observability, and ops
Framework Selection Matrix	Choose agent frameworks by engineering tradeoff
Open Source Reading Guide	Learn how to study real agent repositories
DeepEval And RAGAS	Practical guide to LLM and RAG evaluation frameworks
Release Checklist	v1 release verification and project hygiene
Assessments	Quiz bank and rubrics
Capstone	Final project for building a production-aware colony
Portfolio Projects	Project ideas with deliverables, evals, and open-source references
Capstone Starter	Runnable starter scaffold for the final project
Glossary	Core terms and definitions

The learning path

AI Fundamentals
      ↓
Single Agent
      ↓
Tool Use
      ↓
MCP Integration
      ↓
Agent Memory
      ↓
Agent Workflow
      ↓
Multi-Agent Systems
      ↓
Agent Colony
      ↓
Production, Evaluation & Safety

Try it in 60 seconds

Run a showcase without API keys:

python showcases/enterprise-support-agent/main.py
python showcases/finance-research-agent/main.py
python showcases/healthcare-agent-colony/main.py

Then run the evaluation harness:

python examples/07-evaluation-harness/main.py
python examples/08-mini-rag/main.py
python benchmarks/benchmark_runner.py
python scripts/verify_examples.py

Production readiness artifacts

Artifact	Use
Agent Registry Template	Register owner, scopes, tools, data, evals, and operations
Risk Assessment Template	Classify agent risk before launch
Deployment Review Template	Check release gates and operational readiness
Release Checklist	Prepare a public course release
v1.0 Readiness	Track stable release readiness

Showcase demos

Demo	Shows
Enterprise Support Agent	Ticket routing, risk classification, approval gates
Finance Research Agent	Research support, assumptions, risk boundaries
Healthcare Agent Colony	Safety boundaries, escalation, medical-advice avoidance

Runnable examples

Example	Shows	No API key
01 Single Agent	Role, task boundary, structured output	Yes
02 Tool-Using Agent	Local tool call and validation	Yes
03 MCP-style Agent	Client/server tool boundary	Yes
04 Memory Agent	Memory write/retrieve policy	Yes
05 Multi-Agent Workflow	Planner, researcher, writer, reviewer	Yes
06 Agent Colony	Supervisor, domain agent, evaluator	Yes
07 Evaluation Harness	Regression eval suite	Yes
08 Mini RAG	Retrieval, grounded answer, RAG eval	Yes
09 Graph Approval Agent	Graph transitions, approval gate, production eval	Yes
10 Observable Agent	Trace events, guardrail logs, replayable debugging	Yes
11 Prompt Injection Defense	Untrusted retrieval filtering and security eval	Yes
12 Cost-Aware Agent	Model routing, budget, latency, fallback eval	Yes
13 Durable Workflow Agent	Checkpoint, resume, durable workflow eval	Yes
14 Modern MCP Gateway	Tools, resources, prompts, auth, elicitation	Yes
15 Memory Governance Agent	Memory redaction, merge, decay, deletion, audit	Yes
16 Agent Permission System	Agent identity, scopes, access review, audit	Yes
17 Advanced Eval Harness	Regression, safety, adversarial, golden trace release gate	Yes
Capstone Starter	Starter colony demo and regression eval	Yes

Run every dependency-free example with:

python scripts/verify_examples.py

README widgets used

This README uses lightweight visual widgets commonly seen in popular GitHub projects:

Local cover image for the top hero banner
shields.io for stars, forks, language, status, and topic badges
Mermaid for architecture diagrams

Plugin ecosystem

Agent Engineering is not only about prompts. A production agent needs a plugin ecosystem around it.

Category	Purpose	Example Plugins / Tools
MCP Servers	Standardized access to tools and data	filesystem, database, browser, GitHub, Slack, Google Drive
Memory	Persistent context and retrieval	Qdrant, LanceDB, Chroma, PostgreSQL, Redis
Orchestration	Workflow and multi-agent control	LangGraph, CrewAI, AutoGen, OpenAI Agents SDK
RAG	Knowledge retrieval and grounding	LlamaIndex, LangChain, Haystack
Observability	Tracing, debugging, monitoring	Langfuse, OpenTelemetry, Helicone, Phoenix
Evaluation	Quality and safety testing	DeepEval, RAGAS, promptfoo, custom eval suites
Guardrails	Safety and structured validation	Guardrails AI, Pydantic, JSON Schema, policy checkers
UI / App Layer	User-facing agent applications	Streamlit, Gradio, Next.js, FastAPI
Domain Tools	Industry-specific integrations	healthcare records, finance data, CRM, ERP, ticketing systems

Core architecture

graph TD
    User[User] --> Supervisor[Supervisor Agent]
    Supervisor --> Planner[Planner]
    Planner --> MemoryAgent[Memory Agent]
    Planner --> ResearchAgent[Research Agent]
    Planner --> ToolAgent[Tool Agent]
    Planner --> DomainAgent[Domain Agent]
    MemoryAgent --> SharedMemory[Shared Memory]
    ToolAgent --> MCP[MCP Servers]
    DomainAgent --> MCP
    ResearchAgent --> MCP
    MCP --> PluginLayer[Plugin Ecosystem]
    PluginLayer --> Databases[Databases]
    PluginLayer --> Documents[Documents]
    PluginLayer --> APIs[External APIs]
    PluginLayer --> SaaS[SaaS Apps]
    Supervisor --> Evaluator[Evaluator Agent]
    Evaluator --> Final[Final Response]
    Final --> User
    Evaluator --> SharedMemory

Repository structure

agent-engineering-roadmap/
├── README.md
├── README_zh.md
├── COURSE.md
├── assets/           # Visual diagrams and teaching images
├── roadmap/          # Level 0-8 learning path
├── curriculum/       # Full course chapters
├── examples/         # Hands-on examples
├── benchmarks/       # Lightweight behavior checks
├── security/         # Prompt injection and agent security labs
├── study-groups/     # Cohort and workshop facilitation kit
├── showcases/        # Shareable demos with sample outputs
├── labs/             # Guided exercises
├── lesson-plans/     # Instructor-ready lesson plans
├── patterns/         # Architecture pattern catalog
├── architecture/     # System design patterns
├── templates/        # Reusable agent and MCP templates
├── assessments/      # Quiz bank and rubrics
├── projects/         # Capstone and portfolio projects
├── glossary/         # Agent engineering terms
├── healthcare/       # Healthcare agent engineering track
├── finance/          # Finance and quantitative research track
├── resources/        # Curated learning resources
├── docs/             # GitHub Pages site
└── launch-kit/       # Launch copy, topics, and checklist

Real-world tracks

Healthcare Agent Engineering

Build agent systems for care management, nutrition tracking, personal health memory, and healthcare workflow automation.

Example colony:

Care Manager Agent
├── Nutrition Agent
├── Vital Sign Agent
├── Psychology Agent
├── Medication Agent
├── Memory Agent
└── Safety Evaluator Agent

Finance Agent Engineering

Build research agents, factor-analysis agents, portfolio agents, risk agents, and trading research workflows.