agent-engineering-roadmap
Health: Warning
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 7 GitHub stars
Code: Passed
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Passed
- Permissions — No dangerous permissions requested
Bilingual hands-on roadmap for production-aware AI agents: MCP, memory, RAG, workflows, evaluation, safety, and agent colonies.
Agent Engineering Roadmap
A hands-on roadmap for building production-ready AI Agents, MCP Servers, Memory Systems, Multi-Agent Workflows, and Agent Colonies.
繁體中文 · Website · Course · Roadmap · Examples · Showcases · Benchmarks · Labs · Teaching · Templates · Architecture · Healthcare · Finance
Quickstart
| Step | Do This | Why |
|---|---|---|
| 1 | Run python scripts/verify_examples.py | Confirm every dependency-free example works locally |
| 2 | Study Course | Follow the roadmap in the intended learning order |
| 3 | Build Capstone Starter | Turn the lessons into a runnable agent colony project |
flowchart LR
User((User)) --> Agent[AI Agent]
Agent --> Tools[Tool Use]
Tools --> MCP[MCP Layer]
MCP --> Memory[Memory System]
Memory --> Workflow[Agent Workflow]
Workflow --> MultiAgent[Multi-Agent Team]
MultiAgent --> Colony[Agent Colony]
Colony --> Production[Production AI App]
Why this roadmap exists
Most AI tutorials stop at prompts, RAG, or simple tool calling.
Real agentic products require more than that:
- agents that can use tools safely
- MCP servers that connect agents to real systems
- memory layers that persist useful context
- workflows that are observable and controllable
- multi-agent teams that can specialize and collaborate
- evaluation, security, and production guardrails
This repository is a practical learning path for builders who want to move from chatbot demos to real agent engineering.
Teaching approach
This roadmap teaches agents like an engineering course, not a tool catalog.
Each major topic follows the same pattern:
- Start with the problem: what breaks if you only use a chatbot?
- Build the intuition: what is the simplest mental model?
- Open the box: what components are actually involved?
- Run a minimal example: what can you inspect locally?
- Add production judgment: what needs evaluation, observability, approval, or safety gates?
In one sentence: an agent is not magic. It is context, tools, memory, workflow, evaluation, and human judgment arranged around a useful task.
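The one-sentence definition above can be made concrete with a very small sketch. The `MiniAgent` class, its field names, and the `upper` tool are hypothetical and are not taken from the repository's examples; the point is only that each ingredient (context, tools, memory, a workflow step, an evaluation check, and a hand-off to human review) can be an explicit, inspectable part.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MiniAgent:
    """Hypothetical illustration: an agent as explicit, inspectable parts."""
    context: str                                      # role and task boundary
    tools: dict[str, Callable[[str], str]]            # named local tools
    memory: list[str] = field(default_factory=list)   # record of past tool calls

    def run(self, tool_name: str, tool_input: str) -> dict:
        # Workflow step 1: refuse tools outside the declared set.
        if tool_name not in self.tools:
            return {"status": "rejected", "reason": f"unknown tool: {tool_name}"}
        # Workflow step 2: call the tool and remember what happened.
        result = self.tools[tool_name](tool_input)
        self.memory.append(f"{tool_name}({tool_input!r}) -> {result!r}")
        # Evaluation: a deliberately trivial check that the tool returned text.
        passed = bool(result.strip())
        # Human judgment: failed checks are flagged for review, not returned as "ok".
        return {"status": "ok" if passed else "needs_review", "result": result}


agent = MiniAgent(
    context="You summarize support tickets.",
    tools={"upper": str.upper},  # stand-in for a real tool
)
print(agent.run("upper", "refund request"))
print(agent.run("delete_db", "prod"))
```

A real agent would put a model call between the context and the tool choice; the sketch leaves that out so the surrounding structure stays visible.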
What you will learn
| Level | Topic | Outcome |
|---|---|---|
| 0 | AI & LLM Fundamentals | Understand LLM apps, embeddings, RAG, and structured output |
| 1 | Single Agent | Build a task-focused agent with a clear role and output format |
| 2 | Tool Use | Connect agents to external tools and APIs |
| 3 | MCP | Build and use MCP clients, servers, tools, resources, and prompts |
| 4 | Agent Memory | Design short-term, episodic, semantic, user, and shared memory |
| 5 | Agent Workflow | Build reliable planning, execution, review, retry, and approval flows |
| 6 | Multi-Agent Systems | Coordinate specialized agents using supervisor, debate, and reflection patterns |
| 7 | Agent Colony | Build shared-memory colonies with domain agents and evaluation loops |
| 8 | Production & Safety | Deploy agents with observability, evaluation, security, and cost control |
Course materials
| Section | Purpose |
|---|---|
| Course | Complete syllabus and graduation criteria |
| Curriculum | Concept chapters from foundations to production |
| Visual Assets | SVG diagrams for teaching and slides |
| Roadmap | Level-by-level learning milestones |
| Examples | Runnable minimal implementations |
| Benchmarks | Lightweight checks for tool use, RAG, workflow, security, and observability |
| Showcases | Dependency-free demos for healthcare, finance, and enterprise workflows |
| Domain Casebooks | Healthcare, finance, and enterprise case studies with eval cases |
| Labs | Guided exercises for each stage |
| Teaching Layer | Teaching audit, misconceptions, deliverables, and module blueprint |
| Lab Solution Guides | Solution shapes and grading direction for hands-on labs |
| Lesson Plans | Instructor-ready teaching plans for each module |
| Study Group Kit | 4-week, 8-week, and workshop formats for cohorts |
| Patterns | Reusable agent architecture patterns |
| Templates | Agent specs, memory policies, evals, and safety gates |
| Papers | Research papers, reading roadmap, and engineering notes |
| Open Source Projects | Curated ecosystem map for frameworks, MCP, RAG, evals, observability, and ops |
| Framework Selection Matrix | Choose agent frameworks by engineering tradeoff |
| Open Source Reading Guide | Learn how to study real agent repositories |
| DeepEval And RAGAS | Practical guide to LLM and RAG evaluation frameworks |
| Release Checklist | v1 release verification and project hygiene |
| Assessments | Quiz bank and rubrics |
| Capstone | Final project for building a production-aware colony |
| Portfolio Projects | Project ideas with deliverables, evals, and open-source references |
| Capstone Starter | Runnable starter scaffold for the final project |
| Glossary | Core terms and definitions |
The learning path
AI Fundamentals
↓
Single Agent
↓
Tool Use
↓
MCP Integration
↓
Agent Memory
↓
Agent Workflow
↓
Multi-Agent Systems
↓
Agent Colony
↓
Production, Evaluation & Safety
Try it in 60 seconds
Run a showcase without API keys:
python showcases/enterprise-support-agent/main.py
python showcases/finance-research-agent/main.py
python showcases/healthcare-agent-colony/main.py
Then run the evaluation harness, the mini RAG example, the benchmark runner, and the example verifier:
python examples/07-evaluation-harness/main.py
python examples/08-mini-rag/main.py
python benchmarks/benchmark_runner.py
python scripts/verify_examples.py
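If you prefer to drive the commands above from Python, the following is one possible sketch. It is not the contents of `scripts/verify_examples.py`, which this README does not show; the `run_scripts` helper and its timeout default are assumptions. It simply runs each script with the current interpreter and treats exit code 0 as a pass.

```python
import subprocess
import sys
from pathlib import Path


def run_scripts(paths: list[Path], timeout_s: float = 30.0) -> dict[str, bool]:
    """Run each script with the current interpreter; True means exit code 0."""
    results: dict[str, bool] = {}
    for path in paths:
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            results[str(path)] = proc.returncode == 0
        except subprocess.TimeoutExpired:
            # A hung script counts as a failure rather than blocking the run.
            results[str(path)] = False
    return results


if __name__ == "__main__":
    # Paths copied from the commands above; only those present locally are run.
    candidates = [
        Path("showcases/enterprise-support-agent/main.py"),
        Path("examples/07-evaluation-harness/main.py"),
        Path("examples/08-mini-rag/main.py"),
    ]
    for name, ok in run_scripts([p for p in candidates if p.exists()]).items():
        print(("PASS" if ok else "FAIL"), name)
```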
Production readiness artifacts
| Artifact | Use |
|---|---|
| Agent Registry Template | Register owner, scopes, tools, data, evals, and operations |
| Risk Assessment Template | Classify agent risk before launch |
| Deployment Review Template | Check release gates and operational readiness |
| Release Checklist | Prepare a public course release |
| v1.0 Readiness | Track stable release readiness |
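As a rough illustration of what a registry entry might capture, the sketch below models an owner, scopes, and allowed tools, plus a deny-by-default check. The `RegistryEntry` fields, the scope strings, and the `support-bot` values are invented for this example and are not the repository's template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record; field names are illustrative only."""
    agent_id: str
    owner: str
    scopes: frozenset[str]   # permissions granted, e.g. "tickets:read"
    tools: frozenset[str]    # tools the agent is allowed to call


def is_allowed(entry: RegistryEntry, tool: str, required_scope: str) -> bool:
    # Deny by default: both the tool and the scope must be registered.
    return tool in entry.tools and required_scope in entry.scopes


support_bot = RegistryEntry(
    agent_id="support-bot",
    owner="support-platform-team",
    scopes=frozenset({"tickets:read", "tickets:comment"}),
    tools=frozenset({"search_tickets", "add_comment"}),
)

print(is_allowed(support_bot, "search_tickets", "tickets:read"))  # True
print(is_allowed(support_bot, "close_ticket", "tickets:write"))   # False
```

Keeping the entry immutable (`frozen=True`) means a scope change has to go through a new record, which is easier to review and audit.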
Showcase demos
| Demo | Shows |
|---|---|
| Enterprise Support Agent | Ticket routing, risk classification, approval gates |
| Finance Research Agent | Research support, assumptions, risk boundaries |
| Healthcare Agent Colony | Safety boundaries, escalation, medical-advice avoidance |
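The pattern shared by these demos, classify risk first and then gate the risky path behind a human, can be sketched in a few lines. This is not the showcase code: the keyword list, the `Decision` type, and the `approved_by` parameter are assumptions chosen to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed keyword list for illustration; a real classifier would need evaluation.
HIGH_RISK_TERMS = ("refund", "delete account", "legal")


@dataclass(frozen=True)
class Decision:
    risk: str     # "low" or "high"
    action: str   # "proceed" or "await_approval"


def classify_risk(ticket_text: str) -> str:
    text = ticket_text.lower()
    return "high" if any(term in text for term in HIGH_RISK_TERMS) else "low"


def route_ticket(ticket_text: str, approved_by: Optional[str] = None) -> Decision:
    risk = classify_risk(ticket_text)
    # Approval gate: high-risk tickets wait until a named human signs off.
    if risk == "high" and approved_by is None:
        return Decision(risk=risk, action="await_approval")
    return Decision(risk=risk, action="proceed")


print(route_ticket("How do I reset my password?"))
print(route_ticket("I want a refund for last month."))
print(route_ticket("I want a refund for last month.", approved_by="alice"))
```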
Runnable examples
| Example | Shows | No API key |
|---|---|---|
| 01 Single Agent | Role, task boundary, structured output | Yes |
| 02 Tool-Using Agent | Local tool call and validation | Yes |
| 03 MCP-style Agent | Client/server tool boundary | Yes |
| 04 Memory Agent | Memory write/retrieve policy | Yes |
| 05 Multi-Agent Workflow | Planner, researcher, writer, reviewer | Yes |
| 06 Agent Colony | Supervisor, domain agent, evaluator | Yes |
| 07 Evaluation Harness | Regression eval suite | Yes |
| 08 Mini RAG | Retrieval, grounded answer, RAG eval | Yes |
| 09 Graph Approval Agent | Graph transitions, approval gate, production eval | Yes |
| 10 Observable Agent | Trace events, guardrail logs, replayable debugging | Yes |
| 11 Prompt Injection Defense | Untrusted retrieval filtering and security eval | Yes |
| 12 Cost-Aware Agent | Model routing, budget, latency, fallback eval | Yes |
| 13 Durable Workflow Agent | Checkpoint, resume, durable workflow eval | Yes |
| 14 Modern MCP Gateway | Tools, resources, prompts, auth, elicitation | Yes |
| 15 Memory Governance Agent | Memory redaction, merge, decay, deletion, audit | Yes |
| 16 Agent Permission System | Agent identity, scopes, access review, audit | Yes |
| 17 Advanced Eval Harness | Regression, safety, adversarial, golden trace release gate | Yes |
| Capstone Starter | Starter colony demo and regression eval | Yes |
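To give a feel for one row of the table, here is a minimal sketch of cost-aware routing with a fallback, in the spirit of example 12. It is not that example's code: the model labels, prices, and quality scores are made up, and `pick_model` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str             # hypothetical model label
    cost_per_call: float  # assumed flat cost per call, in USD
    quality: int          # assumed score from 1 (weak) to 3 (strong)


OPTIONS = [
    ModelOption("small", 0.001, 1),
    ModelOption("medium", 0.01, 2),
    ModelOption("large", 0.10, 3),
]


def pick_model(min_quality: int, budget_left: float) -> Optional[ModelOption]:
    affordable = [m for m in OPTIONS if m.cost_per_call <= budget_left]
    good_enough = [m for m in affordable if m.quality >= min_quality]
    if good_enough:
        # Cheapest model that still meets the quality requirement.
        return min(good_enough, key=lambda m: m.cost_per_call)
    if affordable:
        # Fallback: best model the remaining budget allows.
        return max(affordable, key=lambda m: m.quality)
    return None  # nothing fits the budget; the caller should stop or escalate


choice = pick_model(min_quality=3, budget_left=0.05)
print(choice.name if choice else "no model within budget")
```

Returning `None` instead of silently overspending keeps the budget decision visible to the calling workflow.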
Run every dependency-free example with:
python scripts/verify_examples.py
README widgets used
This README uses lightweight visual widgets commonly seen in popular GitHub projects:
- Local cover image for the top hero banner
- shields.io for stars, forks, language, status, and topic badges
- Mermaid for architecture diagrams
Plugin ecosystem
Agent Engineering is not only about prompts. A production agent needs a plugin ecosystem around it.
| Category | Purpose | Example Plugins / Tools |
|---|---|---|
| MCP Servers | Standardized access to tools and data | filesystem, database, browser, GitHub, Slack, Google Drive |
| Memory | Persistent context and retrieval | Qdrant, LanceDB, Chroma, PostgreSQL, Redis |
| Orchestration | Workflow and multi-agent control | LangGraph, CrewAI, AutoGen, OpenAI Agents SDK |
| RAG | Knowledge retrieval and grounding | LlamaIndex, LangChain, Haystack |
| Observability | Tracing, debugging, monitoring | Langfuse, OpenTelemetry, Helicone, Phoenix |
| Evaluation | Quality and safety testing | DeepEval, RAGAS, promptfoo, custom eval suites |
| Guardrails | Safety and structured validation | Guardrails AI, Pydantic, JSON Schema, policy checkers |
| UI / App Layer | User-facing agent applications | Streamlit, Gradio, Next.js, FastAPI |
| Domain Tools | Industry-specific integrations | healthcare records, finance data, CRM, ERP, ticketing systems |
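As an illustration of the Guardrails row, the sketch below validates a model's structured output using only the standard library. In practice a team might use Pydantic or JSON Schema, as listed in the table; the field names in `REQUIRED_FIELDS` and the `validate_output` helper are assumptions made for this example.

```python
import json

# Assumed output contract for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list, "confidence": float}


def validate_output(raw: str) -> tuple[bool, list[str]]:
    """Return (is_valid, errors) for a JSON string produced by a model."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be an object"]

    errors: list[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(f"{name} must be {expected.__name__}")

    confidence = data.get("confidence")
    if isinstance(confidence, float) and not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be between 0 and 1")
    return not errors, errors


print(validate_output('{"answer": "42", "sources": [], "confidence": 0.9}'))
print(validate_output('{"answer": "42"}'))
```

Collecting every error, rather than stopping at the first, gives a retry step enough detail to ask the model for a corrected response.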
Core architecture
graph TD
User[User] --> Supervisor[Supervisor Agent]
Supervisor --> Planner[Planner]
Planner --> MemoryAgent[Memory Agent]
Planner --> ResearchAgent[Research Agent]
Planner --> ToolAgent[Tool Agent]
Planner --> DomainAgent[Domain Agent]
MemoryAgent --> SharedMemory[Shared Memory]
ToolAgent --> MCP[MCP Servers]
DomainAgent --> MCP
ResearchAgent --> MCP
MCP --> PluginLayer[Plugin Ecosystem]
PluginLayer --> Databases[Databases]
PluginLayer --> Documents[Documents]
PluginLayer --> APIs[External APIs]
PluginLayer --> SaaS[SaaS Apps]
Supervisor --> Evaluator[Evaluator Agent]
Evaluator --> Final[Final Response]
Final --> User
Evaluator --> SharedMemory
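The control flow in the diagram can be traced with a small, stubbed sketch: a supervisor asks a planner for steps, dispatches them to workers, and lets an evaluator decide whether the result is returned and written to shared memory. Every function here is a hypothetical stand-in (the planner is a simple rule, the workers return canned strings, and MCP access is stubbed), so treat it as a reading aid rather than the repository's implementation.

```python
from typing import Callable


def planner(request: str) -> list[str]:
    # Assumed rule-based plan; a real planner would likely call a model.
    steps = ["memory"]
    if "price" in request.lower():
        steps.append("tool")
    steps.append("research")
    return steps


# Each worker receives the request and the shared memory, and returns text.
WORKERS: dict[str, Callable[[str, list], str]] = {
    "memory": lambda req, mem: f"recalled {len(mem)} past item(s)",
    "tool": lambda req, mem: "tool result via MCP (stubbed)",
    "research": lambda req, mem: f"notes on: {req}",
}


def evaluator(outputs: list[str]) -> bool:
    # Trivial check: every planned step produced a non-empty output.
    return len(outputs) > 0 and all(outputs)


def supervisor(request: str, shared_memory: list) -> str:
    outputs = [WORKERS[step](request, shared_memory) for step in planner(request)]
    if not evaluator(outputs):
        return "escalate: evaluation failed"
    # Only evaluated results are written back to shared memory.
    shared_memory.append({"request": request, "outputs": outputs})
    return " | ".join(outputs)


memory: list = []
print(supervisor("What is the price of plan A?", memory))
print(supervisor("Summarize the onboarding guide", memory))
```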
Repository structure
agent-engineering-roadmap/
├── README.md
├── README_zh.md
├── COURSE.md
├── assets/ # Visual diagrams and teaching images
├── roadmap/ # Level 0-8 learning path
├── curriculum/ # Full course chapters
├── examples/ # Hands-on examples
├── benchmarks/ # Lightweight behavior checks
├── security/ # Prompt injection and agent security labs
├── study-groups/ # Cohort and workshop facilitation kit
├── showcases/ # Shareable demos with sample outputs
├── labs/ # Guided exercises
├── lesson-plans/ # Instructor-ready lesson plans
├── patterns/ # Architecture pattern catalog
├── architecture/ # System design patterns
├── templates/ # Reusable agent and MCP templates
├── assessments/ # Quiz bank and rubrics
├── projects/ # Capstone and portfolio projects
├── glossary/ # Agent engineering terms
├── healthcare/ # Healthcare agent engineering track
├── finance/ # Finance and quantitative research track
├── resources/ # Curated learning resources
├── docs/ # GitHub Pages site
└── launch-kit/ # Launch copy, topics, and checklist
Real-world tracks
Healthcare Agent Engineering
Build agent systems for care management, nutrition tracking, personal health memory, and healthcare workflow automation.
Example colony:
Care Manager Agent
├── Nutrition Agent
├── Vital Sign Agent
├── Psychology Agent
├── Medication Agent
├── Memory Agent
└── Safety Evaluator Agent
Finance Agent Engineering
Build research agents, factor-analysis agents, portfolio agents, risk agents, and trading research workflows.
Example colony:
Research Agent
├── Market Data Agent
├── Factor Analysis Agent
├── Portfolio Agent
├── Risk Agent
└── Report Agent
Enterprise Agent Engineering
Build customer support agents, internal knowledge agents, document agents, workflow automation agents, and evaluation pipelines.
Design principles
- Agents should be useful before they are autonomous.
- Memory should be intentional, auditable, and safe.
- MCP should be treated as an integration layer, not just a plugin mechanism.
- Multi-agent systems should reduce complexity for users, not create complexity for developers.
- Production agents need evaluation, observability, cost control, and human approval gates.
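The memory principle above (intentional, auditable, safe) can be sketched as a store that only accepts allow-listed keys, redacts obvious email addresses, and logs every write, denial, and deletion. The `AuditedMemory` class, the allow-list parameter, and the simple email pattern are assumptions for illustration; real redaction would need a far more thorough approach.

```python
import re
from datetime import datetime, timezone

# Deliberately simple pattern for illustration; it will not catch every address.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class AuditedMemory:
    """Hypothetical memory store with an allow-list, redaction, and an audit log."""

    def __init__(self) -> None:
        self.items: dict[str, str] = {}
        self.audit_log: list[dict] = []

    def _log(self, action: str, key: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append({"action": action, "key": key, "at": stamp})

    def write(self, key: str, value: str, allowed_keys: set[str]) -> bool:
        # Intentional: only keys named in the policy may be stored.
        if key not in allowed_keys:
            self._log("write_denied", key)
            return False
        # Safe: redact email addresses before anything is persisted.
        self.items[key] = EMAIL.sub("[redacted-email]", value)
        self._log("write", key)
        return True

    def delete(self, key: str) -> None:
        # Auditable: deletions are logged even if the key was absent.
        self.items.pop(key, None)
        self._log("delete", key)


store = AuditedMemory()
policy = {"preferred_language"}
store.write("preferred_language", "English, reach me at a.b@example.com", policy)
store.write("home_address", "221B Baker Street", policy)
print(store.items)
print([entry["action"] for entry in store.audit_log])
```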
Project roadmap
- Initialize bilingual repository structure
- Add Level 0-8 roadmap skeleton
- Add architecture documents
- Add healthcare and finance tracks
- Add README badges and hero banner
- Expand each roadmap level into handbook chapters
- Add minimal runnable examples
- Add MCP server templates
- Add memory system examples
- Add agent colony demo
- Add evaluation and safety templates
- Add full course syllabus
- Add observable agent and prompt injection defense examples
- Add benchmark runner and study group kit
- Add cost, durable runtime, and modern MCP gateway modules
- Add memory governance, identity permission, and incident response modules
- Add advanced eval, product UX, and enterprise operating model modules
- Add guided labs
- Add instructor-ready lesson plans
- Add pattern catalog
- Add quiz bank, rubrics, glossary, and capstone
- Add full healthcare agent colony application
- Add full finance research agent application
Who this is for
- AI engineers
- LLM application developers
- Startup builders
- Researchers building agent systems
- Product teams moving from chatbot demos to real workflows
- Developers interested in MCP, memory, and multi-agent systems
License
This project is licensed under the MIT License.