agent-engineering-roadmap

mcp
Security Audit
Warning
Health Warning
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 7 GitHub stars
Code Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Passed
  • Permissions — No dangerous permissions requested


SUMMARY

Bilingual hands-on roadmap for production-aware AI agents: MCP, memory, RAG, workflows, evaluation, safety, and agent colonies.

README.md

Agent Engineering Roadmap

Agent Engineering Roadmap cover


A hands-on roadmap for building production-ready AI Agents, MCP Servers, Memory Systems, Multi-Agent Workflows, and Agent Colonies.

繁體中文 · Website · Course · Roadmap · Examples · Showcases · Benchmarks · Labs · Teaching · Templates · Architecture · Healthcare · Finance


Agent Engineering Course Map


Quickstart

| Step | Do This | Why |
|------|---------|-----|
| 1 | Run python scripts/verify_examples.py | Confirm every dependency-free example works locally |
| 2 | Study Course | Follow the roadmap in the intended learning order |
| 3 | Build Capstone Starter | Turn the lessons into a runnable agent colony project |

flowchart LR
    User((User)) --> Agent[AI Agent]
    Agent --> Tools[Tool Use]
    Tools --> MCP[MCP Layer]
    MCP --> Memory[Memory System]
    Memory --> Workflow[Agent Workflow]
    Workflow --> MultiAgent[Multi-Agent Team]
    MultiAgent --> Colony[Agent Colony]
    Colony --> Production[Production AI App]

Why this roadmap exists

Most AI tutorials stop at prompts, RAG, or simple tool calling.

Real agentic products require more than that:

  • agents that can use tools safely
  • MCP servers that connect agents to real systems
  • memory layers that persist useful context
  • workflows that are observable and controllable
  • multi-agent teams that can specialize and collaborate
  • evaluation, security, and production guardrails
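As one illustration of the first bullet, safe tool use usually starts with validating a requested call before running it. The sketch below is a minimal, assumed design: the `TOOLS` allow-list, the `add` tool, and the `call_tool` helper are hypothetical names invented for this example and are not taken from the repository.

```python
from typing import Callable


def add(a: float, b: float) -> float:
    """Toy tool used only for illustration."""
    return a + b


# Hypothetical allow-list: tool name -> (function, required argument names).
TOOLS: dict[str, tuple[Callable[..., float], set[str]]] = {
    "add": (add, {"a", "b"}),
}


def call_tool(name: str, args: dict) -> dict:
    """Validate a requested tool call, then execute it only if every check passes."""
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    func, required = TOOLS[name]
    if set(args) != required:
        return {"ok": False, "error": f"expected arguments {sorted(required)}"}
    # Reject non-numeric values (bool is excluded even though it subclasses int).
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in args.values()):
        return {"ok": False, "error": "arguments must be numbers"}
    return {"ok": True, "result": func(**args)}


print(call_tool("add", {"a": 2, "b": 3}))
print(call_tool("rm", {}))
```

Returning a structured error instead of raising keeps the failure visible to the agent loop, which can then retry or escalate.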

This repository is a practical learning path for builders who want to move from chatbot demos to real agent engineering.


Teaching approach

This roadmap teaches agents like an engineering course, not a tool catalog.

Each major topic follows the same pattern:

  1. Start with the problem: what breaks if you only use a chatbot?
  2. Build the intuition: what is the simplest mental model?
  3. Open the box: what components are actually involved?
  4. Run a minimal example: what can you inspect locally?
  5. Add production judgment: what needs evaluation, observability, approval, or safety gates?

In one sentence: an agent is not magic. It is context, tools, memory, workflow, evaluation, and human judgment arranged around a useful task.
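The one-sentence definition above can be made concrete with a toy class. This is a sketch under stated assumptions, not the repository's implementation: `MiniAgent`, its `run` method, and the `approve` callback are hypothetical, and each of the six ingredients is reduced to a single line.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MiniAgent:
    """Hypothetical toy agent; every name here is illustrative."""
    tools: dict[str, Callable[[str], str]]
    memory: list[str] = field(default_factory=list)

    def run(self, task: str, tool_name: str, approve: Callable[[str], bool]) -> str:
        # Context: the task plus whatever memory has accumulated so far.
        context = " | ".join(self.memory + [task])
        # Tools and workflow: a single fixed step that calls one named tool.
        if tool_name not in self.tools:
            return "rejected: unknown tool"
        draft = self.tools[tool_name](context)
        # Evaluation: a trivial check that the tool produced something.
        if not draft.strip():
            return "rejected: empty result"
        # Human judgment: a callback stands in for a reviewer.
        if not approve(draft):
            return "rejected: not approved"
        # Memory: persist only approved results.
        self.memory.append(draft)
        return draft


agent = MiniAgent(tools={"shout": lambda text: text.upper()})
result = agent.run("summarize ticket 1", "shout", approve=lambda draft: True)
print(result)
```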


What you will learn

| Level | Topic | Outcome |
|-------|-------|---------|
| 0 | AI & LLM Fundamentals | Understand LLM apps, embeddings, RAG, and structured output |
| 1 | Single Agent | Build a task-focused agent with a clear role and output format |
| 2 | Tool Use | Connect agents to external tools and APIs |
| 3 | MCP | Build and use MCP clients, servers, tools, resources, and prompts |
| 4 | Agent Memory | Design short-term, episodic, semantic, user, and shared memory |
| 5 | Agent Workflow | Build reliable planning, execution, review, retry, and approval flows |
| 6 | Multi-Agent Systems | Coordinate specialized agents using supervisor, debate, and reflection patterns |
| 7 | Agent Colony | Build shared-memory colonies with domain agents and evaluation loops |
| 8 | Production & Safety | Deploy agents with observability, evaluation, security, and cost control |

Course materials

| Section | Purpose |
|---------|---------|
| Course | Complete syllabus and graduation criteria |
| Curriculum | Concept chapters from foundations to production |
| Visual Assets | SVG diagrams for teaching and slides |
| Roadmap | Level-by-level learning milestones |
| Examples | Runnable minimal implementations |
| Benchmarks | Lightweight checks for tool use, RAG, workflow, security, and observability |
| Showcases | Dependency-free demos for healthcare, finance, and enterprise workflows |
| Domain Casebooks | Healthcare, finance, and enterprise case studies with eval cases |
| Labs | Guided exercises for each stage |
| Teaching Layer | Teaching audit, misconceptions, deliverables, and module blueprint |
| Lab Solution Guides | Solution shapes and grading direction for hands-on labs |
| Lesson Plans | Instructor-ready teaching plans for each module |
| Study Group Kit | 4-week, 8-week, and workshop formats for cohorts |
| Patterns | Reusable agent architecture patterns |
| Templates | Agent specs, memory policies, evals, and safety gates |
| Papers | Research papers, reading roadmap, and engineering notes |
| Open Source Projects | Curated ecosystem map for frameworks, MCP, RAG, evals, observability, and ops |
| Framework Selection Matrix | Choose agent frameworks by engineering tradeoff |
| Open Source Reading Guide | Learn how to study real agent repositories |
| DeepEval And RAGAS | Practical guide to LLM and RAG evaluation frameworks |
| Release Checklist | v1 release verification and project hygiene |
| Assessments | Quiz bank and rubrics |
| Capstone | Final project for building a production-aware colony |
| Portfolio Projects | Project ideas with deliverables, evals, and open-source references |
| Capstone Starter | Runnable starter scaffold for the final project |
| Glossary | Core terms and definitions |

The learning path

AI Fundamentals
      ↓
Single Agent
      ↓
Tool Use
      ↓
MCP Integration
      ↓
Agent Memory
      ↓
Agent Workflow
      ↓
Multi-Agent Systems
      ↓
Agent Colony
      ↓
Production, Evaluation & Safety

Try it in 60 seconds

Run a showcase without API keys:

python showcases/enterprise-support-agent/main.py
python showcases/finance-research-agent/main.py
python showcases/healthcare-agent-colony/main.py

Then run the evaluation harness, the mini RAG example, the benchmark runner, and the full verification script:

python examples/07-evaluation-harness/main.py
python examples/08-mini-rag/main.py
python benchmarks/benchmark_runner.py
python scripts/verify_examples.py
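For readers who have not seen a regression eval suite before, the idea can be sketched in a few lines. This is an assumed, simplified shape and does not reproduce what `examples/07-evaluation-harness/main.py` does: `EvalCase`, `toy_agent`, and `run_suite` are hypothetical names, and the only check is a required substring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    must_contain: str  # substring the answer is required to include


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent; purely illustrative."""
    if "refund" in prompt.lower():
        return "Escalating refund request to a human approver."
    return "Answering from the knowledge base."


def run_suite(agent: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case and report pass/fail counts plus the names of failing cases."""
    failures = [c.name for c in cases if c.must_contain not in agent(c.prompt)]
    return {"total": len(cases), "passed": len(cases) - len(failures), "failures": failures}


CASES = [
    EvalCase("refund-escalates", "I want a refund", "human approver"),
    EvalCase("faq-grounded", "How do I reset my password?", "knowledge base"),
]
report = run_suite(toy_agent, CASES)
print(report)
```

Running the same fixed cases after every change is what makes the suite a regression check: a case that used to pass and now fails shows up by name.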

Production readiness artifacts

| Artifact | Use |
|----------|-----|
| Agent Registry Template | Register owner, scopes, tools, data, evals, and operations |
| Risk Assessment Template | Classify agent risk before launch |
| Deployment Review Template | Check release gates and operational readiness |
| Release Checklist | Prepare a public course release |
| v1.0 Readiness | Track stable release readiness |
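A registry entry of the kind described in the first row could be modeled as a small record with a completeness check. The sketch below is an assumption about shape only: the `AgentRegistryEntry` class and `missing_fields` helper are hypothetical, and the field names simply mirror the words in the table, not the repository's actual template.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentRegistryEntry:
    """Hypothetical record; field names mirror the table row above."""
    owner: str
    scopes: list[str]
    tools: list[str]
    data: list[str]
    evals: list[str]
    operations: str


def missing_fields(entry: AgentRegistryEntry) -> list[str]:
    """Return the names of fields left empty, so a review can block launch."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


entry = AgentRegistryEntry(
    owner="support-team",
    scopes=["tickets:read"],
    tools=["search_tickets"],
    data=["ticket history"],
    evals=[],  # deliberately empty to show the check firing
    operations="on-call: support-team",
)
print(missing_fields(entry))
```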

Showcase demos

| Demo | Shows |
|------|-------|
| Enterprise Support Agent | Ticket routing, risk classification, approval gates |
| Finance Research Agent | Research support, assumptions, risk boundaries |
| Healthcare Agent Colony | Safety boundaries, escalation, medical-advice avoidance |
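The three ideas listed for the Enterprise Support Agent (routing, risk classification, approval gates) fit together roughly as follows. This is a keyword-based sketch under stated assumptions, not the showcase's code: `HIGH_RISK_TERMS`, `classify_risk`, and `route` are hypothetical, and a real system would likely use a model or policy engine instead of substring matching.

```python
HIGH_RISK_TERMS = ("refund", "delete account", "legal")  # illustrative keywords only


def classify_risk(ticket: str) -> str:
    """Label a ticket 'high' if it mentions any high-risk term, else 'low'."""
    text = ticket.lower()
    return "high" if any(term in text for term in HIGH_RISK_TERMS) else "low"


def route(ticket: str) -> dict:
    """Pick a queue and decide whether a human must approve the agent's action."""
    text = ticket.lower()
    risk = classify_risk(ticket)
    queue = "billing" if "refund" in text or "invoice" in text else "general"
    return {"queue": queue, "risk": risk, "needs_approval": risk == "high"}


print(route("Please refund my last invoice"))
print(route("How do I change my avatar?"))
```

Keeping the approval decision as an explicit field means downstream steps cannot act on a high-risk ticket without checking it.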

Runnable examples

| Example | Shows | No API key |
|---------|-------|------------|
| 01 Single Agent | Role, task boundary, structured output | Yes |
| 02 Tool-Using Agent | Local tool call and validation | Yes |
| 03 MCP-style Agent | Client/server tool boundary | Yes |
| 04 Memory Agent | Memory write/retrieve policy | Yes |
| 05 Multi-Agent Workflow | Planner, researcher, writer, reviewer | Yes |
| 06 Agent Colony | Supervisor, domain agent, evaluator | Yes |
| 07 Evaluation Harness | Regression eval suite | Yes |
| 08 Mini RAG | Retrieval, grounded answer, RAG eval | Yes |
| 09 Graph Approval Agent | Graph transitions, approval gate, production eval | Yes |
| 10 Observable Agent | Trace events, guardrail logs, replayable debugging | Yes |
| 11 Prompt Injection Defense | Untrusted retrieval filtering and security eval | Yes |
| 12 Cost-Aware Agent | Model routing, budget, latency, fallback eval | Yes |
| 13 Durable Workflow Agent | Checkpoint, resume, durable workflow eval | Yes |
| 14 Modern MCP Gateway | Tools, resources, prompts, auth, elicitation | Yes |
| 15 Memory Governance Agent | Memory redaction, merge, decay, deletion, audit | Yes |
| 16 Agent Permission System | Agent identity, scopes, access review, audit | Yes |
| 17 Advanced Eval Harness | Regression, safety, adversarial, golden trace release gate | Yes |
| Capstone Starter | Starter colony demo and regression eval | Yes |
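The checkpoint-and-resume idea listed for example 13 can be sketched with a JSON file as the durable store. This is an assumed minimal design, not the example's actual code: `STEPS`, `run_workflow`, and the `fail_at` parameter are hypothetical and exist only to simulate a crash.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

STEPS = ["plan", "research", "write", "review"]  # illustrative step names


def run_workflow(checkpoint: Path, fail_at: Optional[str] = None) -> list[str]:
    """Run STEPS in order, saving progress after each; resume from the checkpoint if present."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for step in STEPS[len(done):]:
        if step == fail_at:
            raise RuntimeError(f"simulated crash at {step}")
        done.append(step)
        checkpoint.write_text(json.dumps(done))  # persist after every completed step
    return done


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "checkpoint.json"
    try:
        run_workflow(path, fail_at="write")  # first run crashes midway
    except RuntimeError:
        pass
    saved = json.loads(path.read_text())  # progress that survived the crash
    resumed = run_workflow(path)          # second run skips completed steps

print(saved, resumed)
```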

Run every dependency-free example with:

python scripts/verify_examples.py

README widgets used

This README uses lightweight visual widgets commonly seen in popular GitHub projects:

  • Local cover image for the top hero banner
  • shields.io for stars, forks, language, status, and topic badges
  • Mermaid for architecture diagrams

Plugin ecosystem

Agent Engineering is not only about prompts. A production agent needs a plugin ecosystem around it.

| Category | Purpose | Example Plugins / Tools |
|----------|---------|-------------------------|
| MCP Servers | Standardized access to tools and data | filesystem, database, browser, GitHub, Slack, Google Drive |
| Memory | Persistent context and retrieval | Qdrant, LanceDB, Chroma, PostgreSQL, Redis |
| Orchestration | Workflow and multi-agent control | LangGraph, CrewAI, AutoGen, OpenAI Agents SDK |
| RAG | Knowledge retrieval and grounding | LlamaIndex, LangChain, Haystack |
| Observability | Tracing, debugging, monitoring | Langfuse, OpenTelemetry, Helicone, Phoenix |
| Evaluation | Quality and safety testing | DeepEval, RAGAS, promptfoo, custom eval suites |
| Guardrails | Safety and structured validation | Guardrails AI, Pydantic, JSON Schema, policy checkers |
| UI / App Layer | User-facing agent applications | Streamlit, Gradio, Next.js, FastAPI |
| Domain Tools | Industry-specific integrations | healthcare records, finance data, CRM, ERP, ticketing systems |

Core architecture

graph TD
    User[User] --> Supervisor[Supervisor Agent]
    Supervisor --> Planner[Planner]
    Planner --> MemoryAgent[Memory Agent]
    Planner --> ResearchAgent[Research Agent]
    Planner --> ToolAgent[Tool Agent]
    Planner --> DomainAgent[Domain Agent]
    MemoryAgent --> SharedMemory[Shared Memory]
    ToolAgent --> MCP[MCP Servers]
    DomainAgent --> MCP
    ResearchAgent --> MCP
    MCP --> PluginLayer[Plugin Ecosystem]
    PluginLayer --> Databases[Databases]
    PluginLayer --> Documents[Documents]
    PluginLayer --> APIs[External APIs]
    PluginLayer --> SaaS[SaaS Apps]
    Supervisor --> Evaluator[Evaluator Agent]
    Evaluator --> Final[Final Response]
    Final --> User
    Evaluator --> SharedMemory
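The flow in the diagram above can be approximated in plain Python to make the control path easier to follow. This is a deliberately small sketch under stated assumptions: the `AGENTS` table, `planner`, `evaluator`, and `supervisor` functions are hypothetical, specialist agents are plain functions, and the MCP and plugin layers are left out.

```python
from typing import Callable

# Hypothetical specialist agents keyed by role; each is a plain function here.
AGENTS: dict[str, Callable[[str], str]] = {
    "research": lambda task: f"notes on {task}",
    "tool": lambda task: f"tool output for {task}",
}


def planner(task: str) -> list[str]:
    """Pick which specialists to involve; a real planner would be model-driven."""
    return ["research", "tool"] if "lookup" in task else ["research"]


def evaluator(outputs: list[str]) -> bool:
    """Accept only when every specialist returned non-empty text."""
    return bool(outputs) and all(o.strip() for o in outputs)


def supervisor(task: str, shared_memory: list[str]) -> str:
    """Plan, dispatch to specialists, evaluate, then record approved results."""
    outputs = [AGENTS[role](task) for role in planner(task)]
    if not evaluator(outputs):
        return "needs human review"
    shared_memory.extend(outputs)  # evaluator-approved results feed shared memory
    return "; ".join(outputs)


memory: list[str] = []
answer = supervisor("lookup order 42", memory)
print(answer)
```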

Repository structure

agent-engineering-roadmap/
├── README.md
├── README_zh.md
├── COURSE.md
├── assets/           # Visual diagrams and teaching images
├── roadmap/          # Level 0-8 learning path
├── curriculum/       # Full course chapters
├── examples/         # Hands-on examples
├── benchmarks/       # Lightweight behavior checks
├── security/         # Prompt injection and agent security labs
├── study-groups/     # Cohort and workshop facilitation kit
├── showcases/        # Shareable demos with sample outputs
├── labs/             # Guided exercises
├── lesson-plans/     # Instructor-ready lesson plans
├── patterns/         # Architecture pattern catalog
├── architecture/     # System design patterns
├── templates/        # Reusable agent and MCP templates
├── assessments/      # Quiz bank and rubrics
├── projects/         # Capstone and portfolio projects
├── glossary/         # Agent engineering terms
├── healthcare/       # Healthcare agent engineering track
├── finance/          # Finance and quantitative research track
├── resources/        # Curated learning resources
├── docs/             # GitHub Pages site
└── launch-kit/       # Launch copy, topics, and checklist

Real-world tracks

Healthcare Agent Engineering

Build agent systems for care management, nutrition tracking, personal health memory, and healthcare workflow automation.

Example colony:

Care Manager Agent
├── Nutrition Agent
├── Vital Sign Agent
├── Psychology Agent
├── Medication Agent
├── Memory Agent
└── Safety Evaluator Agent

Finance Agent Engineering

Build research agents, factor-analysis agents, portfolio agents, risk agents, and trading research workflows.

Example colony:

Research Agent
├── Market Data Agent
├── Factor Analysis Agent
├── Portfolio Agent
├── Risk Agent
└── Report Agent

Enterprise Agent Engineering

Build customer support agents, internal knowledge agents, document agents, workflow automation agents, and evaluation pipelines.


Design principles

  1. Agents should be useful before they are autonomous.
  2. Memory should be intentional, auditable, and safe.
  3. MCP should be treated as an integration layer, not just a plugin mechanism.
  4. Multi-agent systems should reduce complexity for users, not create complexity for developers.
  5. Production agents need evaluation, observability, cost control, and human approval gates.
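Principle 2 is easier to reason about with a concrete shape in mind. The sketch below shows one possible reading of "intentional, auditable, and safe": writes are redacted, and every operation is logged. `AuditedMemory` and its methods are hypothetical names, and the email regex is a rough illustration rather than a complete PII filter.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Rough email pattern for illustration; not a complete PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class AuditedMemory:
    """Hypothetical store: redacts emails on write and logs every operation."""

    def __init__(self) -> None:
        self.items: dict[str, str] = {}
        self.audit: list[tuple[str, str, str]] = []  # (timestamp, action, key)

    def _log(self, action: str, key: str) -> None:
        self.audit.append((datetime.now(timezone.utc).isoformat(), action, key))

    def write(self, key: str, text: str) -> None:
        self.items[key] = EMAIL.sub("[redacted-email]", text)
        self._log("write", key)

    def read(self, key: str) -> Optional[str]:
        self._log("read", key)
        return self.items.get(key)

    def delete(self, key: str) -> bool:
        existed = self.items.pop(key, None) is not None
        self._log("delete", key)
        return existed


mem = AuditedMemory()
mem.write("u1", "Contact alice@example.com about renewal")
stored = mem.read("u1")
removed = mem.delete("u1")
print(stored, removed, [action for _, action, _ in mem.audit])
```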

Project roadmap

  • Initialize bilingual repository structure
  • Add Level 0-8 roadmap skeleton
  • Add architecture documents
  • Add healthcare and finance tracks
  • Add README badges and hero banner
  • Expand each roadmap level into handbook chapters
  • Add minimal runnable examples
  • Add MCP server templates
  • Add memory system examples
  • Add agent colony demo
  • Add evaluation and safety templates
  • Add full course syllabus
  • Add observable agent and prompt injection defense examples
  • Add benchmark runner and study group kit
  • Add cost, durable runtime, and modern MCP gateway modules
  • Add memory governance, identity permission, and incident response modules
  • Add advanced eval, product UX, and enterprise operating model modules
  • Add guided labs
  • Add instructor-ready lesson plans
  • Add pattern catalog
  • Add quiz bank, rubrics, glossary, and capstone
  • Add full healthcare agent colony application
  • Add full finance research agent application

Who this is for

  • AI engineers
  • LLM application developers
  • Startup builders
  • Researchers building agent systems
  • Product teams moving from chatbot demos to real workflows
  • Developers interested in MCP, memory, and multi-agent systems

License

This project is licensed under the MIT License.
