🤖 Learn Agent Development from Scratch

A systematic, comprehensive, and practice-oriented AI Agent development guide

📖 English · 📖 中文版 · 🐛 Report Issues · 💬 Discussions

🗺️ Learning Roadmap

🍌 From Basic Concepts → Agent Architecture → Tool Calling → Memory Management → Multi-Agent → Reinforcement Learning → Production Deploy → Goal Achieved!

Follow the banana guide 🍌 step by step, and you'll master AI Agent development from zero to hero!

📖 Read Online (Recommended)

Language	Link
🇨🇳 简体中文	https://Haozhe-Xing.github.io/agent_learning/zh/
🇺🇸 English	https://Haozhe-Xing.github.io/agent_learning/en/

🚀 Auto-Tracking Frontier: Daily arXiv Paper Updates

🤖 This repository automatically searches arXiv for the latest AI Agent-related papers every day and updates the content accordingly — ensuring you always stay at the cutting edge of research!

📡 Daily Automated Search: A scheduled pipeline scans arXiv daily for new papers on Agent architectures, tool use, memory systems, multi-agent collaboration, reinforcement learning for agents, and more.
📝 Auto-Updated Content: Relevant findings are automatically integrated into the corresponding chapters, keeping the book's frontier sections fresh and up-to-date.
🔔 Never Miss a Breakthrough: No need to manually track dozens of research feeds — this repo does it for you, so you can focus on learning and building.

💡 This means the content you read here is not static — it evolves continuously with the latest advances in the AI Agent field.

🔭 Frontier Research Directions

This book not only covers foundational knowledge, but also tracks the cutting-edge research frontiers across each domain. Here are the key directions we follow:

💡 Entries marked with 🔥 are 2025–2026 hottest research topics — all covered in depth in this book!

✨ Key Features

🎯 Step by Step: From LLM fundamentals to multi-Agent systems, each chapter has a clear knowledge progression
💻 Code First: Every core concept comes with runnable Python code examples
🎨 Rich Illustrations: 120+ hand-drawn SVG architecture diagrams / flowcharts / sequence diagrams for intuitive understanding
🎬 Interactive Animations: 5 built-in interactive HTML animations (Perceive-Think-Act cycle, ReAct reasoning, Function Calling, RAG flow, GRPO sampling)
🔬 Paper Reviews: Key chapters include frontier paper deep-dives (ReAct, Reflexion, MemGPT, GRPO, etc.)
🏗️ Complete Projects: 3 comprehensive hands-on projects (AI Coding Assistant, Intelligent Data Analysis Agent, Multimodal Agent)
🛡️ Production Ready: Covers security, evaluation, deployment, and other production essentials
🧪 Cutting Edge: Covers Context Engineering, Agentic-RL (GRPO/DPO/PPO), MCP/A2A/ANP, and other 2025–2026 latest advances
📐 Formula Support: KaTeX-rendered math formulas for clear reading of policy gradient, KL divergence derivations in RL chapters
🔄 Continuously Updated: Tracking the latest changes in LangChain, LangGraph, MCP, and other frameworks

📸 Selected Content Preview

Below are selected showcases from the book's 120+ hand-drawn SVG illustrations, all original to this book.

🧠 Agent Core Architecture

Perceive-Think-Act Loop (Chapter 1)

_{Agent's core mechanism: Perceive environment → LLM reasoning → Execute action → Loop until goal achieved}

ReAct Reasoning Framework (Chapter 6)

_{Thought → Action → Observation alternating loop, enabling Agents to think while acting}

🛠️ Tool Calling & RAG

Function Calling Complete Flow (Chapter 4)

_{6-step complete flow from user input to tool invocation to final response, with message structure illustration}

RAG Retrieval-Augmented Generation (Chapter 7)

_{Offline indexing + Online retrieval dual-phase architecture, making LLM answers evidence-based}

💾 Memory System & Context Engineering

Three-Layer Memory Architecture (Chapter 5)

_{Working memory → Short-term memory → Long-term memory, with important info sinking down and semantic retrieval pulling up}

Prompt Engineering vs Context Engineering (Chapter 8)

_{From "how to say it" to "what the LLM sees" — the paradigm shift of the Agent era}

🤝 Multi-Agent & Communication Protocols

Three Multi-Agent Communication Patterns (Chapter 14)

_{Message Queue (async decoupling) / Shared Blackboard (data sharing) / Direct Call (real-time collaboration)}

MCP / A2A / ANP Protocol Comparison (Chapter 15)

_{Three-layer protocol stack: ANP for discovery → A2A for task collaboration → MCP for tool invocation}

🧪 Reinforcement Learning & Frameworks

GRPO Training Architecture (Chapter 10)

_{No Critic model needed, computes advantage via intra-group normalization, only 1.5× model size in VRAM}

LangGraph Three Core Concepts (Chapter 12)

_{State (shared state) · Node (processing unit) · Edge (execution flow control)}

📖 The above is just a selected preview — For the full 120+ architecture diagrams + 5 interactive animations, please read online

🎬 Interactive Animations

This book includes 5 interactive HTML animations to help you intuitively understand the dynamic processes of core concepts:

Animation	Chapter	Description
🔄Perceive-Think-Act Cycle	Chapter 1	Dynamic demonstration of Agent's core loop
💡ReAct Reasoning Process	Chapter 6	Shows the alternating Thought → Action → Observation process
🔧Function Calling	Chapter 4	Complete tool invocation flow animation
📚RAG Retrieval Flow	Chapter 7	From document chunking to vector retrieval to answer generation
🎯GRPO Sampling Process	Chapter 10	Visualization of intra-group multi-output sampling and reward normalization

💡 Interactive animations are only available in the online e-book. Local builds can also preview them.

🚀 Quick Start

Local Build

Install Dependencies:

# Install mdBook (choose one)
cargo install mdbook
# Or macOS: brew install mdbook

# Install mdbook-katex plugin (for math formula rendering)
cargo install mdbook-katex

# Clone the repository
git clone https://github.com/Haozhe-Xing/agent_learning.git
cd agent_learning

One-click Local Preview (Recommended):

# Build both Chinese and English versions and start unified server (default port 3000)
./serve.sh

# Specify custom port
./serve.sh 8080

# Enable file watching, auto-rebuild on source file changes (requires fswatch or inotifywait)
./serve.sh --watch
./serve.sh 8080 --watch

After starting, visit:

🌐 Language Selection Home: http://localhost:3000 (auto-redirects based on browser language)
🇨🇳 Chinese Version: http://localhost:3000/zh/
🇺🇸 English Version: http://localhost:3000/en/

💡 File watching dependency installation:
# macOS
brew install fswatch

# Ubuntu / Debian
sudo apt-get install inotify-tools

Environment Setup (For Code Practice)

# Python 3.11+
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install core dependencies
pip install langchain langchain-openai langgraph openai anthropic

# Configure API Key
export OPENAI_API_KEY="your-key-here"

🔥 Core Topics at a Glance

🧠 Agent Core Architecture

Perceive → Think → Act Loop
ReAct Reasoning Framework
Task Decomposition & Planning
Reflection & Self-Correction

🛠️ Tools & Skills

Function Calling Mechanism
Custom Tool Design
Skill System Construction
Tool Description Best Practices

🧪 Reinforcement Learning Training

SFT + LoRA Basic Training
PPO / DPO / GRPO Algorithm Deep-Dive
Complete Training Pipeline Hands-on
2025–2026 Latest Research Advances

💾 Memory, Knowledge & Context

Short-term / Long-term / Working Memory
Vector Databases (Chroma / FAISS)
RAG Retrieval-Augmented Generation
Context Engineering & Attention Budget

🤝 Multi-Agent Collaboration & Communication

MCP / A2A / ANP Protocol Stack
Supervisor vs Decentralized Patterns
CrewAI / AutoGen Frameworks
LangGraph Stateful Agents

🛡️ Production Full Pipeline

Evaluation Benchmarks (GAIA / SWE-bench)
Security Defense & Sandbox Isolation
Containerized Deployment & Streaming
Observability & Cost Optimization

📊 Technology Stack

🤝 Contributing

All forms of contribution are welcome!

🐛 Found a bug: Submit an Issue
💡 Content suggestions: Start a Discussion
📝 Improve content: Fork → Edit → Submit PR
⭐ Support the project: Give this repo a Star!

Contributing Guide

# Fork and clone
git clone https://github.com/YOUR_USERNAME/agent_learning.git  # Replace with your username

# Create a feature branch
git checkout -b feature/improve-chapter-4

# Local preview (unified Chinese & English service)
./serve.sh

# Commit changes
git commit -m "feat: improve Chapter 4 tool calling code examples"

# Push and create PR
git push origin feature/improve-chapter-4

Content Organization Conventions

Each chapter is placed in a separate directory src/zh/chapter_xxx/ (Chinese) or src/en/chapter_xxx/ (English)
Chapter overview goes in README.md, sections are numbered as 01_xxx.md, 02_xxx.md
Chinese SVG illustrations go in src/zh/svg/, English versions in src/en/svg/, naming format: chapter_xxx_description.svg
Chinese interactive animations go in src/zh/animations/, English versions in src/en/animations/

English Translation Contributions

The English version is being continuously translated. Translation contributions are welcome!

Steps to translate a chapter:

Find the corresponding .md file under src/en/ (content shows placeholder 🚧 Translation in progress)
Translate the Chinese version from src/zh/ and replace the placeholder content
If the chapter references SVG images, create corresponding English SVGs in src/en/svg/ (replace Chinese text with English)
If the chapter references interactive animations, create corresponding English HTML in src/en/animations/
Preview locally with ./serve.sh, visit http://localhost:3000/en/ to check the English version
Submit PR with title format: translate: Translate Chapter X - [Chapter Name]

Placeholder template format (English file content before translation):

# [Chapter Title]

> 🚧 **Translation in progress.**
> This chapter is not yet available in English.
> Please check back later, or switch to the [Chinese version](../../zh/...) for the full content.

📄 License

This project is open-sourced under the MIT License.

⭐ Star History

If this project helps you, please give it a Star ⭐ — it's the greatest encouragement for the author!

Built with ❤️, so that every developer can master AI Agent development

⬆ Back to Top