awesome-rl-for-agents
License: CC0-1.0
A curated list of reinforcement learning (RL) for agents.
Awesome RL for Agents 
This list collects papers, tools, and demos that show how reinforcement learning can be applied to train or tune LLM/MLLM agents, with a focus on research-driven, computer-using, and tool-integrated agent behaviors. It is not associated with any survey or review; it is a personal, living collection of resources on RL for agents. I'll keep updating it as long as I'm working in this area.
Table of Contents
- 📚 Papers & Research
- 🕹️ Benchmarks
- 🧪 Demos & Projects
- 🧰 Toolkits & Frameworks
- 📄 Tutorials & Blog Posts
- 🔗 Related Awesome Lists
- 🤝 Contributing
📚 Papers & Research
Survey & Review
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [Preprint'25] [AwesomeList]
- A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges [Preprint'25] [AwesomeList]
- Deep Research Agents: A Systematic Examination And Roadmap [Preprint'25] [AwesomeList]
RL for Computer-using Agents
- UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning [Preprint'25] [Code]
- OpenCUA: Open Foundations for Computer-Use Agents [Preprint'25] [Code]
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay [Preprint'25] [Code]
- InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners [Preprint'25] [Code]
- Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning [Preprint'25]
- UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning [Preprint'25] [Code]
- Digi-Q: Learning Q-Value Functions for Training Device-Control Agents [Preprint'25] [Code]
- AutoWebGLM: A Large Language Model-based Web Navigating Agent [KDD'24] [Preprint'24] [Code]
RL for Research Agents
- REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents [Preprint'26] [Code]
- ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking [Preprint'26]
- IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction [Preprint'25]
- Tree Search for LLM Agent Reinforcement Learning [Preprint'25]
- Tongyi DeepResearch: A New Era of Open-Source AI Researchers [Blog] [Code]
- SSRL: Self-Search Reinforcement Learning [Preprint'25] [Code]
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL [Preprint'25] [Code]
- MiroMind Open Deep Research [Blog] [Code]
- ARPO: Agentic Reinforced Policy Optimization [Preprint'25] [Code]
- Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training [Preprint'25] [Code]
- WebShaper: Towards Autonomous Information Seeking Agency [Preprint'25] [Code]
- WebSailor: Navigating Super-human Reasoning for Web Agent [Preprint'25] [Code]
- Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities [Blog]
- R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning [Preprint'25] [Code]
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning [Preprint'25] [Code]
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching [Preprint'25] [Code]
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [EMNLP'25] [Code]
- ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning [Preprint'25] [Code]
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [COLM'25] [Code]
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [Preprint'25] [Code]
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research [Preprint'25] [Code]
RL for Tool-using Problem Solver
- Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning [Preprint'25]
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [Preprint'25] [Code]
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection [Preprint'25] [Code]
- OTC: Optimal Tool Calls via Reinforcement Learning [Preprint'25]
- ToolRL: Reward is All Tool Learning Needs [Preprint'25] [Code]
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs [Preprint'25]
- Agent models: Internalizing Chain-of-Action Generation into Reasoning models [Preprint'25] [Code]
- TORL: Scaling Tool-Integrated RL [Preprint'25] [Code]
Self-Playing Agent with RL
- Toward Training Superintelligent Software Agents through Self-Play SWE-RL [Preprint'25]
- Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning [Preprint'25] [Code]
- Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning [Preprint'25] [Code]
- Search Self-play: Pushing the Frontier of Agent Capability without Supervision [ICLR'26] [Code]
RL for Agent Memory
- MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent [Preprint'25] [Code]
- MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [Preprint'25]
RL for Multi-Modal Agent (Thinking with Images / MMSearch)
- Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception [Preprint'26] [Code]
- DeepEyesV2: Toward Agentic Multimodal Model [ICLR'26] [Code]
- MMSearch-R1: Incentivizing LMMs to Search [Preprint'25] [Code]
- DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning [ICLR'26] [Code]
RL with Agent Skills
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning [Preprint'26] [Code]
Reinforcement Learning Scaling
- The Art of Scaling Reinforcement Learning Compute for LLMs [Preprint'25]
- Group Sequence Policy Optimization [Preprint'25]
- Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [Preprint'25] [Model]
- A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce [Preprint'25]
- o3 & o4-mini: Introducing OpenAI o3 and o4-mini [Blog]
- Skywork-OR1 (Open Reasoner 1) [Blog] [Code]
- VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks [Preprint'25]
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale [Preprint'25] [Code]
- LIMR: Less is More for RL Scaling [Preprint'25] [Code]
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Preprint'25]
- Kimi k1.5: Scaling Reinforcement Learning with LLMs [Preprint'25]
Others
- Seed-1.8 [Code]
- UFO: A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning [Preprint'25] [Code]
- Self-Challenging Language Model Agents [Preprint'25]
- MPO: Boosting LLM Agents with Meta Plan Optimization [Preprint'25] [Code]
🕹 Benchmarks
CLI
- Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces [Preprint'26] [Website]
Deep Research
- OmniGAIA: Towards Native Omni-Modal AI Agents [Preprint'26] [Code]
- Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models [Preprint'26] [Code]
- AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios [Preprint'26] [Code]
- Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning [Preprint'26] [Code]
- Marco Search Agent: Towards Real-World and Challenging Agentic Search (including HSCodeComp and DeepWideSearch) [Preprint'25(1)] [Preprint'25(2)] [Code]
- FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning [Preprint'25] [Code]
- MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents [Preprint'25] [Code]
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents [ICLR'26] [Code]
- BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent [Preprint'25] [Huggingface]
- xbench: Tracking Agents Productivity Scaling With Profession-Aligned Real-World Evaluations [Preprint'25] [Website]
- BrowseComp-ZH: Benchmarking the Web Browsing Ability of Large Language Models in Chinese [Preprint'25] [Code]
- BrowseComp: a benchmark for browsing agents [Blog] [Paper] [Code]
Computer Use
- Computer Agent Arena: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks [Platform] [Code]
- ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use [Paper] [Code]
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments [NeurIPS'24] [Code]
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [ACL'24] [Code]
🧪 Demos & Projects
RL-based LLM agent tuning
- Claw-R1: Empowering OpenClaw with Advanced Agentic RL [Page] [Code]
- SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning [Blog] [Code]
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning [Code]
- VAGEN: Training VLM Agents with Multi-Turn Reinforcement Learning [Code]
- OpenManus-RL [Code] & OpenManus [Code]
- RAGEN: Training Agents by Reinforcing Reasoning [Code]
RL-based LLM tuning
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model [Preprint'25] [Code]
- simple_GRPO [Code]
MCP Agents
🧰 Toolkits & Frameworks
- rLLM: Reinforcement Learning for Language Agents [Code]
- slime: An SGLang-Native Post-Training Framework for RL Scaling [Code]
- ROLL: Reinforcement Learning Optimization for Large-Scale Learning [Code]
- verl: Volcano Engine Reinforcement Learning for LLM [Code]
📄 Tutorials & Blog Posts
- Forge: Scalable Agent RL Framework and Algorithm [Blog]
- Cut the Bill, Keep the Turns: Affordable Multi-Turn Search RL [Blog]
- Introducing ChatGPT agent: bridging research and action [Blog]
- Context Engineering [Github]
- The Second Half [Blog]
🔗 Related Awesome Lists
- Awesome RL-based Agentic Search Papers [List] - covering Agentic RL papers in agentic search systems
- Agent-Memory-Paper-List [List] - covering agent memory papers
- Awesome-AgenticLLM-RL-Papers [List] - covering Agentic RL papers in both agentic capabilities and applications
- Awesome-Search-Agent-Papers [List] - covering search agent papers
- Awesome Deep Research Agent [List] - covering deep research agents and benchmark results
- Awesome-Agent-RL [List] - covering RL for research agents
- awesome-ml-agents [List] - covering RL and agents before 2023
🤝 Contributing
Contributions are warmly welcome!
If you know of a paper, tool, environment, or demo relevant to RL for agents, feel free to open a pull request.
Guidelines:
- Make sure the resource is publicly accessible and active.
- Use the same format as existing entries:
  - **Name**: Title [Paper](link) [Code](link) – short description (optional).
- Add entries under the most appropriate section.
- Avoid duplicates or resources that are already well-covered elsewhere.
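Before opening a pull request, it can help to check that a new line matches the entry format quoted in the guidelines. The following is a minimal sketch and not part of this repository's tooling: the regular expression is one possible reading of that format, and the names `ENTRY_RE` and `parse_entry` and the `example.com` URLs are hypothetical.

```python
import re

# Hypothetical pattern for the format quoted in the guidelines:
#   - **Name**: Title [Label](link) [Label](link) – short description (optional)
ENTRY_RE = re.compile(
    r"^- \*\*(?P<name>[^*]+)\*\*: "            # bold name followed by a colon
    r"(?P<title>.+?)"                          # title text, matched lazily
    r"(?P<links>(?: \[[^\]]+\]\([^)\s]+\))+)"  # one or more [Label](url) links
    r"(?: [–-] (?P<desc>.+))?$"                # optional description after a dash
)


def parse_entry(line):
    """Return the parsed fields of an entry line, or None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    # Collect every [Label](url) pair from the matched links segment.
    links = re.findall(r"\[([^\]]+)\]\(([^)\s]+)\)", match.group("links"))
    return {
        "name": match.group("name"),
        "title": match.group("title"),
        "links": dict(links),
        "description": match.group("desc"),
    }


if __name__ == "__main__":
    sample = (
        "- **ExampleAgent**: An Example Title "
        "[Paper](https://example.com/paper) [Code](https://example.com/code) "
        "– optional note"
    )
    print(parse_entry(sample))
```

A line that does not match returns `None`, so the function can serve as a quick local check. It only validates the shape of the line; it does not verify that the links resolve or that the resource is active.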
We aim to keep this list high-quality, practical, and focused. Thank you for helping improve it! ✨