Awesome RL for Agents

A curated list of resources on reinforcement learning (RL) for agents.

This list collects papers, tools, and demos showing how reinforcement learning can be applied to train or fine-tune LLM/MLLM agents, with a focus on research-driven, computer-using, and tool-integrated agent behaviors. It is not associated with any survey or review; it is simply a personal, living collection of resources on RL for agents. I'll keep updating it as long as I'm still working in this area.

📚 Papers & Research
🕹️ Benchmarks
🧪 Demos & Projects
🧰 Toolkits & Frameworks
📄 Tutorials & Blog Posts
🔗 Related Awesome Lists
🤝 Contributing

📚 Papers & Research

Survey & Review

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [Preprint'25] [AwesomeList]
A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges [Preprint'25] [AwesomeList]
Deep Research Agents: A Systematic Examination And Roadmap [Preprint'25] [AwesomeList]

RL for Computer-using Agents

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning [Preprint'25] [Code]
OpenCUA: Open Foundations for Computer-Use Agents [Preprint'25] [Code]
ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay [Preprint'25] [Code]
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners [Preprint'25] [Code]
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning [Preprint'25]
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning [Preprint'25] [Code]
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents [Preprint'25] [Code]
AutoWebGLM: A Large Language Model-based Web Navigating Agent [KDD'24] [Preprint'24] [Code]

RL for Research Agents

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents [Preprint'26] [Code]
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking [Preprint'26]
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction [Preprint'25]
Tree Search for LLM Agent Reinforcement Learning [Preprint'25]
Tongyi DeepResearch: A New Era of Open-Source AI Researchers [Blog] [Code]
SSRL: Self-Search Reinforcement Learning [Preprint'25] [Code]
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL [Preprint'25] [Code]
MiroMind Open Deep Research [Blog] [Code]
ARPO: Agentic Reinforced Policy Optimization [Preprint'25] [Code]
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training [Preprint'25] [Code]
WebShaper: Towards Autonomous Information Seeking Agency [Preprint'25] [Code]
WebSailor: Navigating Super-human Reasoning for Web Agent [Preprint'25] [Code]
Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities [Blog]
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning [Preprint'25] [Code]
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning [Preprint'25] [Code]
ZeroSearch: Incentivize the Search Capability of LLMs without Searching [Preprint'25] [Code]
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [EMNLP'25] [Code]
ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning [Preprint'25] [Code]
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [COLM'25] [Code]
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [Preprint'25] [Code]
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research [Preprint'25] [Code]

RL for Tool-using Problem Solvers

Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning [Preprint'25]
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [Preprint'25] [Code]
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection [Preprint'25] [Code]
OTC: Optimal Tool Calls via Reinforcement Learning [Preprint'25]
ToolRL: Reward is All Tool Learning Needs [Preprint'25] [Code]
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs [Preprint'25]
Agent models: Internalizing Chain-of-Action Generation into Reasoning models [Preprint'25] [Code]
ToRL: Scaling Tool-Integrated RL [Preprint'25] [Code]

Self-Play Agents with RL

Toward Training Superintelligent Software Agents through Self-Play SWE-RL [Preprint'25]
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning [Preprint'25] [Code]
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning [Preprint'25] [Code]
Search Self-play: Pushing the Frontier of Agent Capability without Supervision [ICLR'26] [Code]

RL for Agent Memory

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent [Preprint'25] [Code]
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [Preprint'25]

RL for Multi-Modal Agents (Thinking with Images / MMSearch)

Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception [Preprint'26] [Code]
DeepEyesV2: Toward Agentic Multimodal Model [ICLR'26] [Code]
MMSearch-R1: Incentivizing LMMs to Search [Preprint'25] [Code]
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning [ICLR'26] [Code]

RL with Agent Skills

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning [Preprint'26] [Code]

Reinforcement Learning Scaling

The Art of Scaling Reinforcement Learning Compute for LLMs [Preprint'25]
Group Sequence Policy Optimization [Preprint'25]
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [Preprint'25] [Model]
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce [Preprint'25]
o3 & o4-mini: Introducing OpenAI o3 and o4-mini [Blog]
Skywork-OR1 (Open Reasoner 1) [Blog] [Code]
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks [Preprint'25]
DAPO: An Open-Source LLM Reinforcement Learning System at Scale [Preprint'25] [Code]
LIMR: Less is More for RL Scaling [Preprint'25] [Code]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Preprint'25]
Kimi k1.5: Scaling Reinforcement Learning with LLMs [Preprint'25]

Others

Seed-1.8 [Code]
UFO: A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning [Preprint'25] [Code]
Self-Challenging Language Model Agents [Preprint'25]
MPO: Boosting LLM Agents with Meta Plan Optimization [Preprint'25] [Code]

🕹️ Benchmarks

CLI

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces [Preprint'26] [Website]

Deep research

OmniGAIA: Towards Native Omni-Modal AI Agents [Preprint'26] [Code]
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models [Preprint'26] [Code]
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios [Preprint'26] [Code]
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning [Preprint'26] [Code]
Marco Search Agent: Towards Real-World and Challenging Agentic Search (including HSCodeComp and DeepWideSearch) [Preprint'25(1)] [Preprint'25(2)] [Code]
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning [Preprint'25] [Code]
MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents [Preprint'25] [Code]
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents [ICLR'26] [Code]
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent [Preprint'25] [Huggingface]
xbench: Tracking Agents Productivity Scaling With Profession-Aligned Real-World Evaluations [Preprint'25] [Website]
BrowseComp-ZH: Benchmarking the Web Browsing Ability of Large Language Models in Chinese [Preprint'25] [Code]
BrowseComp: a benchmark for browsing agents [Blog] [Paper] [Code]

Computer Use

Computer Agent Arena: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks [Platform] [Code]
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use [Paper] [Code]
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments [NeurIPS'24] [Code]
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [ACL'24] [Code]

🧪 Demos & Projects

RL-based LLM agent tuning

Claw-R1: Empowering OpenClaw with Advanced Agentic RL [Page] [Code]
SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning [Blog] [Code]
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning [Code]
VAGEN: Training VLM Agents with Multi-Turn Reinforcement Learning [Code]
OpenManus-RL [Code] & OpenManus [Code]
RAGEN: Training Agents by Reinforcing Reasoning [Code]

RL-based LLM tuning

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model [Preprint'25] [Code]
simple_GRPO [Code]

MCP Agents

mcp-agent [Code]
Agent2Agent (A2A) protocol [Code]

🧰 Toolkits & Frameworks

rLLM: Reinforcement Learning for Language Agents [Code]
slime: An SGLang-Native Post-Training Framework for RL Scaling [Code]
ROLL: Reinforcement Learning Optimization for Large-Scale Learning [Code]
verl: Volcano Engine Reinforcement Learning for LLM [Code]

📄 Tutorials & Blog Posts

Forge: Scalable Agent RL Framework and Algorithm [Blog]
Cut the Bill, Keep the Turns: Affordable Multi-Turn Search RL [Blog]
Introducing ChatGPT agent: bridging research and action [Blog]
Context Engineering [GitHub]
The Second Half [Blog]

🔗 Related Awesome Lists

Awesome RL-based Agentic Search Papers [List] - covering Agentic RL papers in agentic search systems
Agent-Memory-Paper-List [List] - covering agent memory papers
Awesome-AgenticLLM-RL-Papers [List] - covering Agentic RL papers in both agentic capabilities and applications
Awesome-Search-Agent-Papers [List] - covering search agent papers
Awesome Deep Research Agent [List] - covering deep research agents and benchmark results
Awesome-Agent-RL [List] - covering RL for research agents
awesome-ml-agents [List] - covering RL and agents before 2023

🤝 Contributing

Contributions are warmly welcome!

If you know of a paper, tool, environment, or demo relevant to RL for agents, feel free to open a pull request.

Guidelines:

Make sure the resource is publicly accessible and active.
Use the same format as existing entries: - **Name**: Title [Paper](link) [Code](link) – short description (optional).
Add entries under the most appropriate section.
Avoid duplicates or resources that are already well-covered elsewhere.

We aim to keep this list high-quality, practical, and focused. Thank you for helping improve it! ✨
