awesome-rl-for-agents

agent
Security Audit
Warning
Health: Passed
  • License — License: CC0-1.0
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 97 GitHub stars
Code: Warning
  • Code scan incomplete — No supported source files were scanned during light audit
Permissions: Passed
  • Permissions — No dangerous permissions requested

SUMMARY

A curated list of reinforcement learning (RL) for agents.

README.md

Awesome RL for Agents

This list collects papers, tools, and demos showing how reinforcement learning can be used to train or tune LLM/MLLM agents, with a focus on research, computer-using, and tool-integrated agents. It is not tied to any survey or review; it is a personal, living collection of resources on RL for agents, and I’ll keep updating it for as long as I work in this area.


📚 Papers & Research

Survey & Review

RL for Computer-using Agents

  • UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning [Preprint'25] [Code]
  • OpenCUA: Open Foundations for Computer-Use Agents [Preprint'25] [Code]
  • ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay [Preprint'25] [Code]
  • InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners [Preprint'25] [Code]
  • Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning [Preprint'25]
  • UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning [Preprint'25] [Code]
  • Digi-Q: Learning Q-Value Functions for Training Device-Control Agents [Preprint'25] [Code]
  • AutoWebGLM: A Large Language Model-based Web Navigating Agent [KDD'24] [Preprint'24] [Code]

RL for Research Agents

  • REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents [Preprint'26] [Code]
  • ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking [Preprint'26]
  • IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction [Preprint'25]
  • Tree Search for LLM Agent Reinforcement Learning [Preprint'25]
  • Tongyi DeepResearch: A New Era of Open-Source AI Researchers [Blog] [Code]
  • SSRL: Self-Search Reinforcement Learning [Preprint'25] [Code]
  • Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL [Preprint'25] [Code]
  • MiroMind Open Deep Research [Blog] [Code]
  • ARPO: Agentic Reinforced Policy Optimization [Preprint'25] [Code]
  • Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training [Preprint'25] [Code]
  • WebShaper: Towards Autonomous Information Seeking Agency [Preprint'25] [Code]
  • WebSailor: Navigating Super-human Reasoning for Web Agent [Preprint'25] [Code]
  • Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities [Blog]
  • R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning [Preprint'25] [Code]
  • R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning [Preprint'25] [Code]
  • ZeroSearch: Incentivize the Search Capability of LLMs without Searching [Preprint'25] [Code]
  • DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [EMNLP'25] [Code]
  • ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning [Preprint'25] [Code]
  • Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [COLM'25] [Code]
  • R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [Preprint'25] [Code]
  • Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research [Preprint'25] [Code]

RL for Tool-using Problem Solver

Self-Playing Agent with RL

  • Toward Training Superintelligent Software Agents through Self-Play SWE-RL [Preprint'25]
  • Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning [Preprint'25] [Code]
  • Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning [Preprint'25] [Code]
  • Search Self-play: Pushing the Frontier of Agent Capability without Supervision [ICLR'26] [Code]

RL for Agent Memory

  • MemAgent: Reshaping Long-Context LLM with Multi-Conv RL based Memory Agent [Preprint'25] [Code]
  • MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [Preprint'25]

RL for Multi-Modal Agent (Thinking with Images / MMSearch)

RL with Agent Skills

  • SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning [Preprint'26] [Code]

Reinforcement Learning Scaling

  • The Art of Scaling Reinforcement Learning Compute for LLMs [Preprint'25]
  • Group Sequence Policy Optimization [Preprint'25]
  • Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [Preprint'25] [Model]
  • A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce [Preprint'25]
  • o3 & o4-mini: Introducing OpenAI o3 and o4-mini [Blog]
  • Skywork-OR1 (Open Reasoner 1) [Blog] [Code]
  • VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks [Preprint'25]
  • DAPO: An Open-Source LLM Reinforcement Learning System at Scale [Preprint'25] [Code]
  • LIMR: Less is More for RL Scaling [Preprint'25] [Code]
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Preprint'25]
  • Kimi k1.5: Scaling Reinforcement Learning with LLMs [Preprint'25]

Others

🕹 Benchmarks

CLI

  • Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces [Preprint'26] [Website]

Deep research

  • OmniGAIA: Towards Native Omni-Modal AI Agents [Preprint'26] [Code]
  • Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models [Preprint'26] [Code]
  • AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios [Preprint'26] [Code]
  • Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning [Preprint'26] [Code]
  • Marco Search Agent: Towards Real‑World and Challenging Agentic Search (including HSCodeComp and DeepWideSearch) [Preprint'25(1)] [Preprint'25(2)] [Code]
  • FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning [Preprint'25] [Code]
  • MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents [Preprint'25] [Code]
  • MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents [ICLR'26] [Code]
  • BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent [Preprint'25] [Huggingface]
  • xbench: Tracking Agents Productivity Scaling With Profession-Aligned Real-World Evaluations [Preprint'25] [Website]
  • BrowseComp-ZH: Benchmarking the Web Browsing Ability of Large Language Models in Chinese [Preprint'25] [Code]
  • BrowseComp: a benchmark for browsing agents [Blog] [Paper] [Code]

Computer Use

  • Computer Agent Arena: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks [Platform] [Code]
  • ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use [Paper] [Code]
  • OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments [NeurIPS'24] [Code]
  • SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [ACL'24] [Code]

🧪 Demos & Projects

RL-based LLM agent tuning

  • Claw-R1: Empowering OpenClaw with Advanced Agentic RL [Page] [Code]
  • SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning [Blog] [Code]
  • Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning [Code]
  • VAGEN: Training VLM Agents with Multi-Turn Reinforcement Learning [Code]
  • OpenManus-RL [Code] & OpenManus [Code]
  • RAGEN: Training Agents by Reinforcing Reasoning [Code]

RL-based LLM tuning

  • Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model [Preprint'25] [Code]
  • simple_GRPO [Code]

MCP Agents

🧰 Toolkits & Frameworks

  • rLLM: Reinforcement Learning for Language Agents [Code]
  • slime: An SGLang-Native Post-Training Framework for RL Scaling [Code]
  • ROLL: Reinforcement Learning Optimization for Large-Scale Learning [Code]
  • verl: Volcano Engine Reinforcement Learning for LLM [Code]

📄 Tutorials & Blog Posts

  • Forge: Scalable Agent RL Framework and Algorithm [Blog]
  • Cut the Bill, Keep the Turns: Affordable Multi-Turn Search RL [Blog]
  • Introducing ChatGPT agent: bridging research and action [Blog]
  • Context Engineering [Github]
  • The Second Half [Blog]

🔗 Related Awesome Lists

  • Awesome RL-based Agentic Search Papers [List] - covering Agentic RL papers in agentic search systems
  • Agent-Memory-Paper-List [List] - covering agent memory papers
  • Awesome-AgenticLLM-RL-Papers [List] - covering Agentic RL papers in both agentic capabilities and applications
  • Awesome-Search-Agent-Papers [List] - covering search agent papers
  • Awesome Deep Research Agent [List] - covering deep research agents and benchmark results
  • Awesome-Agent-RL [List] - covering RL for research agents
  • awesome-ml-agents [List] - covering RL and agents before 2023

🤝 Contributing

Contributions are warmly welcome!

If you know of a paper, tool, environment, or demo relevant to RL for agents, feel free to open a pull request.

Guidelines:

  • Make sure the resource is publicly accessible and active.
  • Use the same format as existing entries: - **Name**: Title [Paper](link) [Code](link) – short description (optional).
  • Add entries under the most appropriate section.
  • Avoid duplicates or resources that are already well-covered elsewhere.
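
Before opening a pull request, you can sanity-check a new entry against the format above. The following is only a minimal sketch, not part of this repository: the regular expression, the `parse_entry` helper, and the `example.com` URLs are all hypothetical, and the pattern assumes links are written as `[Label](url)` with an optional description after a dash.

```python
import re
from typing import Optional

# Hypothetical pattern for the entry format described in the guidelines:
#   - **Name**: Title [Paper](link) [Code](link) – short description (optional)
ENTRY_RE = re.compile(
    r"^- \*\*(?P<name>[^*]+)\*\*: "              # bold resource name
    r"(?P<title>[^\[\]]+?)"                       # title text (no brackets)
    r"(?P<links>(?: \[[^\]]+\]\([^)\s]+\))+)"     # one or more [Label](url) links
    r"(?: [–-] (?P<desc>.+))?$"                   # optional short description
)


def parse_entry(line: str) -> Optional[dict]:
    """Return the parsed fields of a list entry, or None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    # Collect the link labels, e.g. ["Paper", "Code"].
    labels = re.findall(r"\[([^\]]+)\]\(", match.group("links"))
    return {
        "name": match.group("name"),
        "title": match.group("title"),
        "labels": labels,
        "desc": match.group("desc"),
    }


if __name__ == "__main__":
    sample = (
        "- **Search-R1**: Training LLMs to Reason and Leverage Search Engines "
        "with Reinforcement Learning [Paper](https://example.com/paper) "
        "[Code](https://example.com/code)"
    )
    print(parse_entry(sample))
```

An entry with no links, or without the leading `- **Name**:` part, returns `None`, which is a quick signal that the line needs reformatting.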

We aim to keep this list high-quality, practical, and focused. Thank you for helping improve it! ✨
