Awesome Proactive Agents

A curated research map for proactive agents: AI systems that infer latent user needs, decide when to intervene, ask for missing context or consent, and initiate useful assistance before the user issues a complete, explicit command.

If this list is useful, a ⭐ helps others find it.

Scope
Must Read
Papers
Benchmarks
Research Map
Benchmark Matrix
Tag Vocabulary
Contributing

Scope

This list prioritizes papers where proactivity is a central research target. The list is broader than computer-use agents: it includes proactive dialogue, planning, recommendation, wearable assistance, GUI/mobile/OS agents, programming assistants, personalization, memory, benchmarks, optimization, and human factors.

Typical inclusion signals:

The agent predicts latent intent or missing context before the user gives a complete instruction.
The agent decides when to ask, suggest, remind, intervene, execute, or stay silent.
The paper evaluates proactive behavior, intervention timing, user control, consent, interruption cost, or personalization.
The benchmark or dataset makes proactivity the primary task rather than a side effect of general tool use.

Resource labels:

Paper: arXiv, ACL Anthology, DOI, OpenReview, ACM, Springer, or official proceedings page.
Website: project page, conference page, lab page, or documentation.
Code / Dataset: GitHub, released code, released benchmark, or released dataset.
Notes: short English decision card covering why the paper matters, its proactivity signal, evaluation setup, limitations, and use cases.

Must Read

Selected starting points for understanding the field.

Date	Paper	Why read it first
2024-04	Towards Human-centered Proactive Conversational Agents	Establishes the human-centered dimensions of proactive agents: intelligence, adaptivity, and civility.
2024-10	Proactive Agent	Canonical formulation of the shift from reactive LLM agents to active assistance over event streams; introduces ProactiveBench.
2024-10	Need Help?	Strong user-study reference for proactive IDE assistance and intervention timing.
2025-05	ContextAgent	Extends proactive agents to open-world sensory contexts and tool calling.
2026-02	ProAgentBench	Real workflow logs reveal why synthetic proactive data can overestimate performance.
2026-04	KnowU-Bench	Benchmark that comes closest to evaluating proactive, personalized, consent-aware mobile assistants.
2026-05	π-Bench	Long-horizon benchmark that targets hidden-intent resolution in personal assistant workflows.

Papers

Foundations, Surveys and Human Factors

Date	Title	Venue / Source	Tags
2024-04	Towards Human-centered Proactive Conversational Agents	SIGIR 2024	`Definition` · `Human Factors` · `Dialogue`
2024-10	Redefining Proactivity for Information Seeking Dialogue	SICON 2024	`Definition` · `Dialogue` · `Intent Inference`
2025-01	When AI-Based Agents Are Proactive: Implications for Competence and System Satisfaction in Human-AI Collaboration	BISE 2026	`Human Factors` · `Intervention Timing` · `Trust`
2025-02	Assistance or Disruption? Exploring and Evaluating the Design and Trade-offs of Proactive AI Programming Support	CHI 2025	`Human Factors` · `Intervention Timing` · `IDE`
2025-03	Proactive Conversational AI: A Comprehensive Survey of Advancements and Opportunities	ACM TOIS 2025	`Survey` · `Definition` · `Dialogue`
2026-01	Developer Interaction Patterns with Proactive AI: A Five-Day Field Study	arXiv 2601	`Human Factors` · `Real-world Data` · `IDE`
2026-02	From Fragmentation to Integration: Exploring the Design Space of AI Agents for Human-as-the-Unit Privacy Management	arXiv 2602	`Safety & Consent` · `Privacy` · `Human Factors`
2026-02	Exploring The Impact of Proactive Generative AI Agent Roles in Time-Sensitive Collaborative Problem-Solving Tasks	arXiv 2602	`Human Factors` · `Collaboration` · `Intervention Timing`

Proactive Interaction and Planning

Date	Title	Venue / Source	Tags
2024-03	ProMISe: A Proactive Multi-turn Dialogue Dataset for Information-seeking Intent Resolution	Findings of EACL 2024	`Clarification` · `Dialogue` · `Benchmark`
2024-06	Ask-before-Plan: Proactive Language Agents for Real-World Planning	arXiv 2406	`Clarification` · `Planning` · `Intent Inference`
2024-10	Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance	ICLR 2025	`Intent Inference` · `Benchmark` · `Desktop`
2025-01	Proactive Conversational Agents with Inner Thoughts	CHI 2025	`Dialogue` · `Intent Inference` · `Intervention Timing`
2025-01	ProTOD: Proactive Task-oriented Dialogue System Based on LLMs	COLING 2025	`Dialogue` · `Planning` · `Tool Use`
2025-07	Tunable LLM-based Proactive Recommendation Agent	ACL 2025	`Recommendation` · `Personalization` · `Intent Inference`
2025-09	PRINCIPLES: Synthetic Strategy Memory for Proactive Dialogue Agents	Findings of EMNLP 2025	`Dialogue` · `Memory` · `Simulation`
2025-10	ProMediate: A Socio-cognitive Framework for Evaluating Proactive Agents in Multi-party Negotiation	arXiv 2510	`Dialogue` · `Collaboration` · `Benchmark`
2026-01	Proactivity-driven Personalized Agents for Advancing Human Learning through Engagement, Reflection, and Self-Efficacy	arXiv 2601	`Personalization` · `Intent Inference` · `Education`
2026-01	Long-term Task-oriented Agent: Proactive Long-term Intent Maintenance in Dynamic Environments	arXiv 2601	`Long-horizon` · `Intent Inference` · `Benchmark`
2026-05	Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents	arXiv 2605	`Intent Inference` · `Memory` · `Benchmark`

GUI, Mobile, OS and Coding Agents

Date	Title	Venue / Source	Tags
2024-10	Need Help? Designing Proactive AI Assistants for Programming	CHI 2025	`IDE` · `Intervention Timing` · `Human Factors`
2025-03	CodingGenie: A Proactive LLM-Powered Programming Assistant	arXiv 2503	`IDE` · `Intent Inference` · `Tool Use`
2025-07	FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents	ICLR 2026	`Mobile` · `Personalization` · `Benchmark`
2025-08	AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance	CIKM 2025	`GUI` · `Intent Inference` · `Tool Use`
2025-09	VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents	arXiv 2509	`OS` · `Safety & Consent` · `Clarification`
2026-02	ProAgentBench: Evaluating LLM Agents for Proactive Assistance with Real-World Data	arXiv 2602	`Real-world Data` · `Intervention Timing` · `Benchmark`
2026-02	ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices	arXiv 2602	`Mobile` · `Intent Inference` · `Benchmark`
2026-03	PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents	arXiv 2603	`GUI` · `Intent Inference` · `Benchmark`
2026-04	Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement	arXiv 2604	`Real-world Data` · `Intervention Timing` · `Skill Learning`
2026-04	Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants	arXiv 2604	`Simulation` · `Intervention Timing` · `Benchmark`
2026-04	KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation	arXiv 2604	`Mobile` · `Personalization` · `Safety & Consent`
2026-04	From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench	Interspeech 2026	`Multimodal / Wearable` · `Intervention Timing` · `Benchmark`
2026-05	An Empirical Study of Proactive Coding Assistants in Real-World Software Development	arXiv 2605	`IDE` · `Real-world Data` · `Benchmark`
2026-05	ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents	arXiv 2605	`GUI` · `Tool Use` · `Optimization`
2026-06	Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents	arXiv 2606	`Mobile` · `Intervention Timing` · `Tool Use`

Multimodal, Wearable and Embodied Agents

Date	Title	Venue / Source	Tags
2024-09	AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environments	arXiv 2409	`Embodied` · `Collaboration` · `Planning`
2025-01	YETI: Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks	arXiv 2501	`Multimodal / Wearable` · `Intervention Timing` · `Human Factors`
2025-01	AiGet: Transforming Everyday Moments into Hidden Knowledge Discovery with AI Assistance on Smart Glasses	CHI 2025	`Multimodal / Wearable` · `Intent Inference` · `Personalization`
2025-02	Mirai: A Wearable Proactive AI Inner-Voice for Contextual Nudging	CHI EA 2025	`Multimodal / Wearable` · `Intervention Timing` · `Human Factors`
2025-05	ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions	NeurIPS 2025	`Multimodal / Wearable` · `Personalization` · `Tool Use`
2025-06	Proactive Assistant Dialogue Generation from Streaming Egocentric Videos	EMNLP 2025	`Multimodal / Wearable` · `Dialogue` · `Intervention Timing`
2025-12	ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems	arXiv 2512	`Multimodal / Wearable` · `Sensing` · `Intervention Timing`
2026-03	ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models	ICLR 2026	`Multimodal / Wearable` · `Intervention Timing` · `Benchmark`
2026-05	IPIBench: Evaluating Interactive Proactive Intelligence of MLLMs under Continuous Streams	arXiv 2605	`Multimodal / Wearable` · `Intervention Timing` · `Benchmark`
2026-05	MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory	arXiv 2605	`Memory` · `Multimodal / Wearable` · `Benchmark`

Benchmarks, Personalization and Optimization

Date	Title	Venue / Source	Tags
2025-08	ProactiveEval: A Unified Evaluation Framework for Proactive Dialogue Agents	arXiv 2508	`Benchmark` · `Dialogue` · `Intent Inference`
2025-09	ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation	ICLR 2026	`Personalization` · `Simulation` · `Benchmark`
2025-10	Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents	arXiv 2510	`Benchmark` · `Intent Inference` · `Tool Use`
2025-11	Training Proactive and Personalized LLM Agents	arXiv 2511	`Personalization` · `Optimization` · `Simulation`
2026-02	Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization	arXiv 2602	`Optimization` · `Human Factors` · `Safety & Consent`
2026-03	ProEvent: An Event-centric Benchmark for Proactive Agents	OpenReview / ACL ARR 2026	`Benchmark` · `Long-horizon` · `Intervention Timing`
2026-04	SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization	arXiv 2604	`Optimization` · `Skill Learning` · `Memory`
2026-05	CogniFold: Always-On Proactive Memory via Cognitive Folding	arXiv 2605	`Memory` · `Intent Inference` · `Benchmark`
2026-05	MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation	arXiv 2605	`Optimization` · `Skill Learning` · `Memory`
2026-05	π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows	arXiv 2605	`Long-horizon` · `Intent Inference` · `Benchmark`
2026-05	VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions	arXiv 2605	`Long-horizon` · `Personalization` · `Memory`
2026-06	Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues	arXiv 2606	`Dialogue` · `Personalization` · `Benchmark`
2026-06	Communication Policy Evolution for Proactive LLM Agents	arXiv 2606	`Dialogue` · `Intervention Timing` · `Optimization`

Benchmarks

For a detailed comparison, see BENCHMARKS.md.

Date	Benchmark	Paper	Environment	What it tests
2024-03	ProMISe	ProMISe	information-seeking dialogue	proactive clarification for intent resolution
2024-10	RealHumanEval	Need Help?	programming tasks	proactive IDE assistance with human users
2024-10	ProactiveBench	Proactive Agent	desktop activity events	proactive task prediction and acceptance
2025-05	ContextAgentBench	ContextAgent	wearable sensory contexts	proactive service prediction and tool calling
2025-07	FingerTip 20K	FingerTip 20K	Android trajectories	proactive task suggestion and personalized execution
2025-08	ProactiveEval	ProactiveEval	proactive dialogue	target planning and dialogue guidance across six domains
2025-10	PROBE	Beyond Reactivity	web problem-solving tasks	bottleneck discovery and autonomous resolution
2025-11	UserVille	Training Proactive and Personalized LLM Agents	SWE and deep-research user simulation	productivity, proactivity and personalization under vague prompts
2026-01	ChronosBench	Long-term Task-oriented Agent	dynamic task environments	proactive long-term intent maintenance
2026-02	ProAgentBench	ProAgentBench	real workflow logs	when-to-assist and how-to-assist
2026-02	ProactiveMobile	ProactiveMobile	mobile device context	latent intent to executable API sequence
2026-03	ProEvent	ProEvent	future event tracking	proactive event maintenance and reminders
2026-03	PIRA-Bench	PIRA-Bench	continuous GUI screenshots	proactive GUI intent recommendation
2026-03	ProactiveBench (MLLM)	ProactiveBench / Trento	visual difficulty scenarios	MLLM proactive help-seeking from visual context
2026-04	ProVoice-Bench	ProVoice-Bench	voice interaction streams	proactive voice intervention timing, over-triggering, monitoring
2026-04	Pare-Bench	Pare	multi-app FSM environment	active user simulation, intervention timing, multi-app execution
2026-04	KnowU-Bench	KnowU-Bench	Android emulator	personalization, proactive tasks, consent and rejection handling
2026-05	IPIBench	IPIBench	streaming video, multi-turn	interactive proactive monitoring, task management, reactive-proactive coordination
2026-05	CogEval-Bench	CogniFold	streaming event memory	proactive concept emergence and cognitive-structure formation
2026-05	MemEye	MemEye	multimodal long-term memory	visual evidence granularity and temporal state reasoning
2026-05	ProCodeBench	Proactive Coding Assistants	real IDE traces	proactive coding intent prediction and sim-to-real evaluation
2026-05	π-Bench	π-Bench	persistent personal workspaces	proactive hidden-intent resolution and checklist completion in long-horizon workflows
2026-05	ProActEval	Anticipate and Learn	proactive assistant scenarios	idle-time anticipation, evidence acquisition, user effort and hallucination reduction
2026-05	VitaBench 2.0	VitaBench 2.0	long-term user interaction sequences	preference extraction, memory use, updates, and proactive missing-information acquisition
2026-06	Ψ-Bench	Ψ-Bench	persuasive dialogue	persona-sensitive influencing with simulated clients and user profiles

Tag Vocabulary

Tags are intentionally compact and reusable. They describe the paper's main contribution, not every detail.

Tag	Meaning
`Definition`	Defines or reframes proactive agents, proactive dialogue, or design-space boundaries.
`Survey`	Synthesizes a broad proactive-agent subfield or taxonomy.
`Human Factors`	Studies interruption, control, satisfaction, workload, adoption, or developer experience.
`Trust`	Focuses on competence perception, calibrated reliance, or trustworthy interaction.
`Safety & Consent`	Covers confirmation, autonomy boundaries, reversibility, rejection, or risk control.
`Privacy`	Centers privacy management, data minimization, or personal-context governance.
`Intervention Timing`	Focuses on when an agent should act, ask, suggest, or remain silent.
`Intent Inference`	Infers latent goals, hidden constraints, future tasks, or missing information.
`Clarification`	Proactively asks questions before planning, execution, or recommendation.
`Dialogue`	Proactive behavior in conversational, persuasive, or task-oriented interaction.
`Planning`	Proactive decomposition, task planning, scheduling, or future-state reasoning.
`Tool Use`	Tool calling, API execution, GUI operation, or action orchestration.
`Recommendation`	Proactive recommendation or suggestion ranking.
`Collaboration`	Multi-party or human-agent collaborative problem solving.
`Education`	Learning, tutoring, reflection, or student engagement contexts.
`Long-horizon`	Multi-session, dynamic, future-event, or long-running task maintenance.
`Personalization`	User preferences, personas, profiles, long-term user history, or user-specific adaptation.
`Memory`	Persistent memory, episodic memory, visual memory, skill memory, or cognitive memory structures.
`Simulation`	User simulation, environment simulation, synthetic users, or synthetic workflows.
`Optimization`	RL, reward modeling, multi-objective optimization, self-evolution, or behavior tuning.
`Skill Learning`	Skill creation, skill internalization, skill memory, or reusable procedure learning.
`Benchmark`	Introduces a dataset, evaluation suite, benchmark, simulator, or diagnostic protocol.
`Real-world Data`	Uses real user traces, field-study data, or deployment-like logs.
`Desktop`	Desktop activity streams, workstation context, or event logs.
`GUI`	Graphical interface agents, browser/app screens, or visual UI interaction.
`Mobile`	Mobile GUI, Android/iOS workflows, phone sensors, or mobile user context.
`OS`	Operating-system agents, cross-app workflows, or OS-level verification.
`IDE`	Programming assistants, code editors, or developer tooling.
`Multimodal / Wearable`	Video, audio, AR, smart glasses, egocentric streams, or open-world sensory context.
`Sensing`	Active context acquisition, sensor selection, or on-demand sensory capture.
`Embodied`	Robots, physical environments, or human-populated embodied settings.
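
A small lint step can help keep new rows aligned with this vocabulary. The sketch below is illustrative and not part of the repository: the helper name `unknown_tags` is hypothetical, the `VOCAB` set is copied by hand from the table above, and it assumes tags are written as backtick-quoted names separated by ` · `, as in the paper tables.

```python
import re

# Tag names copied from the Tag Vocabulary table (31 entries).
VOCAB = {
    "Definition", "Survey", "Human Factors", "Trust", "Safety & Consent",
    "Privacy", "Intervention Timing", "Intent Inference", "Clarification",
    "Dialogue", "Planning", "Tool Use", "Recommendation", "Collaboration",
    "Education", "Long-horizon", "Personalization", "Memory", "Simulation",
    "Optimization", "Skill Learning", "Benchmark", "Real-world Data",
    "Desktop", "GUI", "Mobile", "OS", "IDE", "Multimodal / Wearable",
    "Sensing", "Embodied",
}


def unknown_tags(tag_cell: str) -> list[str]:
    """Return the tags in one table cell that are not in VOCAB.

    Hypothetical helper: extracts every backtick-quoted name from the
    cell and keeps the ones missing from the vocabulary, in order.
    """
    tags = re.findall(r"`([^`]+)`", tag_cell)
    return [tag for tag in tags if tag not in VOCAB]


if __name__ == "__main__":
    # A row using only known tags yields an empty list.
    print(unknown_tags("`IDE` · `Intervention Timing` · `Human Factors`"))
    # "Robotics" is not in the vocabulary, so it is reported.
    print(unknown_tags("`Mobile` · `Robotics`"))
```

Running a check like this over the Tags column before opening a pull request would surface misspelled or ad hoc tags early.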

Contributing

Pull requests are welcome.

Before adding a paper, check that it satisfies at least one of:

It predicts latent user intent before the user gives a complete explicit instruction.
It decides when to intervene, ask, suggest, execute, remind, or stay silent.
It evaluates proactive assistance, interruption cost, user control, consent, or personalization.
It contributes a benchmark or dataset where proactivity is the primary task.

Suggested note template:

# Paper Title

## Why It Matters

...

## Proactivity Signal

...

## Evaluation Setup

...

## Key Limitations

...

## Use For

...
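
For contributors who prefer to script the boilerplate, the template above could be generated and checked with a few lines of Python. This is a minimal sketch under assumptions: the function names `note_skeleton` and `missing_sections` are hypothetical and not part of the repository; only the section headings are taken from the template.

```python
# Section headings taken from the suggested note template.
SECTIONS = [
    "Why It Matters",
    "Proactivity Signal",
    "Evaluation Setup",
    "Key Limitations",
    "Use For",
]


def note_skeleton(title: str) -> str:
    """Build an empty note for a paper (hypothetical helper)."""
    lines = [f"# {title}", ""]
    for section in SECTIONS:
        # Each section gets its heading and a "..." placeholder body.
        lines += [f"## {section}", "", "...", ""]
    return "\n".join(lines)


def missing_sections(note_text: str) -> list[str]:
    """List template sections whose heading does not appear in a note."""
    present = {line[3:].strip() for line in note_text.splitlines()
               if line.startswith("## ")}
    return [section for section in SECTIONS if section not in present]


if __name__ == "__main__":
    note = note_skeleton("Paper Title")
    print(note)
    print(missing_sections(note))
```

`missing_sections` only checks that each heading is present; it does not judge whether the placeholder text has been replaced.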

Maintained by Low Entropy AI.
