ProActor: Timing-Aware Reinforcement Learning for Proactive Task Scheduling Agents

Lei Ding, Bin He, Chenguang Wang, Yang Liu


Abstract
Proactive task-oriented agents must autonomously anticipate user needs, identify actionable opportunities, and trigger software actions at appropriate moments, a fundamental shift from reactive systems that await explicit instructions. However, existing approaches lack generalizable end-to-end solutions for measuring and optimizing such anticipatory behaviors. This paper introduces ProActor, a unified framework for conversational task scheduling that integrates: (1) a domain-agnostic automated annotation methodology that enables scalable proactiveness reinforcement learning (RL) by generating full opportunity time windows instead of rigid point labels, (2) systematic proactiveness metrics capturing both timing quality and reference action alignment, and (3) RL optimization using GRPO with various reward designs. Our insight is that RULER-based rewards with proactiveness rubrics are crucial for improving timing quality, and that proactiveness optimization enabled by stage-aware composite rewards is key to balancing timing quality and reference action alignment. Furthermore, we introduce ART-F, an adaptive RL framework that combines request-adaptive inference clusters with asynchronous training for better GPU utilization, enabling LoRA training of 4-bit Qwen2.5-14B-ProActor-Q4 models on 4×H200 and 8×H100 GPUs with substantial speedups. Experiments on two newly auto-annotated datasets demonstrate significant improvements in proactive timing while maintaining action consistency comparable to state-of-the-art baselines. Ablations validate the effectiveness of distinct composite reward variations.
Anthology ID:
2026.acl-long.832
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
18257–18303
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.832/
Cite (ACL):
Lei Ding, Bin He, Chenguang Wang, and Yang Liu. 2026. ProActor: Timing-Aware Reinforcement Learning for Proactive Task Scheduling Agents. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18257–18303, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
ProActor: Timing-Aware Reinforcement Learning for Proactive Task Scheduling Agents (Ding et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.832.pdf
Checklist:
 2026.acl-long.832.checklist.pdf