Quanyu Dai


2025

pdf bib
Improving Retrospective Language Agents via Joint Policy Gradient Optimization
Xueyang Feng | Bo Lan | Quanyu Dai | Lei Wang | Jiakai Tang | Xu Chen | Zhenhua Dong | Ji-Rong Wen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

In recent research advancements within the community, large language models (LLMs) have sparked great interest in creating autonomous agents. However, current prompt-based agents often heavily rely on large-scale LLMs. Meanwhile, although fine-tuning methods significantly enhance the capabilities of smaller LLMs, the fine-tuned agents often lack the potential for self-reflection and self-improvement. To address these challenges, we introduce a novel agent framework named RetroAct, which is a framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. Specifically, we develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning, and design an off-policy joint policy gradient optimization algorithm with imitation learning regularization to enhance the data efficiency and training stability in agent tasks. RetroAct significantly improves the performance of open-source models, reduces dependency on closed-source LLMs, and enables fine-tuned agents to learn and evolve continuously. We conduct extensive experiments across various testing environments, demonstrating RetroAct has substantial improvements in task performance and decision-making processes.

2022

pdf bib
Boosting Deep CTR Prediction with a Plug-and-Play Pre-trainer for News Recommendation
Qijiong Liu | Jieming Zhu | Quanyu Dai | Xiao-Ming Wu
Proceedings of the 29th International Conference on Computational Linguistics

Understanding news content is critical to improving the quality of news recommendation. To achieve this goal, recent studies have attempted to apply pre-trained language models (PLMs) such as BERT for semantic-enhanced news recommendation. Despite their great success in offline evaluation, it is still a challenge to apply such large PLMs in real-time ranking model due to the stringent requirement in inference and updating time. To bridge this gap, we propose a plug-and-play pre-trainer, namely PREC, to learn both user and news encoders through multi-task pre-training. Instead of directly leveraging sophisticated PLMs for end-to-end inference, we focus on how to use the derived user and item representations to boost the performance of conventional lightweight models for click-through-rate prediction. This enables efficient online inference as well as compatibility to conventional models, which would significantly ease the practical deployment. We validate the effectiveness of PREC through both offline evaluation on public datasets and online A/B testing in an industrial application.