Yawei Wang
2026
Reinforcement Learning for Self-Improving Agent with Skill Library
Jiongxiao Wang | Qiaojing Yan | Yawei Wang | Yijun Tian | Soumya Smruti Mishra | Zhichao Xu | Megha Gandhi | Panpan Xu | Lin Lee Cheong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions but struggle to continuously improve and adapt when deployed in new environments. One promising approach is implementing skill libraries that allow agents to learn, validate, and apply new skills. However, current skill library approaches rely primarily on LLM prompting, making consistent skill library implementation challenging. To overcome these challenges, we propose a Reinforcement Learning (RL)-based approach to enhance agents’ self-improvement capabilities with a skill library. Specifically, we introduce Skill Augmented GRPO for self-Evolution (SAGE), a novel RL framework that systematically incorporates skills into learning. The framework’s key component, Sequential Rollout, iteratively deploys agents across a chain of similar tasks for each rollout. As agents navigate through the task chain, skills generated from previous tasks accumulate in the library and become available for subsequent tasks. Additionally, the framework enhances skill generation and utilization through a Skill-integrated Reward that complements the original outcome-based rewards. Experimental results on AppWorld demonstrate that SAGE, when applied to a supervised-finetuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency. Our code is available at https://github.com/amazon-science/SAGE.
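The Sequential Rollout idea described in this abstract (one rollout walks a chain of similar tasks, and skills produced on earlier tasks are available to later ones) can be illustrated with a minimal sketch. This is not the SAGE implementation: the names `sequential_rollout`, `toy_agent`, `SkillLibrary`, and the `family:id` task format are hypothetical, and the toy agent is a stand-in for an LLM policy.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only: a skill library maps a skill
# name to its text, and an agent maps (task, library) to
# (outcome reward, newly generated skills).
SkillLibrary = Dict[str, str]
Agent = Callable[[str, SkillLibrary], Tuple[float, SkillLibrary]]


def sequential_rollout(
    agent: Agent, task_chain: List[str]
) -> Tuple[List[float], SkillLibrary]:
    """Run the agent over a chain of tasks, accumulating skills as it goes."""
    library: SkillLibrary = {}
    rewards: List[float] = []
    for task in task_chain:
        # The agent receives a copy, so it only sees skills from earlier tasks.
        reward, new_skills = agent(task, dict(library))
        rewards.append(reward)
        # Skills generated on this task become available to later tasks.
        library.update(new_skills)
    return rewards, library


def toy_agent(task: str, library: SkillLibrary) -> Tuple[float, SkillLibrary]:
    """Toy stand-in (assumption): succeeds only if a skill for the task's
    family is already in the library, and always emits a skill for it."""
    family = task.split(":")[0]
    reward = 1.0 if family in library else 0.0
    return reward, {family: f"skill learned from {task}"}


rewards, library = sequential_rollout(toy_agent, ["pay:1", "pay:2", "mail:1"])
```

In this toy chain the second `pay` task benefits from the skill produced by the first, which is the accumulation effect the abstract describes; how SAGE scores skill generation and use through its Skill-integrated Reward is not modeled here.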
SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph
Jiazheng Li | Yawei Wang | Qiaojing Yan | Yijun Tian | Zhichao Xu | Huan Song | Panpan Xu | Lin Lee Cheong
Findings of the Association for Computational Linguistics: EACL 2026
Large Language Models (LLMs) have demonstrated remarkable capabilities, enabling language agents to excel at single-turn tasks. However, their application to complex, multi-step, and long-horizon tasks remains challenging. While reinforcement learning (RL) offers a promising avenue for addressing these challenges, mainstream approaches typically rely solely on sparse, outcome-based rewards — a limitation that becomes especially problematic for group-based RL algorithms lacking critic models, such as Group Relative Policy Optimization (GRPO). In such methods, uniformly rewarding or penalizing all actions within a trajectory can lead to training instability and suboptimal policies, because beneficial and detrimental actions are often entangled across multi-step interactions. To address this challenge, we propose SALT, a novel and lightweight framework that provides a finer-grained advantage assignment, derived solely from outcome rewards. We achieve this by constructing a graph from trajectories of the same prompt, which allows us to quantify the quality of each step and assign advantages accordingly. Crucially, SALT is designed as a plug-and-play module that seamlessly integrates with existing group-based RL algorithms — requiring no modifications to the rollout procedure and introducing negligible computational overhead. Extensive experiments on the WebShop, ALFWorld, and AppWorld benchmarks with various model sizes demonstrate that SALT consistently improves performance. We also conduct a thorough analysis to validate the design choices behind SALT and offer actionable insights.
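The limitation this abstract points to can be made concrete with a small sketch of the standard group-relative advantage used by GRPO-style methods: each trajectory's outcome reward is normalized against the other trajectories sampled for the same prompt, and that single value is then shared by every step in the trajectory. The sketch shows only this uniform baseline, not SALT's trajectory-graph refinement; the function names are hypothetical, and the use of the population standard deviation is an assumption that varies between implementations.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each trajectory's outcome reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some codebases differ
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_steps(
    advantages: List[float], steps_per_trajectory: List[int]
) -> List[List[float]]:
    """Uniform assignment: every step inherits its trajectory's advantage."""
    return [[a] * n for a, n in zip(advantages, steps_per_trajectory)]


# Four trajectories sampled for one prompt, with binary outcome rewards.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group_rewards)
step_advantages = broadcast_to_steps(advantages, [2, 3, 1, 2])
```

Every step of a failed trajectory receives the same negative value here, even if some of its actions were useful, which is the entanglement the abstract describes and that a step-level scheme aims to separate.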
2025
A Systematic Survey of Automatic Prompt Optimization Techniques
Kiran Ramnath | Kang Zhou | Sheng Guan | Soumya Smruti Mishra | Xuan Qi | Zhengyuan Shen | Shuai Wang | Sangmin Woo | Sullam Jeoung | Yawei Wang | Haozhu Wang | Han Ding | Yuzhe Lu | Zhichao Xu | Yun Zhou | Balasubramaniam Srinivasan | Qiaojing Yan | Yueyan Chen | Haibo Ding | Panpan Xu | Lin Lee Cheong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Since the advent of large language models (LLMs), prompt engineering has been a crucial step for eliciting desired responses across Natural Language Processing (NLP) tasks. However, prompt engineering remains an impediment for end users due to rapid advances in models, tasks, and associated best practices. To mitigate this, Automatic Prompt Optimization (APO) techniques have recently emerged that automate the improvement of prompts to raise LLM performance on a range of tasks. In this paper, we present a comprehensive survey summarizing the current progress and remaining challenges in this field. We provide a formal definition of APO and a 5-part unifying framework, and then rigorously categorize all relevant works based on their salient features within that framework. We hope to spur further research guided by our framework.