Yang Yang
2026
Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models
Xudong Wang | Chaoning Zhang | Chenghao Li | Shuxu Chen | Qigan Sun | Jiaquan Zhang | Fachrina Dewi Puspitasari | Tae-Ho Kim | Jiwei Wei | Malu Zhang | Guoqing Wang | Yang Yang | Heng Tao Shen
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, while recent prompting strategies such as Chain-of-Thought (CoT) have further elevated their performance in handling complex logical problems. Despite these advances, high-quality reasoning remains heavily reliant on manual static prompts and is sensitive to decoding configurations and task distributions, leading to performance fluctuations and limited transferability. Existing automatic prompt optimization methods typically adopt single-agent local search, failing to simultaneously optimize prompts and decoding hyperparameters within a unified framework to achieve stable global improvements. To address this limitation, we propose Agent-GWO, a dynamic prompt optimization framework for complex reasoning. Specifically, we unify prompt templates and decoding hyperparameters as inheritable agent configurations. By leveraging the leader-follower mechanism of the Grey Wolf Optimizer (GWO), we automatically select three leader agents (𝛼, 𝛽, and 𝛿) to guide the collaborative updates of the remaining agents, enabling iterative convergence toward robust optimal reasoning configurations that can be seamlessly integrated for inference. Extensive experiments on multiple mathematical and hybrid reasoning benchmarks across diverse LLM backbones show that Agent-GWO consistently improves accuracy and stability over existing prompt optimization methods.
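The leader-follower update described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the agent layout (a `prompt` string plus a `params` list holding temperature and top-p), the rule that followers inherit a prompt from a randomly chosen leader, the choice to carry the three leaders over unchanged, and the `toy_fitness` function are all assumptions made for illustration. Only the numeric update follows the standard Grey Wolf Optimizer equations.

```python
import random


def gwo_step(agents, fitness, a, rng):
    """One GWO-style update; higher fitness is better."""
    ranked = sorted(agents, key=fitness, reverse=True)
    leaders = ranked[:3]  # alpha, beta, delta
    # Assumption: leaders are carried over unchanged (elitism).
    updated = [{"prompt": l["prompt"], "params": list(l["params"])} for l in leaders]
    for agent in ranked[3:]:
        new_params = []
        for d, x in enumerate(agent["params"]):
            pulls = []
            for leader in leaders:
                A = 2 * a * rng.random() - a   # exploration coefficient
                C = 2 * rng.random()
                dist = abs(C * leader["params"][d] - x)
                pulls.append(leader["params"][d] - A * dist)
            # Average the three leader-guided positions, clipped to [0, 1].
            new_params.append(min(1.0, max(0.0, sum(pulls) / 3)))
        # Assumption: the discrete prompt is inherited from a random leader.
        prompt = rng.choice(leaders)["prompt"]
        updated.append({"prompt": prompt, "params": new_params})
    return updated


def optimize(agents, fitness, iterations=20, seed=0):
    """Run GWO steps while `a` decays linearly from 2 toward 0."""
    rng = random.Random(seed)
    for t in range(iterations):
        a = 2 * (1 - t / iterations)
        agents = gwo_step(agents, fitness, a, rng)
    return max(agents, key=fitness)


def toy_fitness(agent):
    """Hypothetical stand-in for benchmark accuracy of a configuration."""
    temperature, top_p = agent["params"]
    bonus = 0.1 if "step" in agent["prompt"] else 0.0
    return bonus - (temperature - 0.3) ** 2 - (top_p - 0.9) ** 2


if __name__ == "__main__":
    init = random.Random(1)
    prompts = ["Answer directly.", "Think step by step.", "Explain briefly."]
    population = [
        {"prompt": init.choice(prompts), "params": [init.random(), init.random()]}
        for _ in range(8)
    ]
    print(optimize(population, toy_fitness))
```

In a real setting `toy_fitness` would be replaced by an evaluation of each configuration on a held-out set of reasoning problems, which is far more expensive than this closed-form score.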
Lightweight LLM Agent Memory with Small Language Models
Jiaquan Zhang | Chaoning Zhang | Shuxu Chen | Zhenzhen Huang | Pengcheng Zheng | Zhicheng Wang | Ping Guo | Fan Mo | Sung-Ho Bae | Jie Zou | Jiwei Wei | Yang Yang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construction and candidate filtering. In contrast, many systems use repeated large-model calls for online memory operations, improving accuracy but accumulating latency over long interactions. We propose LightMem, a lightweight memory system for better agent memory driven by Small Language Models (SLMs). LightMem modularizes memory retrieval, writing, and long-term consolidation, and separates online processing from offline consolidation to enable efficient memory invocation under bounded compute. We organize memory into short-term memory (STM) for immediate conversational context, mid-term memory (MTM) for reusable interaction summaries, and long-term memory (LTM) for consolidated knowledge, and use user identifiers to support independent retrieval and incremental maintenance in multi-user settings. Online, LightMem operates under a fixed retrieval budget and selects memories via a two-stage procedure: vector-based coarse retrieval followed by semantic consistency re-ranking. Offline, it abstracts reusable interaction evidence and incrementally integrates it into LTM. Experiments show consistent gains across model scales, with an average F1 improvement of about 2.5 over A-MEM on LoCoMo, while achieving higher efficiency and low median latency (83 ms for retrieval and 581 ms end-to-end).
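The two-stage online selection described in this abstract (coarse vector retrieval, then re-ranking, under a fixed budget and scoped by user identifier) can be sketched as follows. This is an illustrative sketch, not LightMem's code: the memory record layout, the function names, the hand-written toy vectors, and the use of token overlap as a stand-in for the semantic consistency re-ranker are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def token_overlap(query, text):
    """Jaccard overlap of lowercase tokens; a hypothetical re-rank scorer."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(query, query_vec, memories, user_id,
             coarse_k=4, budget=2, rerank=token_overlap):
    """Return at most `budget` memories belonging to `user_id`."""
    # Scope retrieval to one user's memories.
    own = [m for m in memories if m["user_id"] == user_id]
    # Stage 1: coarse retrieval by vector similarity.
    coarse = sorted(own, key=lambda m: cosine(query_vec, m["vec"]),
                    reverse=True)[:coarse_k]
    # Stage 2: re-rank the candidates, then enforce the fixed budget.
    reranked = sorted(coarse, key=lambda m: rerank(query, m["text"]),
                      reverse=True)
    return reranked[:budget]


# Toy memory store with hand-written vectors (illustrative only).
MEMORIES = [
    {"user_id": "u1", "text": "user prefers green tea", "vec": [1.0, 0.0, 0.0]},
    {"user_id": "u1", "text": "user lives in Busan", "vec": [0.0, 1.0, 0.0]},
    {"user_id": "u1", "text": "user drinks tea every morning", "vec": [0.9, 0.1, 0.0]},
    {"user_id": "u2", "text": "user prefers black tea", "vec": [1.0, 0.0, 0.0]},
]

if __name__ == "__main__":
    hits = retrieve("which tea does the user prefer", [1.0, 0.0, 0.0],
                    MEMORIES, user_id="u1", coarse_k=2, budget=1)
    print([m["text"] for m in hits])
```

Keeping `coarse_k` and `budget` as explicit parameters makes the cost of the second stage bounded regardless of how large the memory store grows, which is the property the abstract attributes to the fixed retrieval budget.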