Lu Wang
2026
Entropy Scheduling in Reinforcement Learning for Large Language Models
Xingjin Wang | Howe Tissue | Lu Wang | Linjing Li | Daniel Dajun Zeng
Findings of the Association for Computational Linguistics: ACL 2026
We observe that entropy in reinforcement learning (RL) plays a role analogous to the learning rate in LLM training. Maintaining stable entropy, as demonstrated in DAPO, helps stabilize RL training, while rapid entropy annealing (i.e., so-called entropy collapse) accelerates local performance improvement and enables faster convergence. We argue that these two processes are not antithetical but can be effectively controlled and scheduled within a single training run, similar to learning rate scheduling. We propose Entropy Scheduling (ES), which optimizes different preset goals (e.g., k when optimizing Pass@k) by controlling and scheduling entropy at each step of the RL process. We find that maintaining stable entropy early in training, followed by entropy annealing, achieves superior performance. Moreover, since stable-state entropy and annealed entropy exhibit distinctly different learning dynamics, curriculum learning can be seamlessly integrated to maximize model performance in each entropy phase. We show that entropy scheduling is straightforward to implement and intuitive in design. Extensive experiments suggest that it delivers consistent and stable performance improvements across diverse models and algorithms.
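To make the "hold, then anneal" idea concrete, here is a minimal sketch of a two-phase entropy target, written by analogy with learning-rate schedules. It is not the ES implementation: the cosine anneal, the `stable_frac` split, and the helper names `target_entropy` and `update_entropy_coef` are all hypothetical. The controller is one common way to steer entropy (an adaptive entropy-bonus coefficient); the paper's control mechanism may differ.

```python
import math


def target_entropy(step: int, total_steps: int, stable_entropy: float,
                   final_entropy: float, stable_frac: float = 0.5) -> float:
    """Hypothetical two-phase entropy target: hold, then cosine-anneal."""
    stable_steps = int(total_steps * stable_frac)
    if step < stable_steps:
        # Phase 1: hold entropy at a stable level.
        return stable_entropy
    # Phase 2: cosine-anneal from stable_entropy down to final_entropy.
    progress = min(1.0, (step - stable_steps) / max(1, total_steps - stable_steps))
    weight = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_entropy + (stable_entropy - final_entropy) * weight


def update_entropy_coef(coef: float, measured: float, target: float,
                        lr: float = 0.01) -> float:
    """Hypothetical controller: raise the entropy-bonus coefficient when the
    measured policy entropy is below target, lower it when above, clamp at 0."""
    return max(0.0, coef + lr * (target - measured))


# Usage: compute the target for the current step and nudge the coefficient.
coef = 0.01
for step in range(0, 100, 25):
    tgt = target_entropy(step, total_steps=100, stable_entropy=1.0, final_entropy=0.2)
    coef = update_entropy_coef(coef, measured=0.8, target=tgt)
    print(f"step={step} target={tgt:.3f} coef={coef:.4f}")
```

The schedule is deliberately shaped like a warmup-free learning-rate schedule so that the "stable" and "annealed" phases are easy to identify, which is also where a per-phase curriculum could be attached.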
Breaking the Impasse: Dual-Scale Evolutionary Policy Training for Social Language Agents
Minzheng Wang | Run Luo | Yanbo Wang | Zichen Liu | Yuqiao Tan | Tao Tan | Nan Xu | Lu Wang | Wenji Mao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for closed-ended tasks, extending it to open-ended social language games via self-play reveals a critical issue: evolution impasse. Due to the vast strategy space, language agents frequently converge to homogenized behaviors, leading to deterministic match outcomes that eliminate the gradient signals necessary for policy evolution. To tackle this issue, we propose Dual-scale Evolutionary Policy Training (DEPT) for social language games. DEPT introduces a time-scaled evolutionary perception mechanism that detects impasse by quantifying dual-scale value baseline divergence alongside match entropy. Upon perceiving the collapse, it then activates asymmetric advantage reshaping to dynamically modulate the optimization landscape for intervention. Thus, our method effectively restores gradient signals and enforces sustained strategic exploration. Extensive experiments on multiple social language games demonstrate that DEPT outperforms strong baselines, avoiding policy degeneration and driving the continuous evolution of social language agents.
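The link between homogenized behavior and vanishing gradient signal can be illustrated with a small sketch. It assumes group-normalized (GRPO-style) advantages and measures match entropy as the Shannon entropy of the empirical win/loss/draw distribution; both choices, and the `looks_like_impasse` threshold rule, are illustrative assumptions rather than DEPT's dual-scale detector.

```python
import math
from collections import Counter
from typing import List


def outcome_entropy(outcomes: List[str]) -> float:
    """Shannon entropy (in nats) of the empirical match-outcome distribution."""
    n = len(outcomes)
    total = 0.0
    for c in Counter(outcomes).values():
        p = c / n
        total += -p * math.log(p)
    return total


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (reward - group mean) / (group std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def looks_like_impasse(outcomes: List[str], threshold: float = 0.1) -> bool:
    """Hypothetical detector: flag an impasse when outcomes are near-deterministic."""
    return outcome_entropy(outcomes) < threshold


# Every self-play match ends the same way: entropy is zero and so is every
# advantage, so the policy-gradient update carries no learning signal.
homogenized = ["win"] * 8
print(outcome_entropy(homogenized), group_advantages([1.0] * 8))
print(looks_like_impasse(homogenized))
```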
Break Through the Compression Bottleneck: From Theory to Practice
Xiusheng Huang | Lu Wang | Yequan Wang | Jun Zhao | Kang Liu
Findings of the Association for Computational Linguistics: ACL 2026
As the parameter counts of language models continue to grow, effective model compression is required to reduce their computational and memory overhead. Existing compression methods suffer from a bottleneck: as the compression ratio increases, performance degrades significantly. Low-rank decomposition and quantization are two prominent compression methods that have been shown to substantially reduce the computational and memory requirements of Large Language Models (LLMs) while maintaining model accuracy. Combining the two therefore appears to be a natural way to break through the existing compression bottleneck. However, how the two methods interact when combined remains a critical question for developers: many assume they are orthogonal, meaning their combination introduces no additional error beyond what each method introduces independently. This paper provides the first mathematical proof that low-rank decomposition and quantization are non-orthogonal. We validate this finding through a series of experiments on large language models, which show that their combination leads to significant performance degradation. Importantly, we propose a novel approach, the Diagonal Adhesive Method (DAM), which can effectively combine the two methods and mitigate the performance loss. Our research provides deep insights into model compression and lays a solid theoretical and experimental foundation for future related studies.
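One simple way to make "no additional error" concrete, offered here only as an illustrative assumption and not as the paper's definition or proof: measure errors in the Frobenius norm, apply truncated SVD and then symmetric round-to-nearest (RTN) quantization to the reconstructed low-rank matrix, and check the cross term. The combined error always equals the sum of the two stage errors; their squared norms add up exactly only when the cross term is zero. Quantizing the reconstructed matrix rather than the factors is a simplification; `low_rank` and `quantize_rtn` are hypothetical helpers.

```python
import numpy as np


def low_rank(W: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of W via truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * S[:r]) @ Vt[:r, :]


def quantize_rtn(W: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor round-to-nearest quantization, dequantized back."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.clip(np.round(W / scale), -qmax - 1, qmax) * scale


def sq(E: np.ndarray) -> float:
    """Squared Frobenius norm."""
    return float(np.sum(E ** 2))


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
L = low_rank(W, r=16)
QL = quantize_rtn(L, bits=4)

e_lr = W - L       # error from the low-rank step alone
e_q = L - QL       # error from quantizing the low-rank matrix
e_total = W - QL   # combined error; identically e_lr + e_q

# ||e_total||^2 = ||e_lr||^2 + ||e_q||^2 + cross; cross == 0 iff the errors are orthogonal.
cross = 2.0 * float(np.sum(e_lr * e_q))
print(f"lr={sq(e_lr):.3f} q={sq(e_q):.3f} total={sq(e_total):.3f} cross={cross:.3f}")
```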
Theory-optimal Quantization Based on Flatness
Xiusheng Huang | Zhe Li | Xuanwu Yin | Lu Wang | Yequan Wang | Dong Li | Emad Barsoum | Kang Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Post-training quantization has emerged as a widely adopted technique for compressing Large Language Models (LLMs) and accelerating their inference. The primary challenges in LLM quantization stem from activation outliers, which significantly degrade model performance, especially at low bit widths. While recent approaches attempt to mitigate outliers through linear transformations across feature dimensions, our analysis reveals that the transformed weights and activations still exhibit persistent outlier patterns with concentrated magnitude distributions. In this paper, we first model the mathematical relationship between quantization error and outliers, and then introduce a new metric, Flatness, to quantify the distribution of outliers. Based on this, we derive the theoretically optimal solution with respect to Flatness. Building on these insights, we propose Bidirectional Diagonal Quantization (BDQ), a novel post-training quantization framework that effectively disperses outlier patterns through optimized matrix transformations. BDQ strategically distributes outlier magnitudes across matrix dimensions via learned diagonal operations. Extensive experiments show that BDQ sets a new state of the art in quantization. It achieves less than a 1% accuracy drop under W4A4 quantization on LLaMA-3-8B. In the more challenging W2A4KV16 setting, BDQ reduces the performance gap by 39.1% relative to state-of-the-art approaches on DeepSeek-R1-Distill-LLaMA-70B.
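The effect of a single outlier on per-tensor quantization, and how a transformation across feature dimensions can spread it out, can be seen in a toy experiment. This sketch uses symmetric 4-bit RTN and a fixed orthonormal Hadamard rotation, which is one known example of such transformations; it does not implement the Flatness metric or BDQ's learned diagonal operations. `quant_mse` and `hadamard` are hypothetical helpers.

```python
import numpy as np


def quant_mse(x: np.ndarray, bits: int = 4) -> float:
    """MSE of symmetric per-tensor round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax  # one outlier inflates the step size for all values
    xq = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
    return float(np.mean((x - xq) ** 2))


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester's construction (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)


rng = np.random.default_rng(0)
n = 1024
x = rng.standard_normal(n)
x_out = x.copy()
x_out[0] = 100.0  # inject a single activation outlier

H = hadamard(n)
mse_clean = quant_mse(x)
mse_outlier = quant_mse(x_out)
# H is orthogonal, so the MSE measured on H @ x_out equals the MSE after rotating back.
mse_rotated = quant_mse(H @ x_out)
print(f"clean={mse_clean:.4f} outlier={mse_outlier:.4f} rotated={mse_rotated:.4f}")
```

With the outlier present, the quantization step becomes so large that almost every ordinary value rounds to zero; the rotation spreads the outlier's magnitude evenly across all 1024 dimensions. In a real layer the same rotation must also be folded into the weights so the matrix product is unchanged.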
2025
Re3Syn: A Dependency-Based Data Synthesis Framework for Long-Context Post-training
Zhiyang Zhang | Ziqiang Liu | Huiming Wang | Renke Shan | Li Kuang | Lu Wang | De Wen Soh
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
An important trend in the realm of large language models (LLMs) is the development of longer context windows. However, training LLMs with long context windows to model lengthy inputs effectively is often hindered by the scarcity of naturally long-context data. Existing methods that construct long-context data by concatenating short documents overlook a crucial aspect of long-context data quality, namely semantic dependency. In this paper, we propose a novel framework called Retrieval, Dependency Recognition, and Reorder for data synthesis (Re3Syn), which leverages semantic similarity to retrieve relevant documents and group them into batches. Within each batch, the framework comprehensively recognizes dependencies and uses them, together with a reordering algorithm, to organize the short documents into coherent long-context data. Comprehensive experiments on multiple benchmarks indicate that the data generated by Re3Syn has longer dependencies and significantly enhances the model's long-context capabilities. For reproducibility, we will release our codebase upon acceptance.
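The reorder step can be pictured as ordering a batch so that each document appears after the documents it depends on. The sketch below uses a plain topological sort from the standard library (Python 3.9+); it assumes the dependency-recognition step has already produced a `depends_on` mapping, and the helper name `reorder_batch` is hypothetical. Re3Syn's actual reorder algorithm is not specified here and may handle cycles and partial orders differently; this version simply raises `graphlib.CycleError` on a cycle.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def reorder_batch(docs: Dict[str, str], depends_on: Dict[str, Set[str]]) -> str:
    """Concatenate a batch so each document follows the documents it depends on.

    docs: document id -> text.
    depends_on: document id -> ids it depends on (hypothetical output of a
    dependency-recognition step). Ids outside the batch are ignored.
    """
    graph = {d: set(depends_on.get(d, ())) & set(docs) for d in docs}
    order = list(TopologicalSorter(graph).static_order())
    return "\n\n".join(docs[d] for d in order)


batch = {"a": "Definition of X.", "b": "Result using Y.", "c": "Y, built on X."}
deps = {"c": {"a"}, "b": {"c"}}
print(reorder_batch(batch, deps))
```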
CLaSp: In-Context Layer Skip for Self-Speculative Decoding
Longze Chen | Renke Shan | Huiming Wang | Lu Wang | Ziqiang Liu | Run Luo | Jiawei Wang | Hamid Alinejad-Rokny | Min Yang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Speculative decoding (SD) is a promising method for accelerating the decoding process of Large Language Models (LLMs). The efficiency of SD primarily hinges on the consistency between the draft model and the verify model. However, existing drafting approaches typically require additional modules to be trained, which can be challenging to implement and to keep compatible across various LLMs. In this paper, we propose CLaSp, an in-context layer-skipping strategy for self-speculative decoding. Unlike prior methods, CLaSp does not require additional drafting modules or extra training. Instead, it employs a plug-and-play mechanism that skips intermediate layers of the verify model to construct a compressed draft model. Specifically, we develop a dynamic programming algorithm that optimizes the layer-skipping process by using the complete hidden states from the last verification stage as its objective. This enables CLaSp to dynamically adjust its layer-skipping strategy after each verification stage, without relying on pre-optimized sets of skipped layers. Experimental results across diverse downstream tasks demonstrate that CLaSp achieves a 1.3× to 1.7× speedup on LLaMA3 series models without altering the original distribution of the generated text.
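Why self-speculative decoding leaves the output unchanged is easiest to see in the verification step. The sketch below shows standard greedy verification: draft tokens are kept while they match the verify model's greedy choice, and the first mismatch is replaced by the verify model's own token, so the result is identical to plain greedy decoding with the verify model. It does not implement CLaSp's dynamic-programming layer selection; the toy `toy_target` model and the helper name `verify_greedy` are hypothetical, and real systems score all draft positions in one batched forward pass rather than in a loop.

```python
from typing import Callable, List, Sequence


def verify_greedy(prefix: List[int], draft: Sequence[int],
                  target_next: Callable[[List[int]], int]) -> List[int]:
    """Greedy speculative-decoding verification.

    Keeps draft tokens while they match the verify model's greedy choice; at the
    first mismatch, emits the verify model's token instead and stops. If every
    draft token matches, appends one extra token from the verify model.
    """
    ctx = list(prefix)
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # correction from the verify model
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # bonus token when all drafts are accepted
    return accepted


def toy_target(ctx: List[int]) -> int:
    """Hypothetical verify model: next token is the context length modulo 5."""
    return len(ctx) % 5


print(verify_greedy([0, 1], [2, 3, 9], toy_target))
```

The speedup comes from the draft (here, the verify model with some layers skipped) proposing several tokens cheaply; the more of them that survive verification, the fewer full forward passes are needed per generated token.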
Co-authors
- Xiusheng Huang 2
- Kang Liu 2
- Ziqiang Liu 2
- Run Luo 2
- Renke Shan 2
- Yequan Wang 2
- Huiming Wang 2
- Hamid Alinejad-Rokny 1
- Emad Barsoum 1
- Longze Chen 1
- Li Kuang 1
- Linjing Li 1
- Zhe Li 1
- Dong Li 1
- Zichen Liu 1
- Wenji Mao 1
- De Wen Soh 1
- Hao Sun 1
- Yuqiao Tan 1
- Tao Tan 1
- Xingjin Wang 1
- Minzheng Wang 1
- Yanbo Wang 1
- Jiawei Wang 1
- Nan Xu 1
- Min Yang 1
- Xuanwu Yin 1
- Daniel Dajun Zeng 1
- Zhiyang Zhang 1
- Jun Zhao 1