Yiwu
2026
Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models
Wei Wu | Liyi Chen | Congxi Xiao | Tianfu Wang | Qimeng Wang | Chengqiang Lu | Yan Gao | Yiwu | Yao Hu | Hui Xiong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large reasoning models enhanced by reinforcement learning with verifiable rewards have achieved significant performance gains by extending their chain-of-thought. However, this paradigm incurs substantial deployment costs as models often exhibit excessive verbosity on simple queries. Existing efficient reasoning methods relying on explicit length penalties often introduce optimization conflicts and leave the generative mechanisms driving overthinking largely unexamined. In this paper, we identify a phenomenon termed length shift, where models increasingly generate unnecessary reasoning on trivial inputs during training. To address this, we introduce Dynamic Outlier Truncation (DOT), a training-time intervention that selectively suppresses redundant tokens. This method targets only the extreme tail of response lengths within fully correct rollout groups while preserving long-horizon reasoning capabilities for complex problems. To complement this intervention and ensure stable convergence, we further incorporate auxiliary KL regularization and predictive dynamic sampling. Experimental results across multiple model scales demonstrate that our approach significantly pushes the efficiency-performance Pareto frontier outward. Notably, on AIME-24, our method reduces inference token usage by 78% while simultaneously increasing accuracy compared to the initial policy and surpassing state-of-the-art efficient reasoning methods.
SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility
Xuyang Zhi | Peilun Zhou | Chengqiang Lu | Hang Lv | Yiwei Liang | Rongyang Zhang | Yan Gao | Yiwu | Yao Hu | Hongchao Gu | Defu Lian | Hao Wang | Enhong Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The evolution of Large Language Models (LLMs) is shifting the focus from single, verifiable tasks toward complex, open-ended real-world scenarios, imposing significant challenges on the post-training phase. In these settings, the scale and complexity of reward systems have grown significantly, transitioning toward multi-objective formulations that encompass a comprehensive spectrum of model capabilities and application contexts. However, traditional methods typically rely on fixed reward weights, ignoring non-stationary learning dynamics and struggling with data heterogeneity across dimensions. To address these issues, we propose SPARD, a framework that establishes an automated, self-paced curriculum by perceiving learning progress to dynamically adjust multi-objective reward weights and data importance, thereby synchronizing learning intent with data utility for optimal performance. Extensive experiments across multiple benchmarks demonstrate that SPARD significantly enhances model capabilities across all domains. Our code is publicly available at https://github.com/USTC-StarTeam/SPARD.
2025
SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment
Yuqing Huang | Rongyang Zhang | Qimeng Wang | Chengqiang Lu | Yan Gao | Yiwu | Yao Hu | Xuyang Zhi | Guiquan Liu | Xin Li | Hao Wang | Enhong Chen
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advancements in large language models (LLMs) have revolutionized natural language processing through their remarkable capabilities in understanding and executing diverse tasks. While supervised fine-tuning, particularly in Retrieval-Augmented Generation (RAG) scenarios, effectively enhances task-specific performance, it often leads to catastrophic forgetting, where models lose their previously acquired knowledge and general capabilities. Existing solutions either require access to general instruction data or face limitations in preserving the model’s original distribution. To overcome these limitations, we propose SelfAug, a self-distribution alignment method that aligns input sequence logits to preserve the model’s semantic distribution, thereby mitigating catastrophic forgetting and improving downstream performance. Extensive experiments demonstrate that SelfAug achieves a superior balance between downstream learning and general capability retention. Our comprehensive empirical analysis reveals a direct correlation between distribution shifts and the severity of catastrophic forgetting in RAG scenarios, highlighting how the absence of RAG capabilities in general instruction tuning leads to significant distribution shifts during fine-tuning. Our findings not only advance the understanding of catastrophic forgetting in RAG contexts but also provide a practical solution applicable across diverse fine-tuning scenarios.
DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision
Yongqi Leng | Yikun Lei | Xikai Liu | Meizhi Zhong | Bojian Xiong | Yurong Zhang | Yan Gao | Yiwu | Yao Hu | Deyi Xiong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Agentic Retrieval-Augmented Generation (Agentic RAG) enhances the processing capability for complex tasks through dynamic retrieval and adaptive workflows. Recent advances (e.g., Search-R1) have shown that outcome-supervised reinforcement learning demonstrates strong performance. However, this approach still suffers from inefficient exploration, sparse reward signals, and ambiguous global reward feedback. To address these challenges, we propose DecEx-RAG, which models RAG as a Markov Decision Process (MDP) incorporating decision-making and execution, while introducing an efficient pruning strategy to optimize data expansion. Through comprehensive process-level policy optimization, DecEx-RAG significantly enhances the autonomous task decomposition, dynamic retrieval, and high-quality answer generation capabilities of large language models (LLMs). Experiments show that DecEx-RAG achieves an average absolute performance improvement of 6.2% across six datasets, significantly outperforming existing baselines. Moreover, the pruning strategy improves data construction efficiency by nearly 6×, providing an efficient solution for process-supervised RAG training. The code is available at https://github.com/sdsxdxl/DecEx-RAG.
SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation
Qian Dong | Jia Chen | Qingyao Ai | Hongning Wang | Haitao Li | Yiwu | Yao Hu | Yiqun Liu | Shaoping Ma
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Existing retrieval-augmented code generation (RACG) methods typically use an external retrieval module to fetch semantically similar code snippets used for generating subsequent fragments. However, even for consecutive code fragments, the content often diverges due to logical progression, resulting in a content gap. This gap undermines the performance of current RACG methods, as external retrieval modules based on content matching fail to infer the specific information need of LLMs to generate the next code fragment. Therefore, we propose SelfRACG, a novel paradigm that enables large language models (LLMs) to Self-express their information needs to enhance RACG. Specifically, SelfRACG includes an information need expression module and a two-stage information need-guided training strategy, which encourages LLMs to express their information need. Extensive experiments demonstrate that SelfRACG can retrieve external knowledge that better aligns with the LLM’s own information needs, resulting in superior generation performance compared to vanilla RACG. Moreover, both the training and deployment costs for retrieval in our framework are much lower than those of the strongest retrieval model.
Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair
Bojian Xiong | Yikun Lei | Xikai Liu | Shaowei Zhang | Pengyun Zhu | Yan Liu | Yongqi Leng | Ling Shi | Meizhi Zhong | Yurong Zhang | Yan Gao | Yiwu | Yao Hu | Deyi Xiong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large language models usually struggle in multi-file coding scenarios where strong inter-file dependencies arise, as typically demonstrated in SWE-bench. To mitigate this issue, we propose Think-Search-Patch (TSP), a retrieval-augmented reasoning framework for repository-level code repair. At the Think stage, our system breaks down a coding task and creates clear search queries. Next, at the Search stage, it retrieves relevant code snippets using models like E5. At the final Patch stage, it generates standardized patches based on the key snippets. In addition to the proposed framework, we enhance system reliability through a two-stage training process. At the first stage, the system undergoes supervised fine-tuning (SFT) on our TSP dataset. At the subsequent stage, we employ rejection sampling with correction to generate preference pairs for Direct Preference Optimization (DPO) training, thereby reducing errors in the intermediate phases. Experimental results demonstrate that the TSP framework enhances retrieval accuracy and repair success on SWE-bench Lite, even surpassing larger models in managing extensive code contexts and successfully addressing bugs spanning multiple files. All data and code are available at https://github.com/Gengar0215/TSP-framework.
RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios
Fei Zhao | Chengqiang Lu | Yufan Shen | Qimeng Wang | Yicheng Qian | Haoxin Zhang | Yan Gao | Yiwu | Yao Hu | Zhen Wu | Shangyu Xing | Xinyu Dai
Findings of the Association for Computational Linguistics: EMNLP 2025
Although various multimodal multi-image evaluation datasets have emerged, they are primarily based on English, and there has yet to be a Chinese multi-image dataset. To fill this gap, we introduce RealBench, the first Chinese multimodal multi-image dataset, which contains 9393 samples and 69910 images. RealBench distinguishes itself by incorporating real user-generated content, ensuring high relevance to real-world applications. Additionally, the dataset covers a wide variety of scenes, image resolutions, and image structures, further increasing the difficulty of multi-image understanding. Finally, we conduct a comprehensive evaluation of RealBench using 21 multimodal LLMs of different sizes, including closed-source models that support multi-image inputs as well as open-source visual and video models. The experimental results indicate that even the most powerful closed-source models still face challenges when handling multi-image Chinese scenarios. Moreover, there remains a noticeable performance gap of around 71.8% on average between open-source visual/video models and closed-source models. These results show that RealBench provides an important research foundation for further exploring multi-image understanding capabilities in the Chinese context. Our datasets will be publicly available.
Co-authors
- Yao Hu 7
- Yan Gao 6
- Chengqiang Lu 4
- Qimeng Wang 3
- Enhong Chen 2
- Yikun Lei 2
- Yongqi Leng 2
- Xikai Liu 2
- Hao Wang 2
- Bojian Xiong 2
- Deyi Xiong (德意 熊) 2
- Rongyang Zhang 2
- Yurong Zhang 2
- Xuyang Zhi 2
- Meizhi Zhong 2
- Qingyao Ai 1
- Liyi Chen 1
- Jia Chen 1
- Xinyu Dai 1
- Qian Dong 1
- Hongchao Gu 1
- Yuqing Huang 1
- Xin Li 1
- Haitao Li 1
- Defu Lian 1
- Yiwei Liang 1
- Guiquan Liu 1
- Yiqun Liu 1
- Yan Liu 1
- Hang Lv 1
- Shaoping Ma 1
- Yicheng Qian 1
- Yufan Shen 1
- Ling Shi 1
- Tianfu Wang 1
- Hongning Wang 1
- Wei Wu 1
- Zhen Wu 1
- Congxi Xiao 1
- Shangyu Xing 1
- Hui Xiong 1
- Shaowei Zhang 1
- Haoxin Zhang 1
- Fei Zhao 1
- Peilun Zhou 1
- Pengyun Zhu 1