Zhong-Zhi Li
2026
Too Long, Do Re-weighting for Efficient LLM Reasoning Compression
Zhong-Zhi Li | Xiao Liang | Zihao Tang | Lei Ji | Peijie Wang | Haotian Xu | Xing W | Haizhen Huang | Weiwei Deng | Yeyun Gong | Zhijiang Guo | Xiao Liu | Fei Yin | Cheng-Lin Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have recently achieved remarkable progress on complex reasoning tasks by leveraging extended Chain-of-Thought (CoT) techniques. These reasoning processes can be roughly categorized into System-1 (fast and intuitive) and System-2 (slow and deliberate) paradigms. However, excessive reliance on lengthy System-2-style reasoning during inference can produce extremely long outputs, thereby reducing efficiency. In this work, we propose Thinking Length Data Re-weighting (TLDR), a method that does not rely on sophisticated data annotations or interpolation between multiple models. We continuously balance the weights between the model’s System-1 and System-2 data to eliminate redundant reasoning processes while preserving the model’s reasoning capability. We validate our method across multiple base models, including DeepSeek-R1-Distill-Qwen models, and on a diverse set of benchmarks with varying difficulty levels. Our method significantly reduces the number of output tokens, by nearly 40%, while maintaining reasoning accuracy.
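The abstract does not state the re-weighting rule, so the following is only a minimal sketch of dynamic data-mixture re-weighting under an assumed controller: the weight on short (System-1) data is raised while validation accuracy stays within a tolerance of a reference, and lowered otherwise. `ReweightState`, `update_weight`, `sample_batch`, `step`, and `tol` are hypothetical names and are not part of TLDR as published.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class ReweightState:
    # Hypothetical mixing weight: probability of drawing a System-1 (short CoT) example.
    w_short: float = 0.5


def update_weight(state: ReweightState, accuracy: float, ref_accuracy: float,
                  step: float = 0.05, tol: float = 0.01) -> float:
    """Assumed controller, not the published TLDR rule.

    Shift the mixture toward short reasoning data while accuracy holds up,
    and back toward long (System-2) data once accuracy drops below tolerance.
    """
    if accuracy >= ref_accuracy - tol:
        state.w_short = min(1.0, state.w_short + step)
    else:
        state.w_short = max(0.0, state.w_short - step)
    return state.w_short


def sample_batch(short_data: Sequence[T], long_data: Sequence[T], w_short: float,
                 batch_size: int, rng: random.Random) -> List[T]:
    """Draw each example from the short pool with probability w_short, else from the long pool."""
    return [rng.choice(short_data) if rng.random() < w_short else rng.choice(long_data)
            for _ in range(batch_size)]
```

For example, starting from the default state, `update_weight(state, 0.81, 0.80)` raises `w_short` to about 0.55, and a later call with accuracy 0.70 lowers it again. This sketch reacts only to accuracy; a practical controller would presumably also monitor output length.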
Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
Xiao Liang | Zhong-Zhi Li | Zhenghao Lin | Eric Hanchen Jiang | Hengyuan Zhang | Yelong Shen | Kai-Wei Chang | Ying Nian Wu | Yeyun Gong | Weizhu Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have demonstrated strong reasoning capabilities through step-by-step chain-of-thought (CoT) reasoning. Nevertheless, at the limits of model capability, CoT often proves insufficient, and its strictly sequential nature constrains test-time scalability. A potential alternative is divide-and-conquer (DAC) reasoning, which decomposes a complex problem into subproblems to facilitate more effective exploration of the solution space. Although promising, our analysis reveals a fundamental misalignment between general-purpose post-training and DAC-style inference, which limits the model’s capacity to fully leverage this potential. To bridge this gap and fully unlock LLMs’ reasoning capabilities on the most challenging tasks, we propose an end-to-end reinforcement learning (RL) framework to enhance their DAC-style reasoning capacity. At each step, the policy decomposes a problem into a group of subproblems, solves them sequentially, and addresses the original problem conditioned on the subproblem solutions, with both decomposition and solution integrated into RL training. Under comparable training settings, our DAC-style framework endows the model with a higher performance ceiling and stronger test-time scalability, surpassing CoT by 8.6% in Pass@1 and 6.3% in Pass@32 on competition-level benchmarks. The code is available at the [provided link](https://github.com/MasterVito/DAC-RL).
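To make the decompose, solve, then conquer control flow concrete, here is a minimal inference-time sketch. It assumes a generic text-in, text-out `generate` callable and a caller-supplied `parse_subproblems`. The prompt templates, the helper names, and the choice to show each subproblem the earlier solutions are illustrative assumptions. The RL training described in the abstract is not modeled.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping a prompt string to a completion string.
Generate = Callable[[str], str]


def _context(solved: List[Tuple[str, str]]) -> str:
    """Render already-solved subproblems as Q/A pairs (assumed prompt format)."""
    return "".join(f"Q: {q}\nA: {a}\n" for q, a in solved)


def dac_solve(problem: str, generate: Generate,
              parse_subproblems: Callable[[str], List[str]]) -> str:
    # 1. Divide: ask the policy for a group of subproblems.
    decomposition = generate(f"Decompose into subproblems:\n{problem}")
    subproblems = parse_subproblems(decomposition)

    # 2. Solve the subproblems sequentially; here each one also sees earlier solutions.
    solved: List[Tuple[str, str]] = []
    for sub in subproblems:
        answer = generate(f"{_context(solved)}Solve:\n{sub}")
        solved.append((sub, answer))

    # 3. Conquer: answer the original problem conditioned on the subproblem solutions.
    return generate(f"{_context(solved)}Now solve the original problem:\n{problem}")
```

With a stub model, the function makes one decomposition call, one call per subproblem, and one final call, which is the three-stage structure the abstract describes.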
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
Zelin Tan | Hejia Geng | Xiaohang Yu | Mulei Zhang | Guancheng Wan | Yifan Zhou | Qiang He | Xiangyuan Xue | Heng Zhou | Yutao Fan | Zhong-Zhi Li | Zaibin Zhang | Guibin Zhang | Chen Zhang | Zhenfei Yin | Philip Torr | Lei Bai
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While scaling laws for large language models (LLMs) during pre-training have been extensively studied, their behavior under reinforcement learning (RL) post-training remains largely unexplored. This paper investigates the scaling behavior of RL post-training for LLMs, focusing on mathematical reasoning. Through experiments across the Qwen2.5 series (0.5B to 72B), we characterize how model scale, data, and compute interact. Our analysis yields four key findings: (1) Larger models consistently demonstrate superior compute and data efficiency. (2) The relationship between model performance and training resources follows a **predictive power-law** across both base and instruction-tuned models. (3) Learning efficiency under RL exhibits a latent **saturation trend** with increasing model scale. (4) In data-constrained regimes, performance is primarily driven by the **total volume of training data** rather than sample uniqueness. These results offer practical guidelines for scaling reasoning capabilities through RL post-training.
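The second finding refers to a power-law relationship. A common way to fit one is ordinary least squares in log-log space, sketched below for `y ≈ a * x**b`. Which metric plays `y` (for example test loss or error rate) and which resource plays `x` (compute or data) are assumptions here, and the paper's fitted form may include extra terms such as an irreducible offset. `fit_power_law` and `predict` are hypothetical helper names.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y ≈ a * x**b by ordinary least squares on (log x, log y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("power-law fit in log space needs strictly positive values")

    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n

    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))

    b = sxy / sxx                  # slope in log-log space = power-law exponent
    a = math.exp(my - b * mx)      # intercept in log space = log of the prefactor
    return a, b


def predict(a: float, b: float, x: float) -> float:
    """Extrapolate the fitted power law to a new resource budget x."""
    return a * x ** b
```

On synthetic data generated from `2 * x**-0.5`, the fit recovers `a ≈ 2` and `b ≈ -0.5`; on real training curves, the quality of extrapolation depends on how well a pure power law describes the regime being fit.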
2025
Safety in Large Reasoning Models: A Survey
Cheng Wang | Yue Liu | Baolong Bi | Duzhen Zhang | Zhong-Zhi Li | Yingwei Ma | Yufei He | Shengju Yu | Xinfeng Li | Junfeng Fang | Jiaheng Zhang | Bryan Hooi
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks like mathematics and coding, leveraging their advanced reasoning capabilities. Nevertheless, as these capabilities progress, significant concerns regarding their vulnerabilities and safety have arisen, which can pose challenges to their deployment and application in real-world settings. This paper presents the first comprehensive survey of safety in LRMs, meticulously exploring and summarizing the newly emerged safety risks, attacks, and defense strategies specific to these powerful reasoning-enhanced models. By organizing these elements into a detailed taxonomy, this work aims to offer a clear and structured understanding of the current safety landscape of LRMs, facilitating future research and development to enhance the security and reliability of these powerful models.
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating
Chao Deng | Jiale Yuan | Pi Bu | Peijie Wang | Zhong-Zhi Li | Jian Xu | Xiao-Hui Li | Yuan Gao | Jun Song | Bo Zheng | Cheng-Lin Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large vision language models (LVLMs) have remarkably improved document understanding capabilities, enabling the handling of complex document elements, longer contexts, and a wider range of tasks. However, existing document understanding benchmarks are limited to a small number of pages and fail to provide a comprehensive analysis of layout element locating. In this paper, we first define three primary task categories: Long Document Understanding, numerical Reasoning, and cross-element Locating. We then propose a comprehensive benchmark, LongDocURL, that integrates these three primary tasks and comprises 20 sub-tasks categorized by primary task and answer evidence. Furthermore, we develop a semi-automated construction pipeline and collect 2,325 high-quality question-answering pairs covering more than 33,000 pages of documents, significantly exceeding existing benchmarks in scale. Subsequently, we conduct comprehensive evaluation experiments on both open-source and closed-source models across 26 different configurations, revealing critical performance gaps in this field. The code and data are available at https://github.com/dengc2023/LongDocURL.
Enhancing Multimodal Continual Instruction Tuning with BranchLoRA
Duzhen Zhang | Yong Ren | Zhong-Zhi Li | Yahan Yu | Jiahua Dong | Chenxing Li | Zhilong Ji | Jinfeng Bai
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal Continual Instruction Tuning (MCIT) aims to finetune Multimodal Large Language Models (MLLMs) to continually align with human intent across sequential tasks. Existing approaches often rely on the Mixture-of-Experts (MoE) LoRA framework to preserve previous instruction alignments. However, these methods are prone to Catastrophic Forgetting (CF), as they aggregate all LoRA blocks via simple summation, which compromises performance over time. In this paper, we identify a critical parameter inefficiency in the MoELoRA framework within the MCIT context. Based on this insight, we propose BranchLoRA, an asymmetric framework to enhance both efficiency and performance. To mitigate CF, we introduce a flexible tuning-freezing mechanism within BranchLoRA, enabling branches to specialize in intra-task knowledge while fostering inter-task collaboration. Moreover, we incrementally incorporate task-specific routers to ensure an optimal branch distribution over time, rather than favoring the most recent task. To streamline inference, we introduce a task selector that automatically routes test inputs to the appropriate router without requiring task identity. Extensive experiments on the latest MCIT benchmark demonstrate that BranchLoRA significantly outperforms MoELoRA and maintains its superiority across various MLLM sizes.
2024
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving
Jiaxin Zhang | Zhong-Zhi Li | Ming-Liang Zhang | Fei Yin | Cheng-Lin Liu | Yashar Moshfeghi
Findings of the Association for Computational Linguistics: ACL 2024
Recent advancements in large language models (LLMs) and multi-modal models (MMs) have demonstrated their remarkable capabilities in problem-solving. Yet, their proficiency in tackling geometry math problems, which necessitates an integrated understanding of both textual and visual information, has not been thoroughly evaluated. To address this gap, we introduce the GeoEval benchmark, a comprehensive collection that includes a main subset of 2,000 problems, a 750-problem subset focusing on backward reasoning, an augmented subset of 2,000 problems, and a hard subset of 300 problems. This benchmark facilitates a deeper investigation into the performance of LLMs and MMs in solving geometry math problems. Our evaluation of ten LLMs and MMs across these varied subsets reveals that the WizardMath model excels, achieving 55.67% accuracy on the main subset but only 6.00% accuracy on the hard subset. This highlights the critical need for testing models against datasets on which they have not been pre-trained. Additionally, our findings indicate that GPT-series models perform more effectively on problems they have rephrased, suggesting a promising method for enhancing model capabilities.
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li | Ming-Liang Zhang | Fei Yin | Cheng-Lin Liu
Findings of the Association for Computational Linguistics: ACL 2024
Geometry problem solving (GPS) is a challenging mathematical reasoning task requiring multi-modal understanding, fusion, and reasoning. Existing neural solvers treat GPS as a vision-language task but fall short in representing geometry diagrams, which carry rich and complex layout information. In this paper, we propose a layout-aware neural solver named LANS, integrated with two new modules: a multimodal layout-aware pre-trained language module (MLA-PLM) and layout-aware fusion attention (LA-FA). MLA-PLM adopts structural-semantic pre-training (SSP) to implement global relationship modeling, and point-match pre-training (PMP) to achieve alignment between visual points and textual points. LA-FA employs a layout-aware attention mask to realize point-guided cross-modal fusion, further boosting the layout awareness of LANS. Extensive experiments on the Geometry3K and PGPS9K datasets validate the effectiveness of the layout-aware modules and the superior problem-solving performance of our LANS solver over existing symbolic and neural solvers. We have made our code and data publicly available.
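One plausible reading of a layout-aware attention mask for point-guided fusion is sketched below: text tokens that name a diagram point (such as `A`) attend only to the visual token of that same point, while all other text tokens attend to every visual token. This matching rule, the fallback for points with no visual match, and the function names `layout_aware_mask` and `to_additive_bias` are assumptions for illustration; the abstract does not specify how LA-FA builds its mask.

```python
from typing import List, Optional


def layout_aware_mask(text_points: List[Optional[str]],
                      visual_points: List[str]) -> List[List[bool]]:
    """mask[i][j] is True if text token i may attend to visual token j.

    text_points[i] is the point label a text token refers to, or None for
    ordinary words. Assumed rule: point tokens see only the matching visual
    point; ordinary tokens see all visual tokens.
    """
    mask: List[List[bool]] = []
    for tp in text_points:
        if tp is None:
            mask.append([True] * len(visual_points))
            continue
        row = [vp == tp for vp in visual_points]
        # Fallback (assumption): a point missing from the diagram attends to everything,
        # so no row is fully masked, which would make softmax undefined.
        mask.append(row if any(row) else [True] * len(visual_points))
    return mask


def to_additive_bias(mask: List[List[bool]]) -> List[List[float]]:
    """Convert a boolean mask into the additive form used before softmax."""
    return [[0.0 if allowed else float("-inf") for allowed in row] for row in mask]
```

For instance, with text tokens `[None, "A", "C"]` and visual points `["A", "B"]`, the ordinary word attends to both points, the token for `A` attends only to visual `A`, and `C` falls back to full attention because it has no visual counterpart.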
Co-authors
- Cheng-Lin Liu 4
- Fei Yin 3
- Yeyun Gong 2
- Xiao Liang (梁霄) 2
- Peijie Wang 2
- Duzhen Zhang 2
- Ming-Liang Zhang 2
- Jinfeng Bai 1
- Lei Bai 1
- Baolong Bi 1
- Pi Bu 1
- Kai-Wei Chang 1
- Weizhu Chen 1
- Chao Deng 1
- Weiwei Deng 1
- Jiahua Dong 1
- Yutao Fan 1
- Junfeng Fang 1
- Yuan Gao 1
- Hejia Geng 1
- Zhijiang Guo 1
- Qiang He 1
- Yufei He 1
- Bryan Hooi 1
- Haizhen Huang 1
- Lei Ji 1
- Zhilong Ji 1
- Eric Hanchen Jiang 1
- Chenxing Li 1
- Xiao-Hui Li 1
- Xinfeng Li 1
- Zhenghao Lin 1
- Xiao Liu 1
- Yue Liu 1
- Yingwei Ma 1
- Yashar Moshfeghi 1
- Yong Ren 1
- Yelong Shen 1
- Jun Song 1
- Zelin Tan 1
- Zihao Tang 1
- Philip Torr 1
- Xing W 1
- Guancheng Wan 1
- Cheng Wang 1
- Ying Nian Wu 1
- Haotian Xu 1
- Jian Xu 1
- Xiangyuan Xue 1
- Zhenfei Yin 1
- Shengju Yu 1
- Xiaohang Yu 1
- Yahan Yu 1
- Jiale Yuan 1
- Chen Zhang 1
- Guibin Zhang 1
- Hengyuan Zhang 1
- Jiaheng Zhang 1
- Jiaxin Zhang 1
- Mulei Zhang 1
- Zaibin Zhang 1
- Bo Zheng 1
- Heng Zhou 1
- Yifan Zhou 1