Mingkuan Feng
2026
Two-Stage Regularization-Based Structured Pruning for LLMs
Mingkuan Feng | Jinyang Wu | Siyuan Liu | Shuai Zhang | Hongjian Fang | Ruihan Jin | Feihu Che | Pengpeng Shao | Zhengqi Wen | Jianhua Tao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The deployment of large language models (LLMs) is largely hindered by their large number of parameters. Structured pruning has emerged as a promising solution. Prior structured pruning methods directly remove unimportant parameters based on certain metrics, which often causes knowledge loss and necessitates extensive retraining. To overcome this, we introduce a novel pruning method, **TRSP**: **T**wo-Stage **R**egularization-Based **S**tructured **P**runing for LLMs. Specifically, we multiply the output of each transformer layer by an initial learnable weight and iteratively learn these weights by adding their ℓ1-norm as a regularization term to the loss function. This serves as the first-stage regularization. Subsequently, we apply additional regularization to the difference between the output and input of layers with smaller weights, encouraging knowledge to shift to the preserved layers. This serves as the second-stage regularization. TRSP retains more knowledge and better preserves model performance than direct parameter elimination. Through extensive experiments, we show that TRSP outperforms strong layer-wise structured pruning methods without requiring retraining. As a layer-wise pruning method, it delivers notable end-to-end acceleration, making it a promising solution for efficient LLM deployment.
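The two regularization terms described in this abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the coefficients `lam1` and `lam2`, the rule of picking the layers with the smallest absolute weights, and the use of a squared ℓ2 distance for the output-input difference are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def l1_penalty(layer_weights: Sequence[float]) -> float:
    """First-stage term: l1-norm of the per-layer learnable weights."""
    return sum(abs(w) for w in layer_weights)


def smallest_weight_layers(layer_weights: Sequence[float], num_layers: int) -> List[int]:
    """Indices of the layers with the smallest |weight| (assumed selection rule)."""
    order = sorted(range(len(layer_weights)), key=lambda i: abs(layer_weights[i]))
    return sorted(order[:num_layers])


def io_difference_penalty(
    layer_inputs: Sequence[Sequence[float]],
    layer_outputs: Sequence[Sequence[float]],
    candidates: Sequence[int],
) -> float:
    """Second-stage term: squared l2 distance between output and input
    of each candidate layer (the choice of norm is an assumption)."""
    total = 0.0
    for i in candidates:
        total += sum((o - x) ** 2 for x, o in zip(layer_inputs[i], layer_outputs[i]))
    return total


def stage_one_loss(task_loss: float, layer_weights: Sequence[float], lam1: float = 0.01) -> float:
    """Task loss plus the weighted first-stage penalty (lam1 is hypothetical)."""
    return task_loss + lam1 * l1_penalty(layer_weights)


def stage_two_loss(
    task_loss: float,
    layer_inputs: Sequence[Sequence[float]],
    layer_outputs: Sequence[Sequence[float]],
    candidates: Sequence[int],
    lam2: float = 0.01,
) -> float:
    """Task loss plus the weighted second-stage penalty (lam2 is hypothetical)."""
    return task_loss + lam2 * io_difference_penalty(layer_inputs, layer_outputs, candidates)
```

In this toy form, a candidate layer whose output equals its input contributes nothing to the second-stage term, which matches the intuition that such a layer could be removed with little effect on the rest of the network.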
TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
Jinyang Wu | Chonghua Liao | Mingkuan Feng | Shuai Zhang | Zhengqi Wen | Haoran Luo | Ling Yang | Huazhe Xu | Jianhua Tao
Findings of the Association for Computational Linguistics: ACL 2026
Reinforcement learning (RL) has emerged as an effective paradigm for enhancing model reasoning. However, existing RL methods like GRPO often rely on unstructured self-sampling to fit scalar rewards, producing inefficient rollouts that fail to capture transferable problem-solving strategies. To address these limitations, we propose **TemplateRL**, a structured template-guided RL framework that augments policy optimization with explicit template guidance. Our approach first constructs a problem-solving template library via MCTS on a small seed set, then integrates this high-level structured guidance into RL training. By guiding rollout generation to align with proven template structures, TemplateRL significantly improves high-quality trajectory hit rates while reducing ineffective exploration. This structure-guided design steers the policy toward validated strategic patterns, stabilizing training dynamics and enhancing RL sampling efficiency. Notably, the explicit template library is interpretable and editable, and it supports online updates during both training and inference. Extensive experiments demonstrate that TemplateRL outperforms GRPO by 99% on AIME and 41% on AMC, with superior stability on weak models and remarkable cross-domain generalization, highlighting its potential for broader tasks.
Beyond Examples: Towards Automated Thought-level In-Context Reasoning for Large Language Models
Jinyang Wu | Mingkuan Feng | Shuai Zhang | Feihu Che | Zhengqi Wen | Chonghua Liao | Ling Yang | Haoran Luo | Zheng Lian | Jianhua Tao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In-context learning (ICL) leverages demonstrations to enhance the performance of large language models (LLMs). However, traditional ICL struggles with complex reasoning mainly due to superficial, example-level implicit imitation. To address these limitations, we introduce **ThoughtICR**, an automated **Thought**-level **I**n-**C**ontext **R**easoning paradigm that shifts from surface-level examples to more guidance-oriented thought patterns. Specifically, we first define atomic reasoning actions and construct thought patterns on small-scale seed data using Monte Carlo Tree Search (MCTS). During inference, we dynamically select appropriate thought patterns based on target problem attributes, providing explicit guidance for model reasoning. Thanks to its automated and strategic design, our method enables seamless plug-and-play integration with various post-training techniques. Experimental results demonstrate that our method improves performance across different model sizes and generalizes effectively across reasoning domains. Using only small-scale seed data, we achieve 80.6% accuracy on MATH and 62.5% on AMC, surpassing GPT-4o’s 77.2% and 57.5%, respectively. Moreover, compared to test-time scaling methods, our approach reduces computational costs by over 10. Our code is available at https://github.com/jinyangwu/ThoughtICR.
2025
RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing
Ruihan Jin | Pengpeng Shao | Zhengqi Wen | Jinyang Wu | Mingkuan Feng | Shuai Zhang | Jianhua Tao
Findings of the Association for Computational Linguistics: EMNLP 2025
The rapid advancements in large language models (LLMs) have led to the emergence of routing techniques, which aim to efficiently select the optimal LLM from diverse candidates to tackle specific tasks, optimizing performance while reducing costs. Current LLM routing methods are limited in effectiveness because they insufficiently explore the intrinsic connection between user queries and the characteristics of LLMs. To address this issue, we present **RadialRouter**, a novel framework for LLM routing that employs a lightweight Transformer-based backbone with a radial structure, named **RadialFormer**, to articulate the query-LLMs relationship. The optimal LLM is selected based on the final states of RadialFormer. The pipeline is further refined by an objective function that combines Kullback-Leibler divergence with a query-query contrastive loss to enhance robustness. Experimental results on RouterBench show that RadialRouter significantly outperforms existing routing methods by 9.2% and 5.8% in the *Balance* and *Cost First* scenarios, respectively. Additionally, its adaptability to different performance-cost trade-offs and to a dynamic LLM pool demonstrates practical application potential.
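To make the combined objective in this abstract more concrete, the following sketch adds a Kullback-Leibler term to a contrastive term. It is an illustration under stated assumptions, not the paper's formulation: the InfoNCE form of the contrastive loss, the cosine similarity, the `temperature` and `alpha` parameters, and all function names are hypothetical, as the abstract does not give these details.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over candidate LLMs."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style query-query contrastive loss (assumed form)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    return -math.log(softmax(logits)[0])


def routing_loss(
    target_scores: Sequence[float],
    predicted_scores: Sequence[float],
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    alpha: float = 0.5,
) -> float:
    """KL term plus alpha times the contrastive term (alpha is hypothetical)."""
    p = softmax(target_scores)
    q = softmax(predicted_scores)
    return kl_divergence(p, q) + alpha * info_nce(anchor, positive, negatives)
```

Here the KL term pulls the predicted distribution over candidate LLMs toward a target distribution, while the contrastive term pulls embeddings of similar queries together and pushes dissimilar ones apart.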
Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models
Jinyang Wu | Shuai Zhang | Feihu Che | Mingkuan Feng | Pengpeng Shao | Jianhua Tao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Retrieval-Augmented Generation (RAG) has emerged as a crucial method for addressing hallucinations in large language models (LLMs). While recent research has extended RAG models to complex noisy scenarios, these explorations often confine themselves to limited noise types and presuppose that noise is inherently detrimental to LLMs, potentially deviating from real-world retrieval environments and restricting practical applicability. In this paper, we define seven distinct noise types from a linguistic perspective and establish a Noise RAG Benchmark (NoiserBench), a comprehensive evaluation framework encompassing multiple datasets and reasoning tasks. Through empirical evaluation of eight representative LLMs with diverse architectures and scales, we reveal that these noises can be further categorized into two practical groups: noise that is beneficial to LLMs (aka beneficial noise) and noise that is harmful to LLMs (aka harmful noise). While harmful noise generally impairs performance, beneficial noise may enhance several aspects of model capabilities and overall performance. Our analysis offers insights for developing robust RAG solutions and mitigating hallucinations across diverse retrieval scenarios. Code is available at https://github.com/jinyangwu/NoiserBench.