Yunke Zhang
2026
VRPO: Rethinking Value Modeling for Robust RL under Noisy Supervision in LLM Post-Training
Dingwei Zhu | Shihan Dou | Zhiheng Xi | Senjie Jin | Guoqiang Zhang | Jiazheng Zhang | Junjie Ye | Mingxu Chai | Enyu Zhou | Ming Zhang | Yuhui Wang | Caishuang Huang | Chenhao Huang | Yunke Zhang | Yuran Wang | Tao Gui | Qi Zhang | Xipeng Qiu | Xuanjing Huang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement Learning (RL) in real-world environments often suffers from ambiguous or incomplete reward supervision, which undermines policy stability and generalization. Such noise may cause models to ignore key information or even cause advantage estimation to collapse. We find that a strong value model is essential for absorbing unstable signals and producing reliable advantages, offering denser and more robust supervision than the reward model. To better optimize under noisy supervision, we propose VRPO, a framework that enhances value modeling for robust RL in LLM post-training. VRPO integrates (1) auxiliary losses guided by entropy and perplexity from a frozen language model, and (2) a variational information bottleneck, enabling the value model to filter noise and capture key words. This design allows the value model to correct noisy rewards and generate more reliable advantage estimates, transforming it from a passive predictor into an active noise regulator. Experiments on multi-turn dialogue, math reasoning, and science QA with both rule-based and model-based rewards show that VRPO consistently outperforms baselines such as PPO and GRPO. Our work highlights the central role of the value model in robust RL and provides a principled and practical approach to policy optimization under noisy supervision.
2025
TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation
Daiye Miao | Yufang Liu | Jie Wang | Changzhi Sun | Yunke Zhang | Demei Yan | Shaokang Dong | Qi Zhang | Yuanbin Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
LoRA has become one of the most widely used parameter-efficient fine-tuning methods due to its simplicity and effectiveness. However, numerous studies have shown that LoRA often introduces substantial parameter redundancy, which not only increases the number of trainable parameters but also hinders the effectiveness of fine-tuning. Since identifying redundant parameters in LoRA is inherently difficult, how to eliminate them efficiently and accurately remains a challenging problem. In this paper, we propose TASO, a redundancy reduction method that leverages importance information from the pretrained model’s weights to mitigate LoRA redundancy. Specifically, we estimate parameter importance on downstream tasks and identify task-specific core regions based on the distribution of importance scores. The location information of these core regions is then used to determine the sparse structure of LoRA modules, enabling redundancy removal before fine-tuning. Our approach significantly reduces the number of trainable parameters required for task adaptation, while providing a novel task-aligned perspective for LoRA redundancy reduction. Experimental results demonstrate that, with a parameter budget comparable to LoRA with rank r = 1, TASO consistently outperforms standard LoRA across multiple tasks, achieving strong fine-tuning performance while effectively eliminating redundant parameters.
Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition
Yuming Yang | Wantong Zhao | Caishuang Huang | Junjie Ye | Xiao Wang | Huiyuan Zheng | Yang Nan | Yuran Wang | Xueying Xu | Kaixin Huang | Yunke Zhang | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 31st International Conference on Computational Linguistics
Open Named Entity Recognition (NER), which involves identifying arbitrary types of entities from arbitrary domains, remains challenging for Large Language Models (LLMs). Recent studies suggest that fine-tuning LLMs on extensive NER data can boost their performance. However, training directly on existing datasets neglects their inconsistent entity definitions and redundant data, limiting LLMs to dataset-specific learning and hindering out-of-domain adaptation. To address this, we present B2NERD, a compact dataset designed to guide LLMs’ generalization in Open NER under a universal entity taxonomy. B2NERD is refined from 54 existing English and Chinese datasets using a two-step process. First, we detect inconsistent entity definitions across datasets and clarify them with distinguishable label names to construct a universal taxonomy of 400+ entity types. Second, we address redundancy using a data pruning strategy that selects fewer samples with greater category and semantic diversity. Comprehensive evaluation shows that B2NERD significantly enhances LLMs’ Open NER capabilities. Our B2NER models, trained on B2NERD, outperform GPT-4 by 6.8-12.0 F1 points and surpass previous methods on 3 out-of-domain benchmarks across 15 datasets and 6 languages. The data, models, and code are publicly available at https://github.com/UmeanNever/B2NER.
2021
HiTRANS: A Hierarchical Transformer Network for Nested Named Entity Recognition
Zhiwei Yang | Jing Ma | Hechang Chen | Yunke Zhang | Yi Chang
Findings of the Association for Computational Linguistics: EMNLP 2021
Nested Named Entity Recognition (NNER) has been extensively studied, aiming to identify all nested entities from potential spans (i.e., one or more continuous tokens). However, recent studies for NNER either focus on tedious tagging schemas or utilize complex structures, which fail to learn effective span representations from the input sentence with highly nested entities. Intuitively, explicit span representations will contribute to NNER due to the rich context information they contain. In this study, we propose a Hierarchical Transformer (HiTRANS) network for the NNER task, which decomposes the input sentence into multi-grained spans and enhances the representation learning in a hierarchical manner. Specifically, we first utilize a two-phase module to generate span representations by aggregating context information based on a bottom-up and top-down transformer network. Then a label prediction layer is designed to recognize nested entities hierarchically, which naturally explores semantic dependencies among different spans. Experiments on GENIA, ACE-2004, ACE-2005 and NNE datasets demonstrate that our proposed method achieves much better performance than the state-of-the-art approaches.
Co-authors
- Tao Gui 2
- Caishuang Huang 2
- Xuan-Jing Huang (黄萱菁) 2
- Yuran Wang 2
- Junjie Ye (叶俊杰) 2
- Qi Zhang 2
- Mingxu Chai 1
- Yi Chang 1
- Hechang Chen 1
- Shaokang Dong 1
- Shihan Dou 1
- Chenhao Huang 1
- Kaixin Huang 1
- Senjie Jin 1
- Yufang Liu 1
- Jing Ma 1
- Daiye Miao 1
- Yang Nan 1
- Xipeng Qiu (邱锡鹏) 1
- Changzhi Sun 1
- Jie Wang 1
- Xiao Wang 1
- Yuhui Wang 1
- Yuanbin Wu 1
- Zhiheng Xi 1
- Xueying Xu 1
- Demei Yan 1
- Yuming Yang 1
- Zhiwei Yang 1
- Guoqiang Zhang 1
- Jiazheng Zhang 1
- Ming Zhang 1
- Qi Zhang 1
- Wantong Zhao 1
- Huiyuan Zheng 1
- Enyu Zhou 1
- Dingwei Zhu 1