Jie Hu
2026
Multi-Granularity Semantic Revision for Large Language Model Distillation
Xiaoyu Liu | Yun Zhang | Wei Li | Simiao Li | Xudong Huang | Hanting Chen | Yehui Tang | Jie Hu | Zhiwei Xiong | Yunhe Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Knowledge distillation is crucial for compressing Large Language Models (LLMs), enabling smaller student models to learn from larger teacher models. However, existing LLM distillation methods rely too heavily on student-generated outputs, which may contain generation errors and misguide the distillation process. Moreover, existing distillation loss functions struggle to align the most informative parts of the output because the output distributions of LLMs are complex. To address these problems, we propose a multi-granularity semantic revision method for LLM distillation. At the sequence level, we propose a sequence correction and re-generation (SCRG) strategy. SCRG identifies error tokens by calculating the semantic cognitive difference between teacher and student outputs, corrects them using teacher-generated tokens, and re-generates the sequence to minimize errors. At the token level, we design a distribution adaptive clipping Kullback-Leibler (DAC-KL) loss, which uses a learnable sub-network to focus on semantically dense areas of the teacher’s output, reducing the impact of redundant information. At the span level, we utilize span priors to compute probability correlations within sequences, ensuring consistency between teacher and student outputs to enhance semantic information transfer. Extensive experiments on models ranging from 0.1B to 13B parameters demonstrate the effectiveness of our approach compared to existing methods.
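The token-level idea of focusing a KL loss on the dense part of the teacher distribution can be illustrated with a simplified sketch. This is not the paper's DAC-KL: the abstract describes a learnable sub-network that chooses what to keep, whereas the hypothetical `clipped_kl` below assumes a fixed top-k cutoff and renormalizes both distributions over the kept tokens.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_kl(teacher_logits, student_logits, k):
    """KL(teacher || student) restricted to the teacher's top-k tokens.

    Assumption: a fixed k stands in for the learnable clipping that the
    abstract describes. Both distributions are renormalized over the
    kept tokens so the result is a proper KL divergence.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    # Indices of the k most probable tokens under the teacher.
    top = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
    p_mass = sum(p[i] for i in top)
    q_mass = sum(q[i] for i in top)
    return sum(
        (p[i] / p_mass) * math.log((p[i] / p_mass) / (q[i] / q_mass))
        for i in top
    )
```

With identical teacher and student logits the clipped divergence is zero; it grows as the student's ranking of the teacher's top tokens diverges from the teacher's.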
2025
EpiCoDe: Boosting Model Performance Beyond Training with Extrapolation and Contrastive Decoding
Mingxu Tao | Jie Hu | Mingchuan Yang | Yunhuai Liu | Dongyan Zhao | Yansong Feng
Findings of the Association for Computational Linguistics: ACL 2025
The remarkable performance of large language models (LLMs) relies heavily on the availability of abundant high-quality training data. However, the high cost of acquiring annotated data often prevents models from acquiring the capabilities needed for downstream tasks. In this paper, we introduce EpiCoDe, a novel method that boosts model performance in data-scarcity scenarios without extra training. We first employ model extrapolation to enhance a finetuned model with its inferior version, and then adopt contrastive decoding to further reduce prediction errors by comparing the logit scores given by the extrapolated model and the vanilla finetuned model. Experiments across three domains and four different LLMs show that EpiCoDe consistently outperforms existing methods with significant and robust improvements. We also propose a new theoretical framework that reveals the mechanism behind contrastive decoding in data-scarcity scenarios, which helps explain the effectiveness of EpiCoDe.
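The two steps named in this abstract can be sketched in a few lines. The abstract does not give the formulas, so the forms below are assumptions: linear weight extrapolation away from the weaker checkpoint, and a logit contrast that amplifies the difference between the extrapolated and finetuned models. The names `extrapolate`, `contrastive_scores`, `alpha`, and `beta` are hypothetical.

```python
def extrapolate(strong, weak, alpha):
    """Move the stronger checkpoint further away from the weaker one.

    Assumed form: theta_ext = theta_strong + alpha * (theta_strong - theta_weak).
    Checkpoints are dicts mapping parameter names to lists of floats.
    """
    return {
        name: [s + alpha * (s - w) for s, w in zip(strong[name], weak[name])]
        for name in strong
    }


def contrastive_scores(expert_logits, amateur_logits, beta):
    """Amplify what the expert model prefers relative to the amateur model.

    Assumed form: score = (1 + beta) * expert - beta * amateur.
    """
    return [(1 + beta) * e - beta * a for e, a in zip(expert_logits, amateur_logits)]


# Toy usage: extrapolate a one-parameter "model", then contrast two logit vectors.
extrapolated = extrapolate({"w": [1.0, 2.0]}, {"w": [0.0, 1.0]}, alpha=0.5)
scores = contrastive_scores([2.0, 1.0], [1.0, 1.0], beta=1.0)
next_token = max(range(len(scores)), key=lambda i: scores[i])
```

In this reading the extrapolated model plays the expert and the vanilla finetuned model plays the amateur; both steps operate on existing checkpoints and logits, which is consistent with the abstract's claim of needing no extra training.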
Adversarial Preference Learning for Robust LLM Alignment
Yuanfu Wang | Pengyu Wang | Chenyang Xi | Bo Tang | Junyi Zhu | Wenqiang Wei | Chen Chen | Chao Yang | Jingfeng Zhang | Chaochao Lu | Yijun Niu | Keming Mao | Zhiyu Li | Feiyu Xiong | Jie Hu | Mingchuan Yang
Findings of the Association for Computational Linguistics: ACL 2025
Modern language models often rely on Reinforcement Learning from Human Feedback (RLHF) to encourage safe behaviors. However, they remain vulnerable to adversarial attacks due to three key limitations: (1) the inefficiency and high cost of human annotation, (2) the vast diversity of potential adversarial attacks, and (3) the risk of feedback bias and reward hacking. To address these challenges, we introduce Adversarial Preference Learning (APL), an iterative adversarial training method incorporating three key innovations. First, it uses a direct harmfulness metric based on the model’s intrinsic preference probabilities, eliminating reliance on external assessment. Second, it employs a conditional generative attacker that synthesizes input-specific adversarial variations. Third, it adopts an iterative framework with automated closed-loop feedback, enabling continuous adaptation through vulnerability discovery and mitigation. Experiments on Mistral-7B-Instruct-v0.3 demonstrate that APL significantly enhances robustness, achieving an 83.33% harmlessness win rate over the base model (evaluated by GPT-4o), reducing harmful outputs from 5.88% to 0.43% (measured by LLaMA-Guard), and lowering attack success rate by up to 65% according to HarmBench. Notably, APL maintains competitive utility, with an MT-Bench score of 6.59 (comparable to the baseline 6.78) and an LC-WinRate of 46.52% against the base model.
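The abstract does not spell out how the harmfulness metric is computed from the model's preference probabilities. One plausible reading, shown here purely as an assumption, is a DPO-style implicit preference: the probability that the policy prefers a harmful response over a safe one, measured relative to a reference model. The function name and the `beta` parameter are hypothetical.

```python
import math


def preference_probability(logp_harmful, logp_safe,
                           ref_logp_harmful, ref_logp_safe, beta=1.0):
    """Probability that the policy prefers the harmful response to the safe one.

    Assumption: a Bradley-Terry model over DPO-style implicit rewards,
    reward = beta * (policy log-prob - reference log-prob).
    Inputs are sequence-level log-probabilities.
    """
    margin = beta * (
        (logp_harmful - ref_logp_harmful) - (logp_safe - ref_logp_safe)
    )
    return 1.0 / (1.0 + math.exp(-margin))
```

Under this sketch a value near 0.5 means the policy is indifferent, and values above 0.5 would flag an input on which the policy leans toward the harmful response more than the reference does.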
CARE-STaR: Constraint-aware Self-taught Reasoner
Zhiliang Li | Bo Tang | Yijun Niu | Beihong Jin | Qiwen Shi | Yuchen Feng | Zhiyu Li | Jie Hu | Mingchuan Yang | Feiyu Xiong
Findings of the Association for Computational Linguistics: ACL 2025
In real-world applications, large language models (LLMs) often need to handle diverse and complex instructions. In particular, when instructions are subject to multiple constraints, some of which are ambiguous, LLMs often fail to produce answers that satisfy all constraints, limiting their effectiveness in various tasks. To address this challenge, we examine the different constraints in the instructions and find that the difficulty of determining whether an answer meets a constraint varies widely, from extremely straightforward to exceptionally perplexing. Accordingly, we propose to assign constraints to different constraint levels. Furthermore, inspired by chain-of-thought (CoT) and the self-taught reasoner (STaR), we propose a two-stage method named CARE-STaR (Constraint-AwaRE STaR). Our method distinguishes constraints within instructions by generating different CoTs, and guides LLMs to autonomously learn optimal answers by assigning positive rewards to the CoTs that help generate accurate answers and iteratively optimizing these answers. We have conducted extensive experiments on three instruction-following benchmarks, using three existing LLMs as base models. Experimental results indicate that our method substantially enhances the capability of these LLMs to handle complex instructions, outperforming supervised fine-tuning (SFT). Our code is available at https://github.com/lzl0124/carestar.
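The reward step described above resembles the filtering used in STaR-style training: keep the reasoning chains whose answers pass the constraint checks. The sketch below is a generic illustration under that assumption, not the released CARE-STaR code; the checker functions, the `threshold` parameter, and the sample data are all hypothetical.

```python
def satisfied_fraction(answer, constraints):
    """Fraction of constraint checkers (callables returning bool) the answer passes."""
    return sum(1 for check in constraints if check(answer)) / len(constraints)


def select_rationales(candidates, constraints, threshold=1.0):
    """Keep (cot, answer) pairs whose answer meets enough of the constraints.

    Assumption: a pair is "rewarded" when its satisfied fraction reaches
    the threshold; the kept pairs would then serve as training data.
    """
    return [
        (cot, answer)
        for cot, answer in candidates
        if satisfied_fraction(answer, constraints) >= threshold
    ]


# Toy usage: two easy-to-verify constraints on an answer string.
constraints = [
    lambda a: len(a.split()) <= 5,   # length limit
    lambda a: "Paris" in a,          # must mention a keyword
]
candidates = [
    ("cot1", "Paris is the capital."),
    ("cot2", "The capital of France is the city of Paris."),
    ("cot3", "It is Lyon."),
]
kept = select_rationales(candidates, constraints)
```

Constraints that are hard to verify mechanically, which the abstract places at higher constraint levels, would need a model-based checker in place of the simple lambdas used here.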
Co-authors
- Mingchuan Yang 3
- Zhiyu Li 2
- Yijun Niu 2
- Bo Tang 2
- Feiyu Xiong 2
- Chen Chen 1
- Hanting Chen 1
- Yansong Feng 1
- Yuchen Feng 1
- Xudong Huang 1
- Beihong Jin 1
- Simiao Li 1
- Wei Li 1
- Zhiliang Li 1
- Xiaoyu Liu 1
- Yunhuai Liu 1
- Chaochao Lu 1
- Keming Mao 1
- Qiwen Shi 1
- Yehui Tang 1
- Mingxu Tao 1
- Pengyu Wang 1
- Yuanfu Wang 1
- Yunhe Wang 1
- Wenqiang Wei 1
- Chenyang Xi 1
- Zhiwei Xiong 1
- Chao Yang 1
- Jingfeng Zhang 1
- Yun Zhang 1
- Dongyan Zhao 1
- Junyi Zhu 1