Simin Guo
2025
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
Zhengze Zhang | Shiqi Wang | Yiqun Shen | Simin Guo | Dahua Lin | Xiaoliang Wang | Nguyen Cam-Tu | Fei Tan
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have demonstrated exceptional performance across various applications, but their conversational abilities decline sharply as model size decreases, presenting a barrier to their deployment in resource-constrained environments. Knowledge distillation (KD) with Direct Preference Optimization (DPO) has emerged as a promising approach to enhance the conversational abilities of smaller models using a larger teacher model. However, current methods primarily focus on “black-box” KD, which only uses the teacher’s responses, overlooking the rich distributional information within the teacher’s probability distribution. This paper addresses this gap by introducing daDPO (Distribution-Aware DPO), a novel framework that integrates the teacher’s distributional information into DPO distillation while preserving theoretical guarantees. Our framework offers a unified objective that enhances both preference optimization and distribution-based distillation. We provide rigorous theoretical analysis and empirical validation, showing that daDPO outperforms existing methods in restoring performance for pruned models and enhancing smaller models within the same LLM family. Notably, in in-domain evaluation, our method enables a 20% pruned Vicuna1.5-7B to achieve near-teacher performance (-7.3% preference rate), and allows Qwen2.5-1.5B to occasionally outperform its 7B teacher model (14.0% win rate).
Co-authors
- Nguyen Cam-Tu 1
- Dahua Lin 1
- Yiqun Shen 1
- Fei Tan 1
- Shiqi Wang 1