Chunping Wang
2025
Self-Preference: An Automated Method for Preference-Aligned Data Constructed from Business Metrics
Feng Gao | Xuan Zhang | Boyi Ni | Chunping Wang | Lei Chen
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"Large language models (LLMs) have become integral components of various AI solutions, with the reinforcement learning from human feedback (RLHF) stage playing a critical role in align-ing model outputs with human preferences. However, generating the human preference data required for RLHF is often costly and time-consuming due to its reliance on human evaluation.This study addresses this challenge within the dialogue scenarios of the fintech industry. We leverage rich, non-confidential, multi-turn dialogue data, such as call center dialogue records,which include associated business metrics (e.g., problem-solving rates, turnover ratios) to con-struct preference-aligned data. We introduce Self-Preference, an automated method for creating preference-aligned data guided by these objective business metrics. The approach involves clustering dialogue histories based on their semantic representations and calculating a well-designed conditional probability ratio that correlates sequences with business metrics to generate preference data. In contrast to traditional preference alignment data generation methods that depend on subjective human evaluations, Self-Preference significantly reduces labeling costs and mitigates model-induced biases. Experimental results indicate that models trained with Self-Preference generated data demonstrate a strong positive correlation with target business metrics, highlight-ing the method’s effectiveness in facilitating efficient, goal-oriented alignment of LLMs."