Self-Preference: An Automated Method for Preference-Aligned Data Constructed from Business Metrics

Feng Gao; Xuan Zhang; Boyi Ni; Chunping Wang; Lei Chen

Abstract

Large language models (LLMs) have become integral components of various AI solutions, with the reinforcement learning from human feedback (RLHF) stage playing a critical role in aligning model outputs with human preferences. However, generating the human preference data required for RLHF is often costly and time-consuming due to its reliance on human evaluation. This study addresses this challenge within the dialogue scenarios of the fintech industry. We leverage rich, non-confidential, multi-turn dialogue data, such as call center dialogue records, which include associated business metrics (e.g., problem-solving rates, turnover ratios), to construct preference-aligned data. We introduce Self-Preference, an automated method for creating preference-aligned data guided by these objective business metrics. The approach clusters dialogue histories based on their semantic representations and calculates a well-designed conditional probability ratio that correlates sequences with business metrics to generate preference data. In contrast to traditional preference alignment data generation methods that depend on subjective human evaluations, Self-Preference significantly reduces labeling costs and mitigates model-induced biases. Experimental results indicate that models trained on Self-Preference-generated data demonstrate a strong positive correlation with target business metrics, highlighting the method's effectiveness in facilitating efficient, goal-oriented alignment of LLMs.
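The abstract does not spell out the exact conditional probability ratio, so the following is only a minimal sketch of one plausible reading, not the authors' formula. It assumes that dialogue histories have already been clustered (any clustering over semantic embeddings would do), that each logged agent reply carries a binary business outcome such as "problem solved", and that the ratio is P(success | cluster, reply) / P(success | cluster), i.e. how much a reply lifts the success rate above its cluster's baseline. The names Turn, metric_ratios, preference_pairs and the smoothing parameter alpha are hypothetical illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One logged agent reply (hypothetical schema, not from the paper)."""
    cluster_id: int  # cluster of the preceding dialogue history (assumed precomputed)
    response: str    # the reply that followed that history
    success: bool    # business outcome of the dialogue, e.g. problem solved


def metric_ratios(turns, alpha=1.0):
    """Assumed ratio: P(success | cluster, reply) / P(success | cluster).

    alpha is additive (Laplace) smoothing, an illustrative choice that keeps
    rarely seen replies from receiving extreme ratios.
    """
    cluster_n, cluster_s = defaultdict(int), defaultdict(int)
    pair_n, pair_s = defaultdict(int), defaultdict(int)
    for t in turns:
        cluster_n[t.cluster_id] += 1
        cluster_s[t.cluster_id] += int(t.success)
        key = (t.cluster_id, t.response)
        pair_n[key] += 1
        pair_s[key] += int(t.success)
    ratios = {}
    for (c, r), n in pair_n.items():
        p_pair = (pair_s[(c, r)] + alpha) / (n + 2 * alpha)
        p_cluster = (cluster_s[c] + alpha) / (cluster_n[c] + 2 * alpha)
        ratios[(c, r)] = p_pair / p_cluster
    return ratios


def preference_pairs(turns, alpha=1.0):
    """Per cluster, pair the highest-ratio reply (chosen) with the lowest (rejected)."""
    by_cluster = defaultdict(list)
    for (c, r), ratio in metric_ratios(turns, alpha).items():
        by_cluster[c].append((ratio, r))
    pairs = []
    for c in sorted(by_cluster):
        # Sort by descending ratio; ties are broken alphabetically for determinism.
        ranked = sorted(by_cluster[c], key=lambda x: (-x[0], x[1]))
        if len(ranked) >= 2 and ranked[0][0] > ranked[-1][0]:
            pairs.append({"cluster": c, "chosen": ranked[0][1], "rejected": ranked[-1][1]})
    return pairs


if __name__ == "__main__":
    logs = [
        Turn(0, "offer_refund", True), Turn(0, "offer_refund", True),
        Turn(0, "ask_to_wait", False), Turn(0, "ask_to_wait", True),
        Turn(1, "verify_identity", True), Turn(1, "verify_identity", True),
        Turn(1, "transfer_call", False),
    ]
    for pair in preference_pairs(logs):
        print(pair)
```

Run on the toy log, this prints one (chosen, rejected) pair per cluster: "offer_refund" over "ask_to_wait" in cluster 0 and "verify_identity" over "transfer_call" in cluster 1. Conditioning on the cluster is what makes the comparison fair in this sketch: replies are only ranked against others that followed semantically similar histories, so a reply is not rewarded merely for appearing in easy conversations.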

Anthology ID: 2025.ccl-1.66
Volume: Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Month: August
Year: 2025
Address: Jinan, China
Editors: Maosong Sun, Peiyong Duan, Zhiyuan Liu, Ruifeng Xu, Weiwei Sun
Venue: CCL
Publisher: Chinese Information Processing Society of China
Pages: 864–879
URL: https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.66/
Cite (ACL): Feng Gao, Xuan Zhang, Boyi Ni, Chunping Wang, and Lei Chen. 2025. Self-Preference: An Automated Method for Preference-Aligned Data Constructed from Business Metrics. In Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025), pages 864–879, Jinan, China. Chinese Information Processing Society of China.
Cite (Informal): Self-Preference: An Automated Method for Preference-Aligned Data Constructed from Business Metrics (Gao et al., CCL 2025)
PDF: https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.66.pdf