Uncovering Factor-Level Preference to Improve Human-Model Alignment
Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, Alice Oh
Abstract
Large language models (LLMs) often exhibit tendencies that diverge from human preferences, such as favoring certain writing styles or producing overly verbose outputs. While crucial for improvement, identifying the factors driving these misalignments remains challenging due to existing evaluation methods' reliance on coarse-grained comparisons and lack of explainability. To address this, we introduce PROFILE, an automated framework to uncover and measure the factor-level preference alignment of humans and LLMs. Using PROFILE, we analyze preference alignment across three key tasks: summarization, instruction-following, and document-based QA. We find a significant discrepancy: while LLMs show poor factor-level alignment with human preferences when generating texts, they demonstrate strong alignment in discrimination tasks. We then show how the identified generation-discrimination gap can be leveraged to improve LLM alignment through multiple approaches, including fine-tuning with self-guidance. Our work highlights the value of factor-level analysis for identifying hidden misalignments and provides a practical framework for improving LLM-human preference alignment.
- Anthology ID:
- 2025.findings-emnlp.1045
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 19179–19203
- URL:
- https://aclanthology.org/2025.findings-emnlp.1045/
- DOI:
- 10.18653/v1/2025.findings-emnlp.1045
- Cite (ACL):
- Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, and Alice Oh. 2025. Uncovering Factor-Level Preference to Improve Human-Model Alignment. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19179–19203, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Uncovering Factor-Level Preference to Improve Human-Model Alignment (Oh et al., Findings 2025)
- PDF:
- https://aclanthology.org/2025.findings-emnlp.1045.pdf