Liye Zhao

2026

Personalizing LLMs with Binary Feedback: A Preference-Calibrated Optimization Framework
Xilai Ma | Liye Zhao | Weijun Yao | Haibing Di | Wenya Wang | Jing Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Model (LLM) personalization aims to align model behaviors with individual user preferences. Existing methods often focus on isolated user histories, neglecting the essential role of inter-user differences. We propose C-BPO, a framework that personalizes LLMs via preference-calibrated binary signals. By treating target user data as positive feedback and other users’ data as an auxiliary set of implicit negative signals, C-BPO captures distinct inter-user differences. To mitigate the preference overlap issue, where shared task knowledge is erroneously penalized, we derive an objective grounded in Positive-Unlabeled (PU) learning theory. This approach purifies negative signals by subtracting “positive bias”, ensuring alignment with unique idiosyncrasies without compromising general helpfulness. Empirical experiments across various personalization tasks and backbone LLMs show C-BPO consistently outperforms baselines, demonstrating the efficacy of preference-calibrated binary signals in modeling inter-user differences.
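The abstract does not give C-BPO's actual objective. As a rough illustration of the general PU idea it refers to, the sketch below implements the standard non-negative PU (nnPU) risk estimator. Other users' samples are treated as unlabeled, and the expected "positive bias" is subtracted from their negative-class risk. This is not the paper's loss. The per-sample scalar `scores` (for example, some implicit reward from the model), the logistic loss, and the class prior `prior` are all hypothetical assumptions made for this example.

```python
import math
from typing import Sequence


def logistic_loss(score: float, label: int) -> float:
    """Logistic loss log(1 + exp(-label * score)) for label in {+1, -1}."""
    z = -label * score
    # Numerically stable softplus: max(z, 0) + log(1 + exp(-|z|)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def nn_pu_risk(pos_scores: Sequence[float],
               unl_scores: Sequence[float],
               prior: float) -> float:
    """Non-negative PU risk (generic nnPU form, not C-BPO's exact objective).

    pos_scores: hypothetical model scores on target-user (positive) samples.
    unl_scores: hypothetical scores on other users' samples, treated as unlabeled.
    prior: assumed fraction of the unlabeled set that is actually positive,
           i.e. shared preferences that should not be penalized.
    """
    if not pos_scores or not unl_scores:
        raise ValueError("need at least one positive and one unlabeled score")
    if not 0.0 <= prior < 1.0:
        raise ValueError("prior must lie in [0, 1)")

    # Risk of misclassifying positives as negatives, weighted by the prior.
    risk_pos = prior * _mean([logistic_loss(s, +1) for s in pos_scores])

    # Negative-class risk on unlabeled data minus the "positive bias":
    # the part of that risk expected to come from hidden positives.
    risk_neg = (_mean([logistic_loss(s, -1) for s in unl_scores])
                - prior * _mean([logistic_loss(s, -1) for s in pos_scores]))

    # Clamp at zero so the corrected negative risk cannot go negative.
    return risk_pos + max(0.0, risk_neg)


if __name__ == "__main__":
    target_user = [2.0, 1.5, 0.8]    # hypothetical scores, target user's data
    other_users = [0.3, -1.0, 1.2]   # hypothetical scores, other users' data
    print(nn_pu_risk(target_user, other_users, prior=0.3))
```

The clamp makes the corrected negative term a lower-bounded estimate. Without the clamp, a flexible model can drive the subtracted term large enough to make the total risk negative and overfit. With `prior=0.0` the estimator reduces to ordinary positive-vs-negative training in which every other user's sample counts as a true negative. That is the over-penalization of shared knowledge the abstract describes.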
