Liye Zhao


2026

Large Language Model (LLM) personalization aims to align model behavior with individual user preferences. Existing methods often focus on isolated user histories, neglecting the essential role of inter-user differences. We propose C-BPO, a framework that personalizes LLMs via preference-calibrated binary signals. By treating the target user's data as positive feedback and other users' data as an auxiliary set of implicit negative signals, C-BPO captures inter-user differences. To mitigate the preference overlap issue, in which shared task knowledge is erroneously penalized, we derive an objective grounded in Positive-Unlabeled (PU) learning theory. This approach purifies the negative signals by subtracting the “positive bias”, ensuring alignment with each user's idiosyncrasies without compromising general helpfulness. Experiments across various personalization tasks and backbone LLMs show that C-BPO consistently outperforms baselines, demonstrating the efficacy of preference-calibrated binary signals in modeling inter-user differences.
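
To make the "subtracting positive bias" idea concrete, the following is a minimal sketch of a generic PU risk estimator with non-negative clipping, not the C-BPO objective itself, which the abstract does not spell out. It assumes a scalar score per example, a logistic loss, and a known class prior `prior` (the fraction of the auxiliary set that actually matches the target user's preferences). The names `pu_risk`, `logistic_loss`, and `prior` are all hypothetical.

```python
import math


def logistic_loss(score: float, label: int) -> float:
    """Logistic loss log(1 + exp(-label * score)) for label in {+1, -1}."""
    z = -label * score
    # Numerically stable softplus(z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def mean(values):
    return sum(values) / len(values)


def pu_risk(pos_scores, unl_scores, prior, clip=True):
    """Generic PU risk (illustrative assumption, not the paper's objective).

    pos_scores: scores on the target user's data (positives).
    unl_scores: scores on other users' data (treated as unlabeled).
    prior:      assumed fraction of positives hidden in the unlabeled set.
    """
    # Risk of scoring true positives as positive.
    pos_risk = prior * mean([logistic_loss(s, +1) for s in pos_scores])
    # Treating all unlabeled data as negative over-penalizes the positives
    # hidden inside it; subtract that "positive bias" term.
    neg_risk = mean([logistic_loss(s, -1) for s in unl_scores]) \
        - prior * mean([logistic_loss(s, -1) for s in pos_scores])
    if clip:
        # Non-negative correction: the estimated negative risk cannot be < 0.
        neg_risk = max(neg_risk, 0.0)
    return pos_risk + neg_risk
```

With `prior=0` the estimator reduces to treating every other user's example as a true negative, which is the naive setting in which shared task knowledge would be penalized; a larger `prior` discounts that penalty.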