Albert Aristotle Nanda


2025

DPL: Diverse Preference Learning Without A Reference Model
Abhijnan Nath | Andrey Volozin | Saumajit Saha | Albert Aristotle Nanda | Galina Grunin | Rahul Bhotika | Nikhil Krishnaswamy
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

In direct preference alignment for LLMs, most existing methods seek to retrieve the reward function directly from preference data. However, real-world preference data often contains diversity in preference annotations reflective of true human preferences. Existing algorithms, including KTO, do not directly utilize such nuances in the annotations, which limits their applicability. In this work, we propose Diverse Preference Learning (DPL), a reference model-free method that simultaneously learns a baseline desirability in LLM responses while being robust to the diversity of preference annotations. Our experiments for instruction-following on Ultrafeedback and AlpacaEval 2.0 and for text-summarization on Reddit TL;DR suggest that DPL is consistently better at learning the diversity of preferences compared to existing methods, including those that require a reference model in memory. Apart from overall quality, we find that DPL’s completions, on average, are more honest, helpful, truthful, and safe compared to existing methods.
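
The sketch below is purely illustrative of the distinction the abstract draws, not DPL's actual objective: it contrasts a reference-based, DPO-style pairwise loss, which keeps a frozen reference model in memory, with a generic reference-free variant that scores responses using the policy's own log-likelihoods alone. The function names, tensor shapes, and the `beta` scale are assumptions made for the example.

```python
# Minimal sketch (assumed, not DPL): reference-based vs. reference-free preference losses.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Reference-based: the implicit reward is the log-ratio of policy to a frozen reference model."""
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()


def reference_free_loss(policy_chosen_logps, policy_rejected_logps, beta=0.1):
    """Reference-free: only the policy's own log-likelihoods are used,
    so no second model needs to be held in memory."""
    return -F.logsigmoid(beta * (policy_chosen_logps - policy_rejected_logps)).mean()


if __name__ == "__main__":
    # Per-sequence log-probabilities for a toy batch of 4 preference pairs.
    torch.manual_seed(0)
    pc, pr = torch.randn(4), torch.randn(4)
    rc, rr = torch.randn(4), torch.randn(4)
    print("DPO-style loss:     ", dpo_loss(pc, pr, rc, rr).item())
    print("Reference-free loss:", reference_free_loss(pc, pr).item())
```

The reference-free form needs only one model in memory during training, which is the practical advantage the abstract attributes to reference model-free methods such as DPL; how DPL additionally models a baseline desirability and annotation diversity is described in the paper itself.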