@inproceedings{zhao-etal-2025-permutative,
title = "Permutative Preference Alignment from Listwise Ranking of Human Judgments",
author = "Zhao, Yang and
Wang, Yixin and
Yin, Mingzhang",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.17/",
pages = "310--334",
ISBN = "979-8-89176-332-6",
abstract = "Aligning Large Language Models (LLMs) with human preferences is crucial in ensuring desirable and controllable model behaviors. Current methods, such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), rely on the Bradley-Terry (B-T) model to maximize the likelihood of pairwise choices. However, when multiple responses are available, the B-T model fails to guarantee an accurate list ranking of the responses. To address this issue, we propose Permutative Preference Alignment (PPA), a novel offline listwise approach that incorporates the Normalized Discounted Cumulative Gain (NDCG){---}a widely-used ranking metric{---}as an alternative training objective for LLM alignment. We develop an end-to-end alignment algorithm by approximating NDCG with a differentiable surrogate loss. Experiments demonstrate that PPA outperforms existing pairwise and listwise methods on evaluation sets and general benchmarks such as AlpacaEval. Furthermore, we show that NDCG-based approaches improve ranking accuracy more effectively than B-T-based methods and provide a theoretical explanation for this improvement."
}
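
The core technical step named in the abstract, approximating NDCG with a differentiable surrogate so it can serve as a training objective, can be illustrated with a generic ApproxNDCG-style smooth-rank construction (Qin et al., 2010). The sketch below is an illustration of that general idea only, not the paper's actual PPA loss; the function name, `temperature` parameter, and tensor shapes are assumptions made for the example.

```python
# Minimal sketch of a differentiable NDCG surrogate (ApproxNDCG-style).
# This is a generic illustration, NOT the PPA objective from the paper.
import torch

def approx_ndcg_loss(scores, relevance, temperature=1.0):
    """scores: (batch, n) model scores; relevance: (batch, n) graded labels."""
    # Smooth rank: rank_i ~= 1 + sum_{j != i} sigmoid((s_j - s_i) / tau).
    # diff[b, i, j] = s_j - s_i
    diff = scores.unsqueeze(-2) - scores.unsqueeze(-1)
    pairwise = torch.sigmoid(diff / temperature)
    # Drop the j == i term (sigmoid(0) = 0.5) from each row sum.
    approx_rank = 1.0 + pairwise.sum(dim=-1) - pairwise.diagonal(dim1=-2, dim2=-1)

    # DCG with the smooth ranks, so gradients flow into the scores.
    gains = torch.pow(2.0, relevance) - 1.0
    dcg = (gains / torch.log2(1.0 + approx_rank)).sum(dim=-1)

    # Ideal DCG uses the hard ranking of the labels (no gradient needed).
    ideal_gains, _ = gains.sort(dim=-1, descending=True)
    positions = torch.arange(1, scores.size(-1) + 1,
                             dtype=scores.dtype, device=scores.device)
    idcg = (ideal_gains / torch.log2(1.0 + positions)).sum(dim=-1)

    # Maximizing NDCG == minimizing its negation.
    return -(dcg / idcg.clamp(min=1e-10)).mean()
```

As `temperature` shrinks, the sigmoid approaches a step function and the smooth rank approaches the true rank, recovering NDCG at the cost of vanishing gradients; the paper's contribution is a specific end-to-end surrogate for alignment, for which the published version should be consulted.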