Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization

Meng Li, Guangda Huzhang, Haibo Zhang, Xiting Wang, Anxiang Zeng

Abstract

Direct Preference Optimization (DPO) has emerged as a promising framework for aligning Large Language Models (LLMs) with human preferences by directly optimizing the log-likelihood difference between chosen and rejected responses. However, existing methods assign equal importance to all tokens in a response, whereas humans focus on its more meaningful parts. This leads to suboptimal preference optimization, as irrelevant or noisy tokens disproportionately influence the DPO loss. To address this limitation, we propose Optimal Transport-based token weighting scheme for enhancing direct Preference Optimization (OTPO). By emphasizing semantically meaningful token pairs and de-emphasizing less relevant ones, our method introduces a context-aware token weighting scheme that yields a more contrastive estimate of the reward difference. This adaptive weighting enhances reward stability, improves interpretability, and ensures that preference optimization focuses on meaningful differences between responses. Extensive experiments validate OTPO's effectiveness in improving instruction-following ability across various settings.
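As a rough illustration of the idea described in the abstract, the sketch below derives token weights from an entropic, unbalanced optimal transport plan between token representations of the chosen and rejected responses, then uses the weighted per-token log-ratio sums in a DPO-style logistic loss. This is not the authors' implementation: the cosine cost, the uniform token masses, the KL-relaxed Sinkhorn scaling, the rescaling of plan marginals to sequence length, the toy inputs, and the names `cosine_cost`, `unbalanced_sinkhorn`, `token_weights`, and `weighted_dpo_loss` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]


def cosine_cost(x: Vector, y: Vector) -> float:
    """Hypothetical cost between two token embeddings: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny + 1e-12)


def unbalanced_sinkhorn(cost: Matrix, eps: float = 0.1, tau: float = 1.0,
                        iters: int = 200) -> Matrix:
    """Entropic OT with KL-relaxed marginals (generic scaling algorithm).

    Because the marginals are only softly enforced, tokens without a cheap
    counterpart in the other response end up carrying less mass.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # assumption: uniform mass on chosen tokens
    b = [1.0 / m] * m  # assumption: uniform mass on rejected tokens
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    fi = tau / (tau + eps)  # damping exponent induced by the KL penalty
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / Kv[i]) ** fi for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / Ktu[j]) ** fi for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def token_weights(plan: Matrix) -> Tuple[List[float], List[float]]:
    """Plan marginals, rescaled so each side sums to its own token count."""
    rows = [sum(r) for r in plan]
    cols = [sum(c) for c in zip(*plan)]
    n, m = len(rows), len(cols)
    w_c = [n * r / sum(rows) for r in rows]
    w_r = [m * c / sum(cols) for c in cols]
    return w_c, w_r


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_dpo_loss(logratio_c: Sequence[float], logratio_r: Sequence[float],
                      w_c: Sequence[float], w_r: Sequence[float],
                      beta: float = 0.1) -> float:
    """-log sigmoid(beta * (weighted chosen reward - weighted rejected reward)).

    logratio_* hold per-token log(pi_theta / pi_ref) values.
    """
    reward_c = sum(w * x for w, x in zip(w_c, logratio_c))
    reward_r = sum(w * x for w, x in zip(w_r, logratio_r))
    return softplus(-beta * (reward_c - reward_r))


# Usage with toy 2-D "embeddings" and made-up per-token log-ratios.
chosen_emb = [[1.0, 0.0], [0.7, 0.7], [-1.0, 0.0]]
rejected_emb = [[1.0, 0.0], [0.6, 0.8]]
cost = [[cosine_cost(x, y) for y in rejected_emb] for x in chosen_emb]
plan = unbalanced_sinkhorn(cost)
w_c, w_r = token_weights(plan)
loss = weighted_dpo_loss([0.2, 0.1, -0.3], [-0.1, 0.05], w_c, w_r)
print([round(w, 3) for w in w_c], [round(w, 3) for w in w_r], round(loss, 4))
```

With all weights set to 1, `weighted_dpo_loss` reduces to the usual sequence-level DPO loss, which makes the effect of the weighting easy to isolate. In the toy example, the third chosen token has no low-cost counterpart in the rejected response and therefore receives a smaller weight than the first, which has an exact match.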

Anthology ID: 2025.acl-long.1035
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 21311–21334
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1035/
Cite (ACL): Meng Li, Guangda Huzhang, Haibo Zhang, Xiting Wang, and Anxiang Zeng. 2025. Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 21311–21334, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization (Li et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1035.pdf
