The Paradox of Preference: A Study on LLM Alignment Algorithms and Data Acquisition Methods

Rishikesh Devanathan, Varun Nathan, Ayush Kumar

Abstract
This research investigates how the method used to acquire preference annotations affects the performance of LLM alignment algorithms, including Direct Preference Optimization (DPO), Identity Preference Optimization (IPO), and Conservative DPO (cDPO), compared to Supervised Fine-Tuning (SFT) on NLP tasks. We analyze the influence of LLM-based and human-based preferences on algorithm performance, accounting for data volume and quality, and we assess DPO's vulnerability to overfitting and IPO's resilience to it, addressing four main research questions. Using the GAIR dataset and Zephyr-7B as the SFT model, we report unexpected negative outcomes. Contrary to expectations, DPO trained on LLM-generated preferences outperforms DPO trained on human preferences, and we find no correlation between preference data volume or quality and algorithm performance. DPO also shows no overfitting on either the human or the LLM preference dataset, and, surprisingly, cDPO does not fare better than DPO under flip noise. Our findings highlight the complexities of preference annotation methods and underscore the importance of scrutinizing negative results in NLP algorithm research.
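For context, the alignment objectives named in the abstract are sketched below following their standard formulations in the literature (DPO: Rafailov et al., 2023; IPO: Azar et al., 2023; cDPO: the label-smoothed DPO variant); this sketch is not reproduced from the paper itself, and the symbols beta, tau, and epsilon denote the usual temperature/regularization and label-noise hyperparameters rather than values used in the study.

\[
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]

cDPO smooths the preference labels with an assumed flip-noise rate \(\varepsilon\), mixing the loss for the original pair with the loss for the swapped pair:

\[
\mathcal{L}_{\mathrm{cDPO}}(\theta) = (1-\varepsilon)\,\mathcal{L}_{\mathrm{DPO}}(y_w \succ y_l) + \varepsilon\,\mathcal{L}_{\mathrm{DPO}}(y_l \succ y_w)
\]

IPO replaces the log-sigmoid with a squared regression target, which bounds the implicit reward gap and is the property motivating its claimed resistance to overfitting:

\[
\mathcal{L}_{\mathrm{IPO}}(\theta) = \mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\left(\log \frac{\pi_\theta(y_w \mid x)\,\pi_{\mathrm{ref}}(y_l \mid x)}{\pi_\theta(y_l \mid x)\,\pi_{\mathrm{ref}}(y_w \mid x)} - \frac{1}{2\tau}\right)^{2}\right]
\]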
Anthology ID: 2024.insights-1.16
Volume: Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
Venues: insights | WS
Publisher: Association for Computational Linguistics
Pages: 135–147
URL: https://aclanthology.org/2024.insights-1.16
Cite (ACL): Rishikesh Devanathan, Varun Nathan, and Ayush Kumar. 2024. The Paradox of Preference: A Study on LLM Alignment Algorithms and Data Acquisition Methods. In Proceedings of the Fifth Workshop on Insights from Negative Results in NLP, pages 135–147, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): The Paradox of Preference: A Study on LLM Alignment Algorithms and Data Acquisition Methods (Devanathan et al., insights-WS 2024)
PDF: https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.insights-1.16.pdf