Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning
Sen Yang, Leyang Cui, Deng Cai, Xinting Huang, Shuming Shi, Wai Lam
Abstract
Iterative preference learning, though it yields superior performance, requires online annotated preference labels. In this work, we study strategies to save annotation budget while achieving competitive or even better performance for iterative preference learning. Building on intuitions from active learning, we empirically show that annotating response pairs with small margins is generally better than annotating pairs with large margins or randomly selected pairs. In addition, experiments in the multi-iteration scenario suggest allocating more of the annotation budget to earlier iterations rather than later ones.
- Anthology ID:
- 2024.findings-emnlp.382
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6549–6561
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.382/
- DOI:
- 10.18653/v1/2024.findings-emnlp.382
- Cite (ACL):
- Sen Yang, Leyang Cui, Deng Cai, Xinting Huang, Shuming Shi, and Wai Lam. 2024. Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6549–6561, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning (Yang et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.382.pdf