Anirudh Goyal

2025

A Systematic Examination of Preference Learning through the Lens of Instruction-Following
Joongwon Kim | Anirudh Goyal | Aston Zhang | Bo Xiong | Rui Hou | Melanie Kambadur | Dhruv Mahajan | Hannaneh Hajishirzi | Liang Tan
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

In this work, we systematically investigate how specific attributes of preference datasets affect the alignment and downstream performance of LLMs in instruction-following tasks. We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts with combinations of 23 verifiable constraints that enable fine-grained and automated quality assessments of model responses. With our synthetic prompts, we use rejection sampling (RS) and Monte Carlo Tree Search (MCTS) to obtain preference pairs. Then, we perform experiments investigating the effects of (1) the presence of shared prefixes between the chosen and rejected responses, (2) the contrast and quality of the chosen and rejected responses, and (3) the complexity of the training prompts. Our experiments reveal that shared prefixes provide marginal but consistent improvements and greater stability across challenging training configurations. While high-contrast preference pairs generally outperform low-contrast pairs, combining both often yields the best performance. Additionally, training on prompts of moderate difficulty leads to better generalization across different tasks. Our findings provide actionable insights into optimizing preference data curation for instruction-following tasks, offering a scalable and effective framework for enhancing LLM training and alignment.
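As a rough illustration of the idea in the abstract, the sketch below scores candidate responses against verifiable constraints and selects a chosen/rejected pair, in the spirit of rejection sampling. It is not the authors' pipeline: the three constraints, the fraction-satisfied scoring rule, the `min_gap` contrast threshold, and all function names are assumptions made for this example, and in practice the candidates would be sampled from a model rather than passed in as strings.

```python
from typing import Callable, List, Optional, Tuple

# A verifiable constraint is any check that can be evaluated automatically.
# The three constraint factories below are hypothetical examples.
Constraint = Callable[[str], bool]


def max_words(limit: int) -> Constraint:
    """Response must contain at most `limit` whitespace-separated words."""
    return lambda text: len(text.split()) <= limit


def must_include(keyword: str) -> Constraint:
    """Response must mention `keyword` (case-insensitive)."""
    return lambda text: keyword.lower() in text.lower()


def ends_with(suffix: str) -> Constraint:
    """Response must end with `suffix`, ignoring trailing whitespace."""
    return lambda text: text.rstrip().endswith(suffix)


def score(response: str, constraints: List[Constraint]) -> float:
    """Fraction of constraints satisfied (an assumed scoring rule)."""
    return sum(c(response) for c in constraints) / len(constraints)


def preference_pair(
    candidates: List[str],
    constraints: List[Constraint],
    min_gap: float = 0.0,
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) from the best- and worst-scoring candidates.

    Returns None when the score gap does not exceed `min_gap`, so a larger
    `min_gap` keeps only higher-contrast pairs.
    """
    ranked = sorted(candidates, key=lambda r: score(r, constraints))
    rejected, chosen = ranked[0], ranked[-1]
    if score(chosen, constraints) - score(rejected, constraints) <= min_gap:
        return None
    return chosen, rejected
```

Raising `min_gap` filters toward high-contrast pairs, while a value near zero also admits low-contrast pairs; this is only one way to model the contrast axis the abstract describes.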

2024


Venues

EMNLP: 1
NAACL: 1
