RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker
Abstract
Preference optimization techniques have become a standard final stage for training state-of-the-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to date has focused on a small set of high-resource languages like English and Chinese. This not only covers a small fraction of the world’s languages, but also leaves it unclear which aspects of current state-of-the-art research transfer to a multilingual setting. In this work, we perform an exhaustive study to achieve a new state of the art in aligning multilingual LLMs. We introduce a novel, scalable method for generating high-quality multilingual feedback data to balance data coverage. We establish the benefits of cross-lingual transfer and of increased dataset size in preference training. Our preference-trained model achieves a 54.4% win rate against Aya 23 8B, the current state-of-the-art multilingual LLM in its parameter class, and win rates of 69.5% or higher against widely used models such as Gemma, Mistral, and Llama 3. As a result of our efforts, we expand the frontier of alignment techniques to 23 languages, covering approximately half of the world’s population.
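The “preference optimization” described in the abstract refers to pairwise preference training on chosen/rejected response pairs. As a minimal, illustrative sketch (not the authors’ implementation), the snippet below computes a DPO-style loss from per-sequence log-probabilities of the chosen and rejected completions under the policy and a frozen reference model; every name here (`dpo_loss`, `beta`, the random tensors) is an assumption for demonstration purposes.

```python
# Minimal DPO-style pairwise preference loss (illustrative sketch only).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: log-ratio of policy to reference for each completion.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin pushes the policy to prefer
    # the chosen completion over the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random per-sequence log-probabilities (batch of 4).
policy_chosen = torch.randn(4, requires_grad=True)
loss = dpo_loss(policy_chosen, torch.randn(4), torch.randn(4), torch.randn(4))
loss.backward()
print(f"loss: {loss.item():.4f}")
```

In practice the per-sequence log-probabilities come from summing token log-probs of each completion under the policy and reference models; the sketch elides that model-forward step.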
- Anthology ID: 2024.emnlp-main.729
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 13134–13156
- URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-main.729/
- DOI: 10.18653/v1/2024.emnlp-main.729
- Cite (ACL): John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, and Sara Hooker. 2024. RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 13134–13156, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs (Dang et al., EMNLP 2024)
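- BibTeX (assembled from the metadata above; the citation key is an illustrative choice, not necessarily the Anthology’s canonical key):

```bibtex
@inproceedings{dang-etal-2024-rlhf,
    title = "{RLHF} Can Speak Many Languages: Unlocking Multilingual Preference Optimization for {LLM}s",
    author = {Dang, John and Ahmadian, Arash and Marchisio, Kelly and Kreutzer, Julia and {\"U}st{\"u}n, Ahmet and Hooker, Sara},
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    pages = "13134--13156",
    url = "https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-main.729/",
    doi = "10.18653/v1/2024.emnlp-main.729",
}
```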
- PDF: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-main.729.pdf