Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

Viet Lai; Chien Nguyen; Nghia Ngo; Thuật Nguyễn; Franck Dernoncourt; Ryan Rossi; Thien Nguyen

doi:10.18653/v1/2023.emnlp-demo.28

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

Viet Lai, Chien Nguyen, Nghia Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan Rossi, Thien Nguyen

Abstract

A key technology for large language models (LLMs) involves instruction tuning that helps align the models’ responses with human expectations to realize impressive learning abilities. Two major approaches for instruction tuning characterize supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are applied to produce the best commercial LLMs. To improve the accessibility of LLMs, various instruction-tuned open-source LLMs have also been introduced recently. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, thus hindering their accessibility to many other languages in the world. In addition, SFT has been used as the only approach to instruction-tune open-source LLMs for multiple languages. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction over SFT for different base models and datasets. Our framework with created resources, fine-tuned LLMs, interaction scripts are released at https://github.com/nlp-uoregon/Okapi. A demo video to show our framework can also be found at: https://youtu.be/QFV2fkPwvi0.

Anthology ID:: 2023.emnlp-demo.28
Volume:: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:: December
Year:: 2023
Address:: Singapore
Editors:: Yansong Feng, Els Lefever
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 318–327
Language:
URL:: https://aclanthology.org/2023.emnlp-demo.28
DOI:: 10.18653/v1/2023.emnlp-demo.28
Bibkey:
Cite (ACL):: Viet Lai, Chien Nguyen, Nghia Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan Rossi, and Thien Nguyen. 2023. Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 318–327, Singapore. Association for Computational Linguistics.
Cite (Informal):: Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback (Lai et al., EMNLP 2023)
Copy Citation:
PDF:: https://preview.aclanthology.org/naacl24-info/2023.emnlp-demo.28.pdf

PDF Search