@inproceedings{ji-chen-2025-many,
title = "How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on {BLOOM}",
author = "Ji, Shaoxiong and
Chen, Pinzhen",
editor = "Rambow, Owen and
Wanner, Leo and
Apidianaki, Marianna and
Al-Khalifa, Hend and
Eugenio, Barbara Di and
Schockaert, Steven",
booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
month = jan,
year = "2025",
address = "Abu Dhabi, UAE",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2025.coling-main.175/",
pages = "2575--2581",
abstract = "Instruction tuning a large language model with multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it is yet to be determined whether having a handful of languages is sufficient, or whether the benefits increase with the inclusion of more. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to understand three pertinent factors affecting performance: the number of languages, language exposure, and similarity between training and test languages. Overall we found that 1) expanding language coverage in multilingual instruction tuning proves to be beneficial; 2) accuracy often significantly boots if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more than merely the number of language but different languages benefit to various degrees."
}