Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind

Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu


Abstract
Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality. Although numerous model compression techniques have been investigated, they typically rely on a calibration set that overlooks the multilingual context and results in significant accuracy degradation for low-resource languages. This paper introduces Multilingual Brain Surgeon (MBS), a novel calibration data sampling method for multilingual LLMs compression. MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets. Our experiments, conducted on the BLOOM multilingual LLM, demonstrate that MBS improves the performance of existing English-centric compression methods, especially for low-resource languages. We also uncover the dynamics of language interaction during compression, revealing that the larger the proportion of a language in the training set and the more similar the language is to the calibration language, the better performance the language retains after compression. In conclusion, MBS presents an innovative approach to compressing multilingual LLMs, addressing the performance disparities and improving the language inclusivity of existing compression techniques.
Keywords:
Large Language Model, Multilingual Model Compression
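The core idea described in the abstract is to draw calibration examples in proportion to each language's share of the model's training data, rather than from English alone. The sketch below is a minimal illustration of that proportional sampling step, not the authors' implementation; the function name, its arguments, and the example language shares are hypothetical placeholders.

```python
import random

def sample_calibration_data(corpora, language_proportions, n_samples, seed=0):
    """Draw a calibration set whose language mix mirrors the training-data mix.

    corpora: dict mapping language code -> list of candidate text samples.
    language_proportions: dict mapping language code -> share of that language
        in the model's training data (shares should sum to roughly 1).
    n_samples: total number of calibration examples to draw.
    """
    rng = random.Random(seed)
    calibration_set = []
    for lang, share in language_proportions.items():
        # Number of examples allotted to this language, proportional to its share.
        k = round(share * n_samples)
        pool = corpora.get(lang, [])
        if k > 0 and pool:
            # Cap at the pool size so small corpora do not raise an error.
            calibration_set.extend(rng.sample(pool, min(k, len(pool))))
    rng.shuffle(calibration_set)
    return calibration_set

# Illustrative (made-up) shares for a multilingual training mix:
# proportions = {"en": 0.30, "zh": 0.16, "fr": 0.13, "es": 0.11, "pt": 0.05}
# calib = sample_calibration_data(corpora, proportions, n_samples=128)
```

The resulting calibration set can then be passed to any existing calibration-based compression method (pruning or quantization) in place of an English-only sample.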
Anthology ID:
2024.lrec-main.1030
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Note:
Pages:
11794–11812
URL:
https://aclanthology.org/2024.lrec-main.1030
Cite (ACL):
Hongchuan Zeng, Hongshen Xu, Lu Chen, and Kai Yu. 2024. Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11794–11812, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind (Zeng et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1030.pdf