Specializing Multilingual Language Models: An Empirical Study

Ethan C. Chau, Noah A. Smith


Abstract
Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary augmentation and script transliteration. Our evaluations on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine diverse low-resource languages uphold the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.
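As a rough illustration of the first adaptation the abstract mentions, vocabulary augmentation in this setting typically means adding target-language subword tokens to a pretrained multilingual tokenizer and growing the model's embedding matrix to match. The sketch below uses the Hugging Face transformers API; the mBERT checkpoint and the placeholder tokens are assumptions for illustration, not the paper's exact recipe.

# Minimal sketch of vocabulary augmentation (assumed setup, not the
# paper's exact method): extend a pretrained multilingual tokenizer with
# target-language subword tokens, then resize the embedding matrix so
# the new rows can be learned during continued pretraining/fine-tuning.
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Hypothetical tokens; in practice these would come from training a
# subword vocabulary on an in-language corpus.
new_tokens = ["##qaa", "##ngol", "sowda"]
num_added = tokenizer.add_tokens(new_tokens)

# The new embedding rows are randomly initialized and get trained when
# the model is further pretrained or fine-tuned on target-language data.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")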
Anthology ID:
2021.mrl-1.5
Original:
2021.mrl-1.5v1
Version 2:
2021.mrl-1.5v2
Volume:
Proceedings of the 1st Workshop on Multilingual Representation Learning
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
51–61
URL:
https://aclanthology.org/2021.mrl-1.5
DOI:
10.18653/v1/2021.mrl-1.5
Cite (ACL):
Ethan C. Chau and Noah A. Smith. 2021. Specializing Multilingual Language Models: An Empirical Study. In Proceedings of the 1st Workshop on Multilingual Representation Learning, pages 51–61, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Specializing Multilingual Language Models: An Empirical Study (Chau & Smith, MRL 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.mrl-1.5.pdf
Video:
https://preview.aclanthology.org/ingestion-script-update/2021.mrl-1.5.mp4
Code:
ethch18/specializing-multilingual