Abstract
Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary augmentation and script transliteration. Our evaluations on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine diverse low-resource languages uphold the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.
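The first adaptation, vocabulary augmentation, amounts to extending a pretrained model's subword vocabulary with target-language tokens and enlarging its embedding matrix to match. The sketch below is a minimal illustration using the HuggingFace transformers API and mBERT; the subwords added here are hypothetical placeholders, and this is not the authors' exact pipeline (their implementation is in the linked repository, ethch18/specializing-multilingual).

```python
# Minimal sketch of vocabulary augmentation (illustrative, not the
# paper's exact code): add target-language subwords to a pretrained
# multilingual tokenizer, then resize the model's embeddings so the
# new tokens get trainable rows during adaptation.
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Hypothetical new subwords; in practice these would be mined from
# target-language text (e.g., by training a subword model on it).
new_subwords = ["##ngo", "##tlal", "qhawe"]
num_added = tokenizer.add_tokens(new_subwords)

# Grow the input/output embedding matrices to cover the augmented
# vocabulary; the new rows are randomly initialized and then learned
# during continued pretraining or task fine-tuning.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```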
- Anthology ID: 2021.mrl-1.5
- Original: 2021.mrl-1.5v1
- Version 2: 2021.mrl-1.5v2
- Volume: Proceedings of the 1st Workshop on Multilingual Representation Learning
- Month: November
- Year: 2021
- Address: Punta Cana, Dominican Republic
- Venue: MRL
- Publisher: Association for Computational Linguistics
- Pages: 51–61
- URL: https://aclanthology.org/2021.mrl-1.5
- DOI: 10.18653/v1/2021.mrl-1.5
- Cite (ACL): Ethan C. Chau and Noah A. Smith. 2021. Specializing Multilingual Language Models: An Empirical Study. In Proceedings of the 1st Workshop on Multilingual Representation Learning, pages 51–61, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): Specializing Multilingual Language Models: An Empirical Study (Chau & Smith, MRL 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.mrl-1.5.pdf
- Code: ethch18/specializing-multilingual