Exploring the Impact of Transliteration on NLP Performance for Low-Resource Languages: The Case of Maltese and Arabic
Kurt Micallef, Fadhl Eryani, Nizar Habash, Houda Bouamor, Claudia Borg
Abstract
Maltese is a low-resource language of Arabic and Romance origins written in Latin script. We explore the impact of transliterating Maltese into Arabic script on a number of downstream tasks. We compare multiple transliteration pipelines, ranging from simple one-to-one character maps to more sophisticated alternatives that explore multiple possibilities or make use of manual linguistic annotations. We show that the sophisticated systems are consistently better than the simpler systems, both quantitatively and qualitatively. We also show that transliterating Maltese can be considered as an option for improving cross-lingual transfer capabilities.
- Anthology ID: 2023.cawl-1.4
- Volume: Proceedings of the Workshop on Computation and Written Language (CAWL 2023)
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Venue: CAWL
- Publisher: Association for Computational Linguistics
- Pages: 22–32
- URL: https://aclanthology.org/2023.cawl-1.4
- Cite (ACL): Kurt Micallef, Fadhl Eryani, Nizar Habash, Houda Bouamor, and Claudia Borg. 2023. Exploring the Impact of Transliteration on NLP Performance for Low-Resource Languages: The Case of Maltese and Arabic. In Proceedings of the Workshop on Computation and Written Language (CAWL 2023), pages 22–32, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Exploring the Impact of Transliteration on NLP Performance for Low-Resource Languages: The Case of Maltese and Arabic (Micallef et al., CAWL 2023)
- PDF: https://preview.aclanthology.org/nodalida-main-page/2023.cawl-1.4.pdf