Exploring the Impact of Transliteration on NLP Performance for Low-Resource Languages: The Case of Maltese and Arabic

Kurt Micallef, Fadhl Eryani, Nizar Habash, Houda Bouamor, Claudia Borg


Abstract
Maltese is a low-resource language of Arabic and Romance origins written in Latin script. We explore the impact of transliterating Maltese into Arabic script on a number of downstream tasks. We compare multiple transliteration pipelines, ranging from simple one-to-one character maps to more sophisticated alternatives that explore multiple possibilities or make use of manual linguistic annotations. We show that the sophisticated systems are consistently better than the simpler systems, both quantitatively and qualitatively. We also show that transliterating Maltese can be considered an option for improving cross-lingual transfer capabilities.
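To make the simplest end of that range concrete, the following is a minimal sketch of what a one-to-one character map could look like. The mapping table and the `transliterate` helper are hypothetical and chosen only for illustration; they are not the character map or the pipeline used in the paper.

```python
# Hypothetical, illustrative mapping from a few Latin consonants to Arabic
# letters. This is NOT the character map used in the paper.
CHAR_MAP = {
    "b": "\u0628",  # ب
    "d": "\u062f",  # د
    "f": "\u0641",  # ف
    "k": "\u0643",  # ك
    "l": "\u0644",  # ل
    "m": "\u0645",  # م
    "n": "\u0646",  # ن
    "r": "\u0631",  # ر
    "s": "\u0633",  # س
    "t": "\u062a",  # ت
}


def transliterate(text: str, char_map: dict = CHAR_MAP) -> str:
    """Map each character independently; unmapped characters pass through unchanged."""
    return "".join(char_map.get(ch.lower(), ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("dnb"))
```

A map of this kind looks at one character at a time, so it cannot treat Maltese digraphs such as "għ" or "ie" as single units and has no way to choose between alternative renderings. That limitation is one plausible reason to consider the more sophisticated alternatives mentioned in the abstract.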
Anthology ID: 2023.cawl-1.4
Volume: Proceedings of the Workshop on Computation and Written Language (CAWL 2023)
Month: July
Year: 2023
Address: Toronto, Canada
Venue: CAWL
Publisher: Association for Computational Linguistics
Pages: 22–32
URL: https://aclanthology.org/2023.cawl-1.4
Cite (ACL):
Kurt Micallef, Fadhl Eryani, Nizar Habash, Houda Bouamor, and Claudia Borg. 2023. Exploring the Impact of Transliteration on NLP Performance for Low-Resource Languages: The Case of Maltese and Arabic. In Proceedings of the Workshop on Computation and Written Language (CAWL 2023), pages 22–32, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Exploring the Impact of Transliteration on NLP Performance for Low-Resource Languages: The Case of Maltese and Arabic (Micallef et al., CAWL 2023)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2023.cawl-1.4.pdf