@inproceedings{inoue-etal-2022-morphosyntactic,
title = "Morphosyntactic Tagging with Pre-trained Language Models for {A}rabic and its Dialects",
author = "Inoue, Go and
Khalifa, Salam and
Habash, Nizar",
editor = "Muresan, Smaranda and
Nakov, Preslav and
Villavicencio, Aline",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.findings-acl.135/",
doi = "10.18653/v1/2022.findings-acl.135",
pages = "1708--1719",
abstract = "We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models. Our models consistently outperform existing systems in Modern Standard Arabic and all the Arabic dialects we study, achieving 2.6{\%} absolute improvement over the previous state-of-the-art in Modern Standard Arabic, 2.8{\%} in Gulf, 1.6{\%} in Egyptian, and 8.3{\%} in Levantine. We explore different training setups for fine-tuning pre-trained transformer language models, including training data size, the use of external linguistic resources, and the use of annotated data from other dialects in a low-resource scenario. Our results show that strategic fine-tuning using datasets from other high-resource dialects is beneficial for a low-resource dialect. Additionally, we show that high-quality morphological analyzers as external linguistic resources are beneficial especially in low-resource settings."
}
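
The paper frames morphosyntactic tagging as token classification with a fine-tuned pre-trained transformer. Below is a minimal sketch of that setup using the Hugging Face Transformers API; it is not the authors' code, and the model checkpoint, tagset, and example sentence are illustrative placeholders rather than the paper's actual configuration.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Illustrative checkpoint (a CAMeLBERT MSA model); the paper compares
# several pre-trained Arabic models, not necessarily this exact one.
MODEL_NAME = "CAMeL-Lab/bert-base-arabic-camelbert-msa"
TAGS = ["noun", "verb", "prep", "adj"]  # placeholder tag inventory

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(TAGS)
)

# One pre-tokenized sentence with word-level gold tags (dummy data):
# "the boy wrote a letter" -> verb, noun, noun.
words = ["كتب", "الولد", "رسالة"]
labels = [TAGS.index("verb"), TAGS.index("noun"), TAGS.index("noun")]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to subword tokens: label only the first
# subword of each word; -100 is ignored by the cross-entropy loss.
aligned, prev = [], None
for wid in enc.word_ids():
    if wid is None or wid == prev:
        aligned.append(-100)
    else:
        aligned.append(labels[wid])
    prev = wid

out = model(**enc, labels=torch.tensor([aligned]))
out.loss.backward()  # an optimizer step would complete one fine-tuning update

The same token-classification recipe applies whether the checkpoint was pre-trained on MSA or dialectal data, which is the axis the paper's fine-tuning experiments vary.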