Noor Abo Mokh
2022
Improving POS Tagging for Arabic Dialects on Out-of-Domain Texts
Noor Abo Mokh | Daniel Dakota | Sandra Kübler
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
We investigate part-of-speech (POS) tagging for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi) in an out-of-domain setting. More specifically, we examine the effectiveness of (1) upsampling the target dialect in the training data of a joint model, (2) increasing the consistency of the annotations, and (3) using word embeddings pre-trained on a large corpus of dialectal Arabic. On average, we increase accuracy by about 20 percentage points.
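As a rough illustration of the first technique, upsampling the target dialect in a joint training set, the following minimal sketch repeats every sentence from the target dialect a fixed number of times before shuffling. It is not the authors' implementation: the function name `upsample_target`, the dialect codes (`GLF`, `LEV`, `EGY`, `MGR`), the `factor` parameter, and the placeholder tokens and tags are all assumptions made for this example, and the paper's actual sampling scheme may differ.

```python
import random
from collections import Counter


def upsample_target(sentences, target_dialect, factor, seed=0):
    """Repeat each target-dialect sentence `factor` times, then shuffle.

    `sentences` is a list of (dialect_code, tagged_sentence) pairs, where a
    tagged sentence is a list of (token, pos_tag) pairs. All names here are
    hypothetical and chosen only for illustration.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    upsampled = []
    for dialect, tagged in sentences:
        copies = factor if dialect == target_dialect else 1
        upsampled.extend([(dialect, tagged)] * copies)
    # Shuffle with a fixed seed so the duplicated sentences are not adjacent
    # and the result is reproducible.
    random.Random(seed).shuffle(upsampled)
    return upsampled


# Toy joint training set with placeholder tokens (not real corpus data).
train = [
    ("GLF", [("tok_a", "NOUN"), ("tok_b", "VERB")]),
    ("LEV", [("tok_c", "NOUN")]),
    ("EGY", [("tok_d", "ADJ"), ("tok_e", "NOUN")]),
    ("MGR", [("tok_f", "VERB")]),
]

upsampled = upsample_target(train, target_dialect="EGY", factor=3)
counts = Counter(dialect for dialect, _ in upsampled)
```

With `factor=3`, the single Egyptian sentence appears three times in the joint set while the other dialects are left unchanged, so the model sees the target dialect more often during training.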