Using Stem-Templates to Improve Arabic POS and Gender/Number Tagging

Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak


Abstract
This paper presents an end-to-end automatic processing system for Arabic. The system performs: correction of common spelling errors pertaining to different forms of alef, ta marbouta and ha, and alef maqsoura and ya; context-sensitive word segmentation into underlying clitics; POS tagging; and gender and number tagging of nouns and adjectives. We introduce the use of stem templates as a feature to improve POS tagging by 0.5% and to help ascertain the gender and number of nouns and adjectives. For gender and number tagging, we report accuracies on previously unseen words that are significantly higher than those of a state-of-the-art system.
Anthology ID:
L14-1296
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2926–2931
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/335_Paper.pdf
Cite (ACL):
Kareem Darwish, Ahmed Abdelali, and Hamdy Mubarak. 2014. Using Stem-Templates to Improve Arabic POS and Gender/Number Tagging. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2926–2931, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Using Stem-Templates to Improve Arabic POS and Gender/Number Tagging (Darwish et al., LREC 2014)