Abstract
This paper presents an end-to-end automatic processing system for Arabic. The system performs: correction of common spelling errors pertaining to different forms of alef, ta marbouta and ha, and alef maqsoura and ya; context sensitive word segmentation into underlying clitics, POS tagging, and gender and number tagging of nouns and adjectives. We introduce the use of stem templates as a feature to improve POS tagging by 0.5% and to help ascertain the gender and number of nouns and adjectives. For gender and number tagging, we report accuracies that are significantly higher on previously unseen words compared to a state-of-the-art system.- Anthology ID:
- L14-1296
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- 2926–2931
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/335_Paper.pdf
- DOI:
- Cite (ACL):
- Kareem Darwish, Ahmed Abdelali, and Hamdy Mubarak. 2014. Using Stem-Templates to Improve Arabic POS and Gender/Number Tagging. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2926–2931, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Using Stem-Templates to Improve Arabic POS and Gender/Number Tagging (Darwish et al., LREC 2014)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/335_Paper.pdf