Arabic POS Tagging: Don’t Abandon Feature Engineering Just Yet
Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki
Abstract
This paper compares Support Vector Machine based ranking (SVM-Rank) with Bidirectional Long Short-Term Memory (bi-LSTM) neural-network based sequence labeling for building a state-of-the-art Arabic part-of-speech tagging system. Using SVM-Rank leads to state-of-the-art results, but requires a fair amount of feature engineering. Using bi-LSTM, particularly when combined with word embeddings, may lead to competitive POS-tagging results by automatically deducing latent linguistic features. However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVM-Rank based tagger yields further improvements. We also show that the gains realized by using embeddings may not be additive with the gains achieved by the features. We are open-sourcing both the SVM-Rank and the bi-LSTM based systems.
- Anthology ID:
- W17-1316
- Volume:
- Proceedings of the Third Arabic Natural Language Processing Workshop
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Editors:
- Nizar Habash, Mona Diab, Kareem Darwish, Wassim El-Hajj, Hend Al-Khalifa, Houda Bouamor, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
- Venue:
- WANLP
- SIG:
- SEMITIC
- Publisher:
- Association for Computational Linguistics
- Pages:
- 130–137
- URL:
- https://aclanthology.org/W17-1316
- DOI:
- 10.18653/v1/W17-1316
- Cite (ACL):
- Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, and Mohamed Eldesouki. 2017. Arabic POS Tagging: Don’t Abandon Feature Engineering Just Yet. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 130–137, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Arabic POS Tagging: Don’t Abandon Feature Engineering Just Yet (Darwish et al., WANLP 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/W17-1316.pdf