Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach

Fahad Albogamy, Allan Ramsay, Hanady Ahmed


Abstract
In this paper, we propose using a “bootstrapping” method for constructing a dependency treebank of Arabic tweets. This method uses a rule-based parser to create a small treebank of one thousand Arabic tweets and a data-driven parser to create a larger treebank by using the small treebank as a seed training set. We are able to create a dependency treebank from unlabelled tweets without any manual intervention. Experimental results show that this method can improve the speed of training the parser and the accuracy of the resulting parsers.
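
The bootstrapping loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `rule_based_parse`, `OffsetParser`, `bootstrap`, and the `batch_size` parameter are hypothetical stand-ins, and both "parsers" are toy placeholders (the rule-based one attaches each token to the preceding token, the data-driven one memorises the most frequent head offset per word). Only the overall control flow, seed with a rule-based parser and then grow the treebank with a retrained data-driven parser, is taken from the abstract.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[str]
Heads = List[int]  # head index for each token; -1 marks the root
Treebank = List[Tuple[Sentence, Heads]]


def rule_based_parse(sentence: Sentence) -> Heads:
    """Toy stand-in for a rule-based parser (assumption, not the paper's rules):
    the first token is the root and every other token attaches to its left neighbour."""
    if not sentence:
        return []
    return [-1] + list(range(len(sentence) - 1))


class OffsetParser:
    """Toy stand-in for a data-driven parser: learns, per word, the most
    frequent offset between a token and its head."""

    def __init__(self) -> None:
        self.offsets: Dict[str, int] = {}

    def train(self, treebank: Treebank) -> None:
        counts: Dict[str, Counter] = defaultdict(Counter)
        for sentence, heads in treebank:
            for i, (word, head) in enumerate(zip(sentence, heads)):
                if head >= 0:  # skip the root, which has no head offset
                    counts[word][head - i] += 1
        self.offsets = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def parse(self, sentence: Sentence) -> Heads:
        heads: Heads = []
        for i, word in enumerate(sentence):
            if i == 0:
                heads.append(-1)  # keep the first token as root
                continue
            head = i + self.offsets.get(word, -1)
            # Fall back to the left neighbour if the prediction is invalid.
            valid = 0 <= head < len(sentence) and head != i
            heads.append(head if valid else i - 1)
        return heads


def bootstrap(seed: List[Sentence], unlabelled: List[Sentence],
              batch_size: int = 2) -> Tuple[Treebank, OffsetParser]:
    # Step 1: the rule-based parser builds the small seed treebank.
    treebank: Treebank = [(s, rule_based_parse(s)) for s in seed]
    # Step 2: the data-driven parser is trained on the seed.
    parser = OffsetParser()
    parser.train(treebank)
    # Step 3: parse unlabelled text in batches, add it, and retrain.
    for start in range(0, len(unlabelled), batch_size):
        batch = unlabelled[start:start + batch_size]
        treebank.extend((s, parser.parse(s)) for s in batch)
        parser.train(treebank)
    return treebank, parser


seed_sentences = [["w1", "w2", "w3"], ["w2", "w4"]]
raw_sentences = [["w3", "w1"], ["w4", "w2", "w5"], ["w1"]]
treebank, parser = bootstrap(seed_sentences, raw_sentences)
```

In a realistic setting the two placeholders would be replaced by an actual rule-based parser and a trainable dependency parser; the loop itself requires no manual annotation, which matches the claim in the abstract.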
Anthology ID:
W17-1312
Volume:
Proceedings of the Third Arabic Natural Language Processing Workshop
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Nizar Habash, Mona Diab, Kareem Darwish, Wassim El-Hajj, Hend Al-Khalifa, Houda Bouamor, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
SIG:
SEMITIC
Publisher:
Association for Computational Linguistics
Pages:
94–99
URL:
https://aclanthology.org/W17-1312
DOI:
10.18653/v1/W17-1312
Cite (ACL):
Fahad Albogamy, Allan Ramsay, and Hanady Ahmed. 2017. Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 94–99, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach (Albogamy et al., WANLP 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/W17-1312.pdf