Abstract
In this paper, we propose a “bootstrapping” method for constructing a dependency treebank of Arabic tweets. The method uses a rule-based parser to create a small treebank of one thousand Arabic tweets, then uses a data-driven parser to create a larger treebank, with the small treebank serving as the seed training set. In this way, we are able to create a dependency treebank from unlabelled tweets without any manual intervention. Experimental results show that this method can improve the speed of training the parser and the accuracy of the resulting parsers.
- Anthology ID:
- W17-1312
- Volume:
- Proceedings of the Third Arabic Natural Language Processing Workshop
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Venue:
- WANLP
- SIG:
- SEMITIC
- Publisher:
- Association for Computational Linguistics
- Pages:
- 94–99
- URL:
- https://aclanthology.org/W17-1312
- DOI:
- 10.18653/v1/W17-1312
- Cite (ACL):
- Fahad Albogamy, Allan Ramsay, and Hanady Ahmed. 2017. Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 94–99, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach (Albogamy et al., WANLP 2017)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W17-1312.pdf
- Data
- Penn Treebank