2017
Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach
Fahad Albogamy, Allan Ramsay, Hanady Ahmed
Proceedings of the Third Arabic Natural Language Processing Workshop
In this paper, we propose a “bootstrapping” method for constructing a dependency treebank of Arabic tweets. A rule-based parser is used to create a small treebank of one thousand Arabic tweets, and a data-driven parser, trained on this small treebank as a seed training set, is then used to create a larger treebank. This allows us to create a dependency treebank from unlabelled tweets without any manual intervention. Experimental results show that this method can improve both the speed of training the parser and the accuracy of the resulting parsers.
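The bootstrapping scheme summarised above can be sketched as a self-training loop. This is a minimal illustration, not the authors' code. The head-index tree representation, the `rule_based_parse` and `train` callables, and the batch-wise retraining are all assumptions. In particular, the abstract does not say whether the larger treebank is built in one pass or iteratively.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed representation: a sentence is a sequence of tokens and a dependency
# tree is a list of head indices, one per token (0 = root, tokens are 1-based).
Sentence = Sequence[str]
Tree = List[int]
Parser = Callable[[Sentence], Tree]


def bootstrap_treebank(
    seed: List[Sentence],
    unlabelled: List[Sentence],
    rule_based_parse: Parser,
    train: Callable[[List[Tuple[Sentence, Tree]]], Parser],
    batch_size: int = 1000,
) -> List[Tuple[Sentence, Tree]]:
    """Grow a treebank from raw sentences with no manual annotation.

    1. Label the seed sentences with the rule-based parser.
    2. Train a data-driven parser on the treebank built so far.
    3. Parse the next batch of unlabelled sentences with it, add the
       output to the treebank, and repeat from step 2.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    treebank = [(s, rule_based_parse(s)) for s in seed]
    for start in range(0, len(unlabelled), batch_size):
        # Retrain on a copy of everything collected so far.
        parser = train(list(treebank))
        batch = unlabelled[start:start + batch_size]
        treebank.extend((s, parser(s)) for s in batch)
    return treebank


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): the "rule-based" parser attaches each
    # token to the previous one; the "trained" parser attaches all to root.
    chain = lambda s: list(range(len(s)))
    flat_trainer = lambda tb: (lambda s: [0] * len(s))
    tb = bootstrap_treebank([["a", "b", "c"]], [["d", "e"], ["f"]],
                            chain, flat_trainer, batch_size=1)
    for sentence, heads in tb:
        print(sentence, heads)
```

Retraining after each batch lets later batches benefit from a parser trained on more (automatically labelled) data. A one-pass variant is simply `batch_size=len(unlabelled)`.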
Universal Dependencies for Arabic Tweets
Fahad Albogamy, Allan Ramsay
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
There is increasing interest in identifying linguistic universals to facilitate cross-lingual studies. Recently, a new universal annotation scheme was designed as part of the Universal Dependencies project. In this paper, we map the Arabic Tweets Dependency Treebank (ATDT) to the Universal Dependencies (UD) scheme in order to compare it with other language resources and to support cross-lingual studies.
2016
Unsupervised Stemmer for Arabic Tweets
Fahad Albogamy, Allan Ramsay
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Stemming, which reduces words to their stems, is an essential processing step in a wide range of high-level text processing applications such as information extraction, machine translation and sentiment analysis. Many stemming algorithms have been developed for Modern Standard Arabic (MSA). Although Arabic tweets and MSA are closely related and share many characteristics, there are substantial differences between them in lexicon and syntax. In this paper, we introduce a light Arabic stemmer for Arabic tweets. Our results show improvements over the performance of a number of well-known stemmers for Arabic.
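To illustrate what "light" stemming means, the sketch below strips common Arabic prefixes and suffixes while keeping a minimum-length stem. It is a generic affix-stripping stemmer in the spirit of well-known MSA light stemmers, not the unsupervised stemmer described in the paper. The `PREFIXES` and `SUFFIXES` lists and the `min_stem` threshold are illustrative assumptions.

```python
# Illustrative affix lists (an assumption, not taken from the paper).
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str, min_stem: int = 3) -> str:
    """Strip at most one prefix and one suffix, trying longer affixes
    first, and never leave fewer than `min_stem` characters."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المدرسة", "كتابها", "ولد"]:
        print(w, "->", light_stem(w))
```

For example, `light_stem("والكتاب")` returns `"كتاب"`, while `light_stem("ولد")` is left unchanged because stripping the initial و would leave fewer than three letters.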
Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping
Fahad Albogamy, Allan Ramsay
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Part-of-speech (POS) tagging is a key step in many NLP algorithms. However, tweets are difficult to POS tag because they are short, do not always follow formal grammar or proper spelling, and often use abbreviations to overcome their restricted length. Arabic tweets also show a further range of linguistic phenomena, such as the use of different dialects, romanised Arabic and borrowed foreign words. In this paper, we present an evaluation and a detailed error analysis of state-of-the-art POS taggers for Arabic when applied to Arabic tweets. On the basis of this analysis, we combine normalisation and external knowledge to handle the domain noisiness, and exploit bootstrapping to construct extra training data in order to improve POS tagging for Arabic tweets. Our results show significant improvements over the performance of a number of well-known taggers for Arabic.
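Agreement-based bootstrapping can be illustrated by tagging unlabelled text with several taggers and keeping only the sentences on which they agree as extra training data. The sketch below assumes the simplest criterion: every tagger must produce an identical tag sequence for the whole sentence. The paper's actual agreement criterion and taggers are not given in the abstract, and `select_by_agreement` is a hypothetical helper.

```python
from typing import Callable, List, Sequence, Tuple

Tagger = Callable[[Sequence[str]], List[str]]


def select_by_agreement(
    sentences: List[Sequence[str]],
    taggers: List[Tagger],
) -> List[Tuple[Sequence[str], List[str]]]:
    """Keep (sentence, tags) pairs on which every tagger outputs exactly
    the same tag sequence; these become extra training data."""
    if not taggers:
        raise ValueError("need at least one tagger")
    selected = []
    for sent in sentences:
        outputs = [tag(sent) for tag in taggers]
        if all(out == outputs[0] for out in outputs[1:]):
            selected.append((sent, outputs[0]))
    return selected


if __name__ == "__main__":
    # Toy taggers standing in for real POS taggers (hypothetical).
    all_nouns = lambda s: ["NOUN"] * len(s)
    preps = lambda s: ["PREP" if t == "in" else "NOUN" for t in s]
    raw = [["new", "book"], ["in", "house"]]
    print(select_by_agreement(raw, [all_nouns, preps]))
```

Requiring unanimous agreement trades quantity for precision: fewer sentences are selected, but the automatically assigned tags are more likely to be correct.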
2015
POS Tagging for Arabic Tweets
Fahad Albogamy, Allan Ramsay
Proceedings of the International Conference Recent Advances in Natural Language Processing
Towards POS Tagging for Arabic Tweets
Fahad Albogamy, Allan Ramsay
Proceedings of the Workshop on Noisy User-generated Text