2015
Combining Minimally-supervised Methods for Arabic Named Entity Recognition
Maha Althobaiti | Udo Kruschwitz | Massimo Poesio
Transactions of the Association for Computational Linguistics, Volume 3
Supervised methods can achieve high performance on NLP tasks, such as Named Entity Recognition (NER), but new annotations are required for every new domain and/or genre change. This has motivated research in minimally supervised methods such as semi-supervised learning and distant learning, but neither technique has yet achieved performance levels comparable to those of supervised methods. Semi-supervised methods tend to have very high precision but comparatively low recall, whereas distant learning tends to achieve higher recall but lower precision. This complementarity suggests that better results may be obtained by combining the two types of minimally supervised methods. In this paper we present a novel approach to Arabic NER using a combination of semi-supervised and distant learning techniques. We trained a semi-supervised NER classifier and another one using distant learning techniques, and then combined them using a variety of classifier combination schemes, including the Bayesian Classifier Combination (BCC) procedure recently proposed for sentiment analysis. According to our results, the BCC model leads to an increase in performance of 8 percentage points over the best base classifiers.
2014
AraNLP: a Java-based Library for the Processing of Arabic Text.
Maha Althobaiti | Udo Kruschwitz | Massimo Poesio
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We present a free, Java-based library named “AraNLP” that covers various Arabic text preprocessing tools. Although a good number of tools for processing Arabic text already exist, integration and compatibility problems continually occur. AraNLP is an attempt to gather most of the vital Arabic text preprocessing tools into one library that can be accessed easily by integrating or accurately adapting existing tools and by developing new ones when required. The library includes a sentence detector, tokenizer, light stemmer, root stemmer, part-of-speech tagger (POS-tagger), word segmenter, normalizer, and a punctuation and diacritic remover.
Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia
Maha Althobaiti | Udo Kruschwitz | Massimo Poesio
Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics
2013
A Semi-supervised Learning Approach to Arabic Named Entity Recognition
Maha Althobaiti | Udo Kruschwitz | Massimo Poesio
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013