AraNLP: a Java-based Library for the Processing of Arabic Text.

Maha Althobaiti, Udo Kruschwitz, Massimo Poesio


Abstract
We present a free, Java-based library named “AraNLP” that covers various Arabic text preprocessing tools. Although a good number of tools for processing Arabic text already exist, integration and compatibility problems continually occur. AraNLP is an attempt to gather most of the vital Arabic text preprocessing tools into one easily accessible library, by integrating or accurately adapting existing tools and by developing new ones when required. The library includes a sentence detector, tokenizer, light stemmer, root stemmer, part-of-speech tagger (POS-tagger), word segmenter, normalizer, and a punctuation and diacritic remover.
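To make the last three components concrete, the following is a minimal sketch of what a diacritic remover, punctuation remover, and normalizer typically do. It is not AraNLP's API: the class name `ArabicTextCleaner`, its method names, and the specific normalization rules (unifying alef variants, alef maqsura to yeh, teh marbuta to heh) are assumptions chosen for illustration and use only the Java standard library.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of AraNLP.
class ArabicTextCleaner {

    // Short-vowel and related marks U+064B..U+0652, plus tatweel U+0640.
    private static final Pattern DIACRITICS =
            Pattern.compile("[\\u064B-\\u0652\\u0640]");

    // Unicode general category P covers ASCII and Arabic punctuation
    // (for example the Arabic comma U+060C and question mark U+061F).
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}");

    // Alef with madda, hamza above, or hamza below.
    private static final Pattern ALEF_VARIANTS =
            Pattern.compile("[\\u0622\\u0623\\u0625]");

    static String removeDiacritics(String text) {
        return DIACRITICS.matcher(text).replaceAll("");
    }

    static String removePunctuation(String text) {
        return PUNCTUATION.matcher(text).replaceAll("");
    }

    // Assumed normalization rules; real tools differ in which ones they apply.
    static String normalize(String text) {
        String unified = ALEF_VARIANTS.matcher(text).replaceAll("\u0627"); // bare alef
        return unified
                .replace('\u0649', '\u064A')  // alef maqsura -> yeh
                .replace('\u0629', '\u0647'); // teh marbuta -> heh
    }

    public static void main(String[] args) {
        // The word "kataba" written with three fatha marks.
        String vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E";
        System.out.println(removeDiacritics(vocalized));
        // The name "Ahmad" written with alef-hamza.
        System.out.println(normalize("\u0623\u062D\u0645\u062F"));
    }
}
```

Whether normalization should run before or after stemming depends on the downstream task; the sketch keeps each step as an independent function so they can be combined in any order.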
Anthology ID:
L14-1498
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4134–4138
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/621_Paper.pdf
Cite (ACL):
Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio. 2014. AraNLP: a Java-based Library for the Processing of Arabic Text. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4134–4138, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
AraNLP: a Java-based Library for the Processing of Arabic Text. (Althobaiti et al., LREC 2014)