JATE 2.0: Java Automatic Term Extraction with Apache Solr

Ziqi Zhang, Jie Gao, Fabio Ciravegna


Abstract
Automatic Term Extraction (ATE) or Recognition (ATR) is a fundamental processing step preceding many complex knowledge engineering tasks. However, few methods have been implemented as public tools and, in particular, made available as open-source freeware. Further, little effort has been made to develop an adaptable and scalable framework that enables customization, development, and comparison of algorithms under a uniform environment. This paper introduces JATE 2.0, a complete remake of the free Java Automatic Term Extraction Toolkit (Zhang et al., 2008) delivering new features including: (1) highly modular, adaptable and scalable ATE thanks to integration with Apache Solr, the open source free-text indexing and search platform; (2) an extended collection of state-of-the-art algorithms. We carry out experiments on two well-known benchmarking datasets and compare the algorithms along the dimensions of effectiveness (precision) and efficiency (speed and memory consumption). To the best of our knowledge, this is the only free ATE library offering a flexible architecture and the most comprehensive collection of algorithms.
Anthology ID:
L16-1359
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2262–2269
URL:
https://aclanthology.org/L16-1359
Cite (ACL):
Ziqi Zhang, Jie Gao, and Fabio Ciravegna. 2016. JATE 2.0: Java Automatic Term Extraction with Apache Solr. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2262–2269, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
JATE 2.0: Java Automatic Term Extraction with Apache Solr (Zhang et al., LREC 2016)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/L16-1359.pdf
Code:
ziqizhang/jate