Multi-word Entity Classification in a Highly Multilingual Environment
Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, Jakub Piskorski
Abstract
This paper describes an approach for the classification of millions of existing multi-word entities (MWEntities), such as organisation or event names, into thirteen category types, based only on the tokens they contain. In order to classify our very large in-house collection of multilingual MWEntities into an application-oriented set of entity categories, we trained and tested distantly-supervised classifiers in 43 languages based on MWEntities extracted from BabelNet. The best-performing classifier was the multi-class SVM using a TF.IDF-weighted data representation. Interestingly, one unique classifier trained on a mix of all languages consistently performed better than classifiers trained for individual languages, reaching an averaged F1-value of 88.8%. In this paper, we present the training and test data, including a human evaluation of its accuracy, describe the methods used to train the classifiers, and discuss the results.
- Anthology ID:
- W17-1702
- Volume:
- Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Editors:
- Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze
- Venue:
- MWE
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 11–20
- URL:
- https://aclanthology.org/W17-1702
- DOI:
- 10.18653/v1/W17-1702
- Cite (ACL):
- Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, and Jakub Piskorski. 2017. Multi-word Entity Classification in a Highly Multilingual Environment. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pages 11–20, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Multi-word Entity Classification in a Highly Multilingual Environment (Chesney et al., MWE 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/W17-1702.pdf
- Data
- DBpedia
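
The abstract describes representing each MWEntity by the tokens it contains, weighted with TF.IDF, before passing it to a multi-class SVM. The abstract does not give the exact weighting variant or tokenisation, so the following is only a minimal sketch of one common TF.IDF formulation (relative term frequency times log inverse document frequency). The function name `tfidf_vectors`, the whitespace tokenisation, and the toy entity names are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(entities):
    """Return one {token: weight} dict per entity (hypothetical helper).

    Assumed weighting: (count / entity length) * log(N / document frequency).
    The paper's actual TF.IDF variant may differ.
    """
    # Assumption: lowercase and split on whitespace; real multilingual
    # tokenisation would need language-specific handling.
    docs = [entity.lower().split() for entity in entities]
    n_docs = len(docs)

    # Document frequency: number of entities that contain each token.
    df = Counter(token for doc in docs for token in set(doc))

    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            token: (count / len(doc)) * math.log(n_docs / df[token])
            for token, count in tf.items()
        })
    return vectors


# Toy entity names invented for this example, not drawn from the paper's data.
entities = [
    "University of Valencia",
    "University of Oxford",
    "Battle of Hastings",
]
vectors = tfidf_vectors(entities)
```

In this sketch a token that occurs in every entity (here "of") receives weight zero, while a token unique to one entity (such as "valencia") receives the highest weight. Sparse vectors of this kind would then serve as input to a multi-class SVM, which the sketch does not implement.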