JRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task

Guillaume Jacquet, Jakub Piskorski, Hristo Tanev, Ralf Steinberger


Abstract
We report on the participation of the JRC Text Mining and Analysis Competence Centre (TMA-CC) in the BSNLP-2019 Shared Task, which focuses on named entity recognition, lemmatisation and cross-lingual linking. We propose a hybrid system that combines rule-based methods with lightweight machine-learning techniques. First, we use multilingual lexical resources such as JRC-NAMES and BABELNET, together with a named entity guesser, to recognise names. Second, we combine known names with wildcards to increase recall by also capturing inflected variants. Third, we increase precision by filtering these name candidates with inflection patterns learnt automatically from name occurrences in large news article collections. Our main requirement is high precision: averaged over the four languages, we achieved an F-measure of 65% with a precision of 93%.
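The abstract describes the "wildcard, then filter" idea only at a high level. The Python sketch below is an illustration under stated assumptions, not the JRC implementation: the helper names (`suffix_rule`, `wildcard_pattern`, `learn_rules`, `find_names`), the heuristic of stripping one final character before adding a wildcard, the `max_suffix` limit and the `min_count` threshold are all hypothetical. The learnt "inflection patterns" are simplified here to suffix-replacement rules counted over (base form, observed form) pairs.

```python
import re
from collections import Counter


def suffix_rule(base, form):
    """Return (base_ending, form_ending) after removing the longest common prefix.

    Hypothetical simplification of an "inflection pattern":
    ("Warszawa", "Warszawie") -> ("a", "ie").
    """
    i = 0
    while i < min(len(base), len(form)) and base[i] == form[i]:
        i += 1
    return base[i:], form[i:]


def wildcard_pattern(name, strip=1, max_suffix=3):
    """Recall step: turn a known name into a wildcard regex.

    Assumption: drop the last `strip` characters and allow up to `max_suffix`
    extra word characters (Unicode-aware in Python 3), so inflected variants match.
    """
    stem = name[:-strip] if len(name) > strip else name
    return re.compile(rf"\b{re.escape(stem)}\w{{0,{max_suffix}}}\b")


def learn_rules(pairs, min_count=2):
    """Keep suffix rules observed at least `min_count` times in (base, form) pairs."""
    counts = Counter(suffix_rule(base, form) for base, form in pairs)
    return {rule for rule, count in counts.items() if count >= min_count}


def find_names(text, known_names, rules):
    """Precision step: keep wildcard matches whose suffix change is a learnt rule.

    Returns (surface form, lemma) pairs; exact matches of the base form are kept.
    """
    hits = []
    for name in known_names:
        for match in wildcard_pattern(name).finditer(text):
            form = match.group(0)
            if form == name or suffix_rule(name, form) in rules:
                hits.append((form, name))
    return hits


# Usage with toy Polish data (hypothetical "corpus" of observed inflections).
pairs = [("Warszawa", "Warszawy"), ("Moskwa", "Moskwy"),
         ("Warszawa", "Warszawie"), ("Moskwa", "Moskwie")]
rules = learn_rules(pairs)
text = "Premier przyleciał z Moskwy do Warszawy. W Warszawie otwarto Festiwal Warszawski."
print(find_names(text, ["Warszawa", "Moskwa"], rules))
```

In this toy run the wildcard also matches the adjective "Warszawski", but the filter rejects it because the change -a → -ski was never observed as an inflection of a name. This mirrors the trade-off in the abstract: wildcards raise recall, and learnt patterns restore precision.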
Anthology ID:
W19-3714
Volume:
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Tomaž Erjavec, Michał Marcińczuk, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue:
BSNLP
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Pages:
100–104
URL:
https://aclanthology.org/W19-3714
DOI:
10.18653/v1/W19-3714
Cite (ACL):
Guillaume Jacquet, Jakub Piskorski, Hristo Tanev, and Ralf Steinberger. 2019. JRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 100–104, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
JRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task (Jacquet et al., BSNLP 2019)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/W19-3714.pdf