JRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task
Guillaume Jacquet, Jakub Piskorski, Hristo Tanev, Ralf Steinberger
Abstract
We report on the participation of the JRC Text Mining and Analysis Competence Centre (TMA-CC) in the BSNLP-2019 Shared Task, which focuses on named-entity recognition, lemmatisation and cross-lingual linking. We propose a hybrid system combining a rule-based approach and light ML techniques. We use multilingual lexical resources such as JRC-NAMES and BABELNET together with a named entity guesser to recognise names. In a second step, we combine known names with wild cards to increase recognition recall by also capturing inflection variants. In a third step, we increase precision by filtering these name candidates with automatically learnt inflection patterns derived from name occurrences in large news article collections. Our major requirement is to achieve high precision. We achieved an average of 65% F-measure with 93% precision on the four languages.
- Anthology ID:
- W19-3714
- Volume:
- Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Tomaž Erjavec, Michał Marcińczuk, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
- Venue:
- BSNLP
- SIG:
- SIGSLAV
- Publisher:
- Association for Computational Linguistics
- Pages:
- 100–104
- URL:
- https://aclanthology.org/W19-3714
- DOI:
- 10.18653/v1/W19-3714
- Cite (ACL):
- Guillaume Jacquet, Jakub Piskorski, Hristo Tanev, and Ralf Steinberger. 2019. JRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 100–104, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- JRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task (Jacquet et al., BSNLP 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/W19-3714.pdf
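The second and third steps described in the abstract (wildcard expansion of known names, then filtering with inflection patterns) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny name list stands in for a resource such as JRC-NAMES, the ending pairs in `LEARNT_PATTERNS` are hand-written placeholders for patterns that the paper says are learnt automatically from news collections, and all function names are hypothetical.

```python
import re

# Toy stand-in for a multilingual name lexicon (assumption, not real lexicon data).
KNOWN_NAMES = ["Warszawa", "Nowak"]

# Hypothetical (base_ending, inflected_ending) pairs; the paper learns such
# patterns automatically, here they are written by hand for illustration.
LEARNT_PATTERNS = {("a", "y"), ("a", "ie"), ("", "a"), ("", "em"), ("", "owi")}


def stem(name, cut=1):
    """Drop the last `cut` characters so the wildcard can cover a changed ending."""
    return name[:-cut] if len(name) > cut + 2 else name


def candidates(text, names, max_suffix=3):
    """Step 2: known name stem plus a wildcard of up to `max_suffix` word characters."""
    found = []
    for name in names:
        pattern = re.compile(
            r"\b" + re.escape(stem(name)) + r"\w{0," + str(max_suffix) + r"}\b"
        )
        for match in pattern.finditer(text):
            found.append((match.group(0), name))
    return found


def matches_learnt_pattern(surface, name, patterns):
    """Step 3: accept a candidate only if a known ending substitution explains it."""
    if surface == name:
        return True
    for base_end, inflected_end in patterns:
        if name.endswith(base_end):
            base = name[: len(name) - len(base_end)]
            if surface == base + inflected_end:
                return True
    return False


def recognise(text, names=KNOWN_NAMES, patterns=LEARNT_PATTERNS):
    """Run wildcard matching, then keep only candidates that pass the filter."""
    return [
        (surface, name)
        for surface, name in candidates(text, names)
        if matches_learnt_pattern(surface, name, patterns)
    ]
```

In this sketch the wildcard alone would also accept a string such as "Nowarze" as a variant of "Nowak"; the pattern filter rejects it because no listed ending substitution produces that form, which mirrors the recall-then-precision order of the steps in the abstract.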