Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems

Luis-Adrián Cabrera-Diego; José G. Moreno; Antoine Doucet

Abstract

We present a collection of Named Entity Recognition (NER) systems for six Slavic languages: Bulgarian, Czech, Polish, Slovenian, Russian and Ukrainian. These NER systems have been trained using different BERT models and Frustratingly Easy Domain Adaptation (FEDA). FEDA allows us to create NER systems using multiple datasets without having to worry about whether the tagsets (e.g. Location, Event, Miscellaneous, Time) of the source and target domains match, while increasing the amount of data available for training. Moreover, we boosted the prediction of named entities by marking uppercase words and predicting masked words. Participating in the 3rd Shared Task on SlavNER, our NER systems reached a strict match micro F-score of up to 0.908. The results demonstrate good generalization, even for named entities with weak regularity, such as book titles, and for entities that were never seen during training.
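The idea behind FEDA can be illustrated with a minimal sketch of the classic feature-augmentation formulation: each input is copied into a shared block and into the block of its own domain, while the blocks of all other domains are zeroed. This is an assumption-laden illustration, not the BERT-based architecture of the submitted systems; the function name `feda_augment` and the toy feature vector are hypothetical.

```python
from typing import List, Sequence


def feda_augment(features: Sequence[float], domain: int, num_domains: int) -> List[float]:
    """Return [shared | domain_0 | ... | domain_{K-1}] with only the active domain block filled.

    Hypothetical helper illustrating classic feature augmentation;
    it is not taken from the authors' code.
    """
    if not 0 <= domain < num_domains:
        raise ValueError("domain index out of range")
    dim = len(features)
    augmented = list(features)  # shared block, seen by every domain
    for d in range(num_domains):
        if d == domain:
            augmented.extend(features)      # domain-specific copy
        else:
            augmented.extend([0.0] * dim)   # other domains stay zero
    return augmented


# Toy example: the same 2-dimensional feature vector in two datasets (domains).
x = [1.0, 2.0]
source = feda_augment(x, domain=0, num_domains=2)
target = feda_augment(x, domain=1, num_domains=2)
```

Because every example contributes to the shared block, a model trained on the augmented features can pool evidence across datasets, while the domain-specific blocks leave room for each dataset to keep its own tagset and conventions.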

Anthology ID: 2021.bsnlp-1.12
Volume: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
Month: April
Year: 2021
Address: Kyiv, Ukraine
Editors: Bogdan Babych, Olga Kanishcheva, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Vasyl Starko, Josef Steinberger, Roman Yangarber, Michał Marcińczuk, Senja Pollak, Pavel Přibáň, Marko Robnik-Šikonja
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 98–104
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2021.bsnlp-1.12/
Cite (ACL): Luis Adrián Cabrera-Diego, Jose G. Moreno, and Antoine Doucet. 2021. Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 98–104, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems (Cabrera-Diego et al., BSNLP 2021)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2021.bsnlp-1.12.pdf
Code: embeddia/ner_feda
