Multilingual Named Entity Recognition and Matching Using BERT and Dedupe for Slavic Languages

Marko Prelevikj; Slavko Žitnik

Abstract

This paper describes the University of Ljubljana (UL FRI) Group’s submissions to the shared task at the Balto-Slavic Natural Language Processing (BSNLP) 2021 Workshop. We experiment with multiple BERT-based models pre-trained on multilingual, Croatian-Slovene-English, and Slovene-only data. We train models iteratively, as well as on the concatenation of previously available NER datasets. For the normalization task we use the Stanza lemmatizer, while for entity matching we implemented a baseline using the Dedupe library. The evaluation results suggest that multi-source settings outperform less-resourced approaches. The best NER models achieve an F-score of 0.91 on Slovene training data splits, while the best official submission achieved F-scores of 0.84 and 0.78 in the relaxed partial matching and strict settings, respectively. In the multilingual NER setting we achieve F-scores of 0.82 and 0.74.
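The abstract does not detail how the Dedupe baseline was configured. As a rough, self-contained illustration of the stages a record-linkage tool such as Dedupe runs (blocking, pairwise scoring, clustering), the sketch below swaps Dedupe's learned pairwise classifier for a fixed `difflib` similarity threshold. The function names, the 3-character prefix blocking rule, and the 0.8 threshold are hypothetical choices made for this example, not taken from the paper.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Casefold and strip diacritics, e.g. 'Žitnik' -> 'zitnik'."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def block_key(name: str) -> str:
    # Hypothetical blocking rule: only compare names sharing a 3-char prefix,
    # so we avoid scoring every pair of mentions.
    return normalize(name)[:3]


def match_entities(mentions: list[str], threshold: float = 0.8) -> list[set[str]]:
    """Cluster mentions whose normalized similarity >= threshold.

    Stand-in for Dedupe's learned classifier: a fixed SequenceMatcher ratio.
    Clusters are formed transitively with union-find.
    """
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Blocking: group mention indices by their block key.
    blocks: dict[str, list[int]] = {}
    for i, m in enumerate(mentions):
        blocks.setdefault(block_key(m), []).append(i)

    # Pairwise scoring within each block; merge clusters above the threshold.
    for idxs in blocks.values():
        for i, j in combinations(idxs, 2):
            a, b = normalize(mentions[i]), normalize(mentions[j])
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                parent[find(i)] = find(j)

    # Collect the final clusters keyed by their union-find root.
    clusters: dict[int, set[str]] = {}
    for i, m in enumerate(mentions):
        clusters.setdefault(find(i), set()).add(m)
    return list(clusters.values())
```

For example, `match_entities(["Ljubljana", "Ljubljani", "Ljubljano", "Slovenija", "Sloveniji", "Beograd"])` groups the three inflected forms of Ljubljana into one cluster, the two forms of Slovenija into another, and leaves Beograd on its own. A fixed threshold is fragile across languages and inflection patterns, which is one reason a learned matcher like Dedupe is a more natural baseline.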

Anthology ID: 2021.bsnlp-1.9
Volume: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
Month: April
Year: 2021
Address: Kyiv, Ukraine
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 80–85
URL: https://aclanthology.org/2021.bsnlp-1.9
Cite (ACL): Marko Prelevikj and Slavko Zitnik. 2021. Multilingual Named Entity Recognition and Matching Using BERT and Dedupe for Slavic Languages. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 80–85, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Multilingual Named Entity Recognition and Matching Using BERT and Dedupe for Slavic Languages (Prelevikj & Zitnik, BSNLP 2021)
PDF: https://preview.aclanthology.org/ingestion-script-update/2021.bsnlp-1.9.pdf
