Abstract
This paper describes the University of Ljubljana (UL FRI) Group’s submissions to the shared task at the Balto-Slavic Natural Language Processing (BSNLP) 2021 Workshop. We experiment with multiple BERT-based models, pre-trained on multi-lingual, Croatian-Slovene-English and Slovene-only data. We train the models iteratively and on the concatenated data of previously available NER datasets. For the normalization task we use the Stanza lemmatizer, while for entity matching we implement a baseline using the Dedupe library. The evaluation results suggest that multi-source settings outperform less-resourced approaches. The best NER models achieve an F-score of 0.91 on Slovene training data splits, while the best official submission achieved F-scores of 0.84 and 0.78 in the relaxed partial matching and strict settings, respectively. In the multi-lingual NER setting we achieve F-scores of 0.82 and 0.74.
- Anthology ID:
- 2021.bsnlp-1.9
- Volume:
- Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
- Month:
- April
- Year:
- 2021
- Address:
- Kiyv, Ukraine
- Venue:
- BSNLP
- SIG:
- SIGSLAV
- Publisher:
- Association for Computational Linguistics
- Pages:
- 80–85
- URL:
- https://aclanthology.org/2021.bsnlp-1.9
- Cite (ACL):
- Marko Prelevikj and Slavko Zitnik. 2021. Multilingual Named Entity Recognition and Matching Using BERT and Dedupe for Slavic Languages. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 80–85, Kiyv, Ukraine. Association for Computational Linguistics.
- Cite (Informal):
- Multilingual Named Entity Recognition and Matching Using BERT and Dedupe for Slavic Languages (Prelevikj & Zitnik, BSNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.bsnlp-1.9.pdf