Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task

Marek Suppa, Ondrej Jariabka


Abstract
In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: a character-level language model with Stanza, language-specific BERT-style models with SpaCy, and an Adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.
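Comparing toolkits on a NER benchmark, as the paper does, requires a common entity-level metric. The following is a minimal sketch of strict span-level precision/recall/F1 scoring, where an entity counts as correct only if its boundaries and label both match the gold annotation exactly; the official BSNLP shared task scorer additionally supports relaxed matching, which this sketch does not implement.

```python
# Strict span-level NER evaluation sketch.
# Entities are (start, end, label) tuples; an exact match on all
# three fields counts as a true positive. This is an illustrative
# re-implementation, not the official BSNLP evaluation script.

def span_f1(gold, pred):
    """Return (precision, recall, f1) for two lists of entity spans."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # exact boundary + label matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Example: the second prediction has correct boundaries but a wrong
# label, so it is scored as an error under strict matching.
gold = [(0, 2, "PER"), (5, 7, "LOC")]
pred = [(0, 2, "PER"), (5, 7, "ORG")]
p, r, f = span_f1(gold, pred)  # → (0.5, 0.5, 0.5)
```

Strict matching of this kind makes toolkit comparisons conservative: a model that finds the right mention but mislabels it gets no partial credit.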
Anthology ID:
2021.bsnlp-1.13
Volume:
Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
Month:
April
Year:
2021
Address:
Kyiv, Ukraine
Venue:
BSNLP
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Pages:
105–114
URL:
https://aclanthology.org/2021.bsnlp-1.13
Cite (ACL):
Marek Suppa and Ondrej Jariabka. 2021. Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 105–114, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task (Suppa & Jariabka, BSNLP 2021)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2021.bsnlp-1.13.pdf
Code
 naiveneuron/slavner-2021