Abstract
In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. We evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: a character-level language model with Stanza, language-specific BERT-style models with spaCy, and Adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.
- Anthology ID: 2021.bsnlp-1.13
- Volume: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
- Month: April
- Year: 2021
- Address: Kiyv, Ukraine
- Editors: Bogdan Babych, Olga Kanishcheva, Preslav Nakov, Jakub Piskorski, Lidia Pivovarova, Vasyl Starko, Josef Steinberger, Roman Yangarber, Michał Marcińczuk, Senja Pollak, Pavel Přibáň, Marko Robnik-Šikonja
- Venue: BSNLP
- SIG: SIGSLAV
- Publisher: Association for Computational Linguistics
- Pages: 105–114
- URL: https://aclanthology.org/2021.bsnlp-1.13
- Cite (ACL): Marek Suppa and Ondrej Jariabka. 2021. Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task. In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, pages 105–114, Kiyv, Ukraine. Association for Computational Linguistics.
- Cite (Informal): Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task (Suppa & Jariabka, BSNLP 2021)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/2021.bsnlp-1.13.pdf
- Code: naiveneuron/slavner-2021
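The abstract compares NER systems built with three toolkits, which implies scoring each system's predicted entity spans against gold annotations. As a minimal illustration of that kind of comparison (exact-match span F1 with a hypothetical `span_f1` helper, not the official BSNLP shared-task scorer), one could sketch:

```python
# Span-level NER evaluation sketch: each entity is a (start, end, type)
# tuple, and a prediction counts as correct only if it matches a gold
# span exactly. This is a hypothetical helper, not the official scorer.

def span_f1(gold, pred):
    """Return (precision, recall, f1) for lists of (start, end, type) spans."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)          # exact-match true positives
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One of the two predicted spans matches gold exactly; the second has
# the right boundaries but the wrong type, so it is counted as an error.
gold = [(0, 2, "PER"), (5, 7, "LOC")]
pred = [(0, 2, "PER"), (5, 7, "ORG")]
p, r, f = span_f1(gold, pred)  # → (0.5, 0.5, 0.5)
```

The shared task itself also reports relaxed and document-level matching regimes, which would require different matching logic than the strict exact-match comparison shown here.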