Martin Tamajka


2022

SlovakBERT: Slovak Masked Language Model
Matúš Pikuliak | Štefan Grivalský | Martin Konôpka | Miroslav Blšták | Martin Tamajka | Viktor Bachratý | Marian Simko | Pavol Balážik | Michal Trnka | Filip Uhlárik
Findings of the Association for Computational Linguistics: EMNLP 2022

We introduce a new Slovak masked language model called SlovakBERT. To the best of our knowledge, this is the first paper discussing Slovak transformer-based language models. We evaluate our model on several NLP tasks and achieve state-of-the-art results. This evaluation is likewise the first attempt to establish a benchmark for Slovak language models. We publish the masked language model, as well as the fine-tuned models for part-of-speech tagging, sentiment analysis, and semantic textual similarity.