Marek Suppa


2021

pdf
Cost-effective Deployment of BERT Models in Serverless Environment
Marek Suppa | Katarína Benešová | Andrej Švec
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

In this study, we demonstrate the viability of deploying BERT-style models to AWS Lambda in a production environment. Since the freely available pre-trained models are too large to be deployed in this environment, we utilize knowledge distillation and fine-tune the models on proprietary datasets for two real-world tasks: sentiment analysis and semantic textual similarity. As a result, we obtain models that are tuned for a specific domain and deployable in the serverless environment. The subsequent performance analysis shows that this solution does not only report latency levels acceptable for production use but that it is also a cost-effective alternative to small-to-medium size deployments of BERT models, all without any infrastructure overhead.

pdf
Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task
Marek Suppa | Ondrej Jariabka
Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing

In this paper we describe TraSpaS, a submission to the third shared task on named entity recognition hosted as part of the Balto-Slavic Natural Language Processing (BSNLP) Workshop. In it we evaluate various pre-trained language models on the NER task using three open-source NLP toolkits: character level language model with Stanza, language-specific BERT-style models with SpaCy and Adapter-enabled XLM-R with Trankit. Our results show that the Trankit-based models outperformed those based on the other two toolkits, even when trained on smaller amounts of data. Our code is available at https://github.com/NaiveNeuron/slavner-2021.

2020

pdf
A Summarization Dataset of Slovak News Articles
Marek Suppa | Jergus Adamec
Proceedings of the Twelfth Language Resources and Evaluation Conference

As a well established NLP task, single-document summarization has seen significant interest in the past few years. However, most of the work has been done on English datasets. This is particularly noticeable in the context of evaluation where the dominant ROUGE metric assumes its input to be written in English. In this paper we aim to address both of these issues by introducing a summarization dataset of articles from a popular Slovak news site and proposing small adaptation to the ROUGE metric that make it better suited for Slovak texts. Several baselines are evaluated on the dataset, including an extractive approach based on the Multilingual version of the BERT architecture. To the best of our knowledge, the presented dataset is the first large-scale news-based summarization dataset for text written in Slovak language. It can be reproduced using the utilities available at https://github.com/NaiveNeuron/sme-sum