Aleš Žagar


2022

pdf
Slovene SuperGLUE Benchmark: Translation and Evaluation
Aleš Žagar | Marko Robnik-Šikonja
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present SuperGLUE benchmark adapted and translated into Slovene using a combination of human and machine translation. We describe the translation process and problems arising due to differences in morphology and grammar. We evaluate the translated datasets in several modes: monolingual, cross-lingual, and multilingual, taking into account differences between machine and human translated training sets. The results show that the monolingual Slovene SloBERTa model is superior to massively multilingual and trilingual BERT models, but these also show a good cross-lingual performance on certain tasks. The performance of Slovene models still lags behind the best English models.

2021

pdf
Unsupervised Approach to Multilingual User Comments Summarization
Aleš Žagar | Marko Robnik-Šikonja
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

User commenting is a valuable feature of many news outlets, enabling them a contact with readers and enabling readers to express their opinion, provide different viewpoints, and even complementary information. Yet, large volumes of user comments are hard to filter, let alone read and extract relevant information. The research on the summarization of user comments is still in its infancy, and human-created summarization datasets are scarce, especially for less-resourced languages. To address this issue, we propose an unsupervised approach to user comments summarization, which uses a modern multilingual representation of sentences together with standard extractive summarization techniques. Our comparison of different sentence representation approaches coupled with different summarization approaches shows that the most successful combinations are the same in news and comment summarization. The empirical results and presented visualisation show usefulness of the proposed methodology for several languages.