Love Börjeson


2023

Superlim: A Swedish Language Understanding Evaluation Benchmark
Aleksandrs Berdicevskis | Gerlof Bouma | Robin Kurtz | Felix Morger | Joey Öhman | Yvonne Adesam | Lars Borin | Dana Dannélls | Markus Forsberg | Tim Isbister | Anna Lindahl | Martin Malmsten | Faton Rekathati | Magnus Sahlgren | Elena Volodina | Love Börjeson | Simon Hengchen | Nina Tahmasebi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We present Superlim, a multi-task NLP benchmark and analysis platform for evaluating Swedish language models, a counterpart to the English-language (Super)GLUE suite. We describe the dataset, the tasks and the leaderboard, and report the baseline results yielded by a reference implementation. The tested models do not approach ceiling performance on any of the tasks, which suggests that Superlim is truly difficult, a desirable quality for a benchmark. We address methodological challenges, such as mitigating the Anglocentric bias when creating datasets for a less-resourced language; choosing the most appropriate measures; documenting the datasets; and making the leaderboard convenient and transparent. We also highlight other potential usages of the dataset, such as the evaluation of cross-lingual transfer learning.

2021

It’s Basically the Same Language Anyway: the Case for a Nordic Language Model
Magnus Sahlgren | Fredrik Carlsson | Fredrik Olsson | Love Börjeson
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

When is it beneficial for a research community to organize a broader collaborative effort on a topic, and when should we instead promote individual efforts? In this opinion piece, we argue that we are at a stage in the development of large-scale language models where a collaborative effort is desirable, despite the fact that the preconditions for making individual contributions have never been better. We consider a number of arguments for collaboratively developing a large-scale Nordic language model, including environmental considerations, cost, data availability, language typology, cultural similarity, and transparency. Our primary goal is to raise awareness and foster a discussion about our potential impact and responsibility as an NLP community.