Damar Hoogland
2026
Exploring Social Bias in Slovenia: The EEC-SL Dataset
Jaya Caporusso | Damar Hoogland | Boshko Koloski | Matthew Purver | Senja Pollak | Spela Vintar
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Jaya Caporusso | Damar Hoogland | Boshko Koloski | Matthew Purver | Senja Pollak | Spela Vintar
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We introduce the EEC-SL dataset, an adaptation of the Equity Evaluation Corpus from English to Slovenian. Based on 11 sentence templates, the dataset contains 8,640 sentences, including pairs of minimally-distant sentences, varying with regard to one of two variables: gender (female or male), and ethnicity (Slovenian or not-Slovenian). In order to validate our selection of personal names, we create a localised version of the Implicit Association Test for ethnic bias, in which participants show a significant implicit bias favouring Slovenian over non-Slovenian names. We use the dataset to evaluate social bias in three computational language models (large language models and an encoder-only transformer) to perform sentiment analysis—specifically, valence. We analyse the results in terms of differences in sentiment between minimally-distant groups of sentences and inferential tests. We found limited evidence for social bias with regard to ethnicity, and no evidence for gender bias, in any of the employed models.
2024
A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media
Jaya Caporusso | Damar Hoogland | Mojca Brglez | Boshko Koloski | Matthew Purver | Senja Pollak
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Jaya Caporusso | Damar Hoogland | Mojca Brglez | Boshko Koloski | Matthew Purver | Senja Pollak
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Dehumanisation involves the perception and/or treatment of a social group’s members as less than human. This phenomenon is rarely addressed with computational linguistic techniques. We adapt a recently proposed approach for English, making it easier to transfer to other languages and to evaluate, introducing a new sentiment resource, the use of zero-shot cross-lingual valence and arousal detection, and a new method for statistical significance testing. We then apply it to study attitudes to migration expressed in Slovene newspapers, to examine changes in the Slovene discourse on migration between the 2015-16 migration crisis following the war in Syria and the 2022-23 period following the war in Ukraine. We find that while this discourse became more negative and more intense over time, it is less dehumanising when specifically addressing Ukrainian migrants compared to others.