Ghassen Karray


2025

pdf bib
Assessing Reliability and Political Bias In LLMs’ Judgements of Formal and Material Inferences With Partisan Conclusions
Reto Gubelmann | Ghassen Karray
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This article examines LLMs’ ability to correctly label simple inferences with partisan conclusions. For this, we develop a dataset with both formal and material inferences, containing logically equivalent pairs of inferences with conclusions that favor either the political left or the political right. This allows us to focus on political bias as a source of decrease in performance. Our samples are synthetically generated and thus highly controlled, covering both English and German. We assess the performance of 16 configurations of both open and proprietary state-of-the-art LLMs on that dataset, finding generally unreliable performance as well as widespread political bias which, in the case of the English samples, persists throughout our experimental settings.