Leif Azzopardi
2025
Evaluation of Attribution Bias in Generator-Aware Retrieval-Augmented Large Language Models
Amin Abolghasemi | Leif Azzopardi | Seyyed Hadi Hashemi | Maarten de Rijke | Suzan Verberne
Findings of the Association for Computational Linguistics: ACL 2025
Attributing answers to source documents is an approach used to enhance the verifiability of a model’s output in retrieval-augmented generation (RAG). Prior work has mainly focused on improving and evaluating the attribution quality of large language models (LLMs) in RAG, but this may come at the expense of inducing biases in the attribution of answers. We define and examine two aspects of the evaluation of LLMs in RAG pipelines, namely attribution sensitivity and bias with respect to authorship information. We explicitly inform an LLM about the authors of source documents, instruct it to attribute its answers, and analyze (i) how sensitive the LLM’s output is to the author of source documents, and (ii) whether the LLM exhibits a bias towards human-written or AI-generated source documents. We design an experimental setup in which we use counterfactual evaluation to study three LLMs in terms of their attribution sensitivity and bias in RAG pipelines. Our results show that adding authorship information to source documents can change the attribution quality of LLMs significantly, by 3% to 18%. We show that LLMs can have an attribution bias towards explicit human authorship, which offers a competing hypothesis for prior findings that LLM-generated content may be preferred over human-written content. Our findings indicate that the metadata of source documents can influence LLMs’ trust and the way they attribute their answers. Furthermore, our research highlights attribution bias and sensitivity as a novel aspect of the vulnerability of LLMs.
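As an illustration of the counterfactual setup the abstract describes, the minimal sketch below swaps the explicit author label attached to each retrieved document and checks whether the model's cited sources change. It is an assumed reconstruction, not the paper's pipeline: the prompt template, the `generate_attributed_answer` callable, and the flip-rate measure are hypothetical placeholders.

```python
# A minimal sketch, assuming a hypothetical `generate_attributed_answer`
# callable that wraps whichever LLM sits in the RAG pipeline and returns the
# set of document ids it cited. Prompt wording and the flip-rate measure are
# illustrative placeholders, not the paper's actual setup.
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class SourceDoc:
    doc_id: str
    text: str


def build_prompt(question: str, docs: List[SourceDoc], author_labels: List[str]) -> str:
    """Prepend an explicit author label to each retrieved source document."""
    blocks = [
        f"[{d.doc_id}] (author: {label})\n{d.text}"
        for d, label in zip(docs, author_labels)
    ]
    return (
        "Answer the question using the sources below and cite the ids of the "
        "sources you relied on.\n\n" + "\n\n".join(blocks) +
        f"\n\nQuestion: {question}"
    )


def attribution_flip_rate(
    questions: List[str],
    docs_per_question: List[List[SourceDoc]],
    generate_attributed_answer: Callable[[str], Set[str]],
) -> float:
    """Fraction of questions whose cited sources change when the author
    labels are counterfactually swapped (human-written <-> AI-generated)."""
    flips = 0
    for question, docs in zip(questions, docs_per_question):
        factual = build_prompt(question, docs, ["human-written"] * len(docs))
        counterfactual = build_prompt(question, docs, ["AI-generated"] * len(docs))
        flips += generate_attributed_answer(factual) != generate_attributed_answer(counterfactual)
    return flips / max(len(questions), 1)
```

Under these assumptions, a higher flip rate would indicate that the model's attributions are more sensitive to authorship metadata alone.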
POW: Political Overton Windows of Large Language Models
Leif Azzopardi | Yashar Moshfeghi
Findings of the Association for Computational Linguistics: EMNLP 2025
Political bias in Large Language Models (LLMs) presents a growing concern for the responsible deployment of AI systems. Traditional audits often attempt to locate a model’s political position as a point estimate, masking the broader set of ideological boundaries that shape what a model is willing or unwilling to say. In this paper, we draw upon the concept of the Overton Window as a framework for mapping these boundaries: the range of political views that a given LLM will espouse, remain neutral on, or refuse to endorse. To uncover these windows, we applied an auditing-based methodology, called PRISM, that probes LLMs through task-driven prompts designed to elicit political stances indirectly. Using the Political Compass Test, we evaluated twenty-eight LLMs from eight providers to reveal their distinct Overton Windows. While many models default to economically left and socially liberal positions, we show that their willingness to express or reject certain positions varies considerably: DeepSeek models tend to be the most restrictive in what they will discuss, while Gemini models tend to be the most expansive. Our findings demonstrate that Overton Windows offer a richer, more nuanced view of political bias in LLMs and provide a new lens for auditing their normative boundaries.
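To make the Overton Window framing concrete, the sketch below groups a model's already-labelled responses to political statements into the positions it espouses, stays neutral on, or refuses to endorse. The PRISM prompting and stance-labelling steps are not reproduced; the function names and demo statements are assumptions for illustration only.

```python
# A minimal sketch, assuming each model response to a political statement has
# already been labelled as "espouse", "neutral", or "refuse"; the PRISM
# prompting and stance-detection steps are not shown. The names and demo
# statements below are illustrative assumptions.
from typing import Dict, List, Tuple


def overton_window(labelled: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group statements by whether the model espouses, stays neutral on,
    or refuses to endorse them."""
    window: Dict[str, List[str]] = {"espouse": [], "neutral": [], "refuse": []}
    for statement, label in labelled:
        window.setdefault(label, []).append(statement)
    return window


def refusal_rate(window: Dict[str, List[str]]) -> float:
    """Share of statements the model declines to engage with; a rough proxy
    for how restrictive its Overton Window is."""
    total = sum(len(v) for v in window.values())
    return len(window.get("refuse", [])) / total if total else 0.0


if __name__ == "__main__":
    demo = [
        ("Raise taxes on the wealthiest earners", "espouse"),
        ("Abolish the minimum wage", "refuse"),
        ("Expand nuclear power generation", "neutral"),
    ]
    print(refusal_rate(overton_window(demo)))  # 0.333...
```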
2007
UVA: Language Modeling Techniques for Web People Search
Krisztian Balog | Leif Azzopardi | Maarten de Rijke
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)