Alan Davoust

2026

Uncovering Ideological Bias in RAG with Lexical Multidimensional Analysis: A Case Study on COVID-19
Elmira Salari | Maria Claudia Nunes Delfino | Hazem Amamou | José Victor de Souza | Shruti Kshirsagar | Alan Davoust | Anderson Avila
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)

This paper studies the impact of retrieved ideologically framed texts on the outputs of large language models (LLMs). While interest in understanding ideological framing in LLMs has recently increased, little attention has been given to this issue in the context of Retrieval-Augmented Generation (RAG). To fill this gap, we design an external knowledge source based on ideologically framed texts about COVID-19 treatments. Our corpus is based on 1,117 academic articles representing discourses about controversial and endorsed treatments for the disease. We propose a corpus linguistics framework, based on Lexical Multidimensional Analysis (LMDA), to identify discourse dimensions within the corpus. LLMs are tasked with answering questions derived from three identified discourse dimensions, and two types of contextual prompts are adopted: the first comprises the user question and ideologically framed texts, and the second contains the question, ideologically framed texts, and LMDA descriptions. Alignment between reference ideologically framed texts and LLMs’ responses is assessed using cosine similarity over lexical and semantic representations. Results demonstrate that retrieved ideologically framed texts steer LLM responses toward the discourse framing represented in the external knowledge, with enhanced prompts further amplifying this effect. Our findings highlight the importance of identifying ideological framings within the RAG framework in order to mitigate not just unintended ideological bias, but also the risks of intentional discourse steering of such models.

Co-authors

José Victor de Souza 1

Venues

*SEM 1
WS 1
