Avneet Kaur

2025

Echoes of Agreement: Argument Driven Sycophancy in Large Language models
Avneet Kaur
Findings of the Association for Computational Linguistics: EMNLP 2025

Existing evaluations of political biases in Large Language Models (LLMs) highlight a high sensitivity to prompt formulation. Furthermore, LLMs are known to exhibit sycophancy, a tendency to align their outputs with a user’s stated belief, which is often attributed to human feedback during fine-tuning. However, such bias in the presence of explicit argumentation within a prompt remains underexplored. This paper investigates how argumentative prompts induce sycophantic behaviour in LLMs in a political context. Through a series of experiments, we demonstrate that models consistently alter their responses to mirror the stance expressed by the user. This sycophantic behaviour is observed in both single- and multi-turn interactions, and its intensity correlates with argument strength. Our findings establish a link between user stance and model sycophancy, revealing a critical vulnerability that impacts model reliability. This has significant implications for models deployed in real-world settings and calls for developing robust evaluations and mitigations against manipulative or biased interactions.

