Gabriele Maraia
2026
Sounding vs. Being an Expert: Disentangling Authority, Register and Cultural Impact in Sycophantic LLMs
Gabriele Maraia | Fabio Massimo Zanzotto | Leonardo Ranaldi
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have been shown to exhibit sycophancy, a tendency to align with user assertions even when they conflict with facts. We frame sycophancy as a sociolinguistic phenomenon, disentangling two distinct drivers of credibility: explicit authority (credentials) and implicit authority (linguistic register). We introduce the Sycophancy Matrix, an adversarial evaluation framework that isolates these variables. Using a controlled subset of TruthfulQA, we evaluate open-weight models across English, Spanish, and Portuguese variants. Our findings reveal that models often conflate high register with truthfulness: for some architectures, sophisticated tone triggers deference more effectively than explicit expertise. Furthermore, we observe statistically significant variability across cultural variants of Spanish and Portuguese, supporting the hypothesis that LLMs internalise language-specific sociolinguistic norms and that sycophancy is not a purely technical deficit but an emergent property of multilingual training and alignment. Finally, we identify stable sycophancy fingerprints (domain-specific vulnerability profiles that persist across languages), suggesting that alignment artefacts are intrinsic to model families rather than to linguistic context.
Can Activation Steering Generalize Across Languages? A Study on Syllogistic Reasoning in Language Models
Gabriele Maraia | Leonardo Ranaldi | Marco Valentino | Fabio Massimo Zanzotto
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) often struggle with formal logical reasoning, frequently conflating content plausibility with logical validity. This well-known content effect undermines their capacity to act as reliable deductive reasoners, particularly in multilingual contexts where both linguistic variability and world knowledge may deepen biases. Prior work shows that prompting and tuning interventions can alleviate these issues only partially, leaving models vulnerable to semantic interference. While previous studies have explored activation steering and other test-time interventions, this work has focused predominantly on English. To make reasoning more consistent, robust, and transferable across languages, we investigate the use of activation steering, an inference-time intervention that modulates internal representations towards a cross-lingual reasoning space. Our experiments demonstrate that steering techniques constructed for English-based syllogisms generalise effectively to multilingual datasets, yielding higher formal reasoning accuracy (up to +36%) while minimally affecting language modelling performance. Moreover, steering supports partial transfer to out-of-distribution tasks, highlighting its potential as a scalable mechanism for cross-lingual transferable reasoning. These findings advance the prospect of developing LLMs that can serve as reliable soft reasoners across language landscapes.