Exploring Coherence of LLMs in Multilingual Question Answering

Stefano Campese, Ivano Lauriola


Abstract
Recent studies have highlighted that Large Language Models (LLMs) often exhibit limited coherence, that is, a limited ability to produce consistent responses to semantically equivalent questions. Most prior research has focused on English, and other languages have received little investigation. In this work, we study the coherence of LLMs on Question Answering tasks across six languages: English, Italian, German, Chinese, Japanese, and Vietnamese. We evaluate models of varying sizes, ranging from 3.8B to 235B parameters, to examine how coherence scales with model capacity and how it varies across languages. Our findings reveal that (i) coherence is not uniquely related to model size and accuracy and (ii) for some models, coherence varies significantly between languages.
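The abstract does not state how coherence is measured, so the following is only an illustrative sketch of one simple way to quantify "consistent responses to semantically equivalent questions": the fraction of answer pairs that agree across paraphrases of the same question. The function names, the exact-match normalization, and the sample answers are all assumptions made for illustration, not the authors' metric or data.

```python
from itertools import combinations


def normalize(answer: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    return " ".join(answer.lower().split())


def pairwise_coherence(answers: list[str]) -> float:
    """Fraction of answer pairs that match exactly after normalization.

    `answers` holds a model's responses to paraphrases of one question.
    With fewer than two answers there is nothing to disagree, so return 1.0.
    """
    normed = [normalize(a) for a in answers]
    pairs = list(combinations(normed, 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical outputs for three paraphrases of one question, per language.
answers_by_language = {
    "en": ["Paris", "paris", "Paris"],
    "it": ["Parigi", "Roma", "Parigi"],
}

scores = {
    lang: pairwise_coherence(answers)
    for lang, answers in answers_by_language.items()
}
print(scores)
```

Under this sketch a model can be fully coherent yet wrong (it gives the same incorrect answer every time), which is one way coherence and accuracy can come apart. Exact string matching is a deliberately crude agreement test; a semantic-equivalence check would be needed for free-form answers.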
Anthology ID:
2026.gem-main.52
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
554–562
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.52/
Cite (ACL):
Stefano Campese and Ivano Lauriola. 2026. Exploring Coherence of LLMs in Multilingual Question Answering. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 554–562, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Exploring Coherence of LLMs in Multilingual Question Answering (Campese & Lauriola, GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.52.pdf