Analyzing and Improving Coherence of Large Language Models in Question Answering

Ivano Lauriola, Stefano Campese, Alessandro Moschitti


Abstract
Large language models (LLMs) have recently revolutionized natural language processing. These models, however, often suffer from instability or a lack of coherence, i.e., the ability to generate semantically equivalent outputs when receiving diverse yet semantically equivalent input variations. In this work, we analyze the behavior of multiple LLMs, including Mixtral-8x7B, Llama2-70b, Smaug-72b, and Phi-3, when dealing with multiple lexical variations of the same information-seeking questions. Our results suggest that various LLMs struggle to consistently answer diverse equivalent queries. To address this issue, we show how redundant information encoded as a prompt can increase the coherence of these models. In addition, we introduce a Retrieval-Augmented Generation (RAG) technique that supplements LLMs with the top-k most similar questions from a question retrieval engine. This knowledge augmentation leads to a 4-8 percentage point improvement in end-to-end performance on factual question answering tasks. These findings underscore the need to enhance LLM stability and coherence through semantic awareness.
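
The abstract describes supplementing the LLM prompt with the top-k most similar questions returned by a question retrieval engine. The sketch below is a minimal illustration of that idea, not the authors' implementation: the sentence-transformers bi-encoder, the toy question bank, and the prompt template are all assumptions, since the abstract does not specify a retriever or prompt format.

```python
# Hypothetical sketch of question-retrieval-augmented prompting, assuming a
# sentence-transformers bi-encoder as the question retrieval engine; the
# retriever model, question bank, and prompt template are illustrative only.
from sentence_transformers import SentenceTransformer, util

# Toy index of previously seen questions (e.g., paraphrases of known queries).
question_bank = [
    "Who wrote the novel Moby-Dick?",
    "Which author is Moby-Dick attributed to?",
    "What year was Moby-Dick first published?",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed retriever model
bank_embeddings = encoder.encode(question_bank, convert_to_tensor=True)

def build_augmented_prompt(user_question: str, k: int = 2) -> str:
    """Retrieve the top-k most similar questions and prepend them to the prompt."""
    query_embedding = encoder.encode(user_question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, bank_embeddings, top_k=k)[0]
    similar = [question_bank[hit["corpus_id"]] for hit in hits]
    context = "\n".join(f"- {q}" for q in similar)
    return (
        "The following questions are semantically equivalent to the user's question:\n"
        f"{context}\n\n"
        f"Answer the user's question consistently with them:\n{user_question}"
    )

# The augmented prompt would then be passed to the LLM of choice.
print(build_augmented_prompt("Who is the author of Moby-Dick?"))
```
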
Anthology ID:
2025.naacl-long.588
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
11740–11755
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.588/
Cite (ACL):
Ivano Lauriola, Stefano Campese, and Alessandro Moschitti. 2025. Analyzing and Improving Coherence of Large Language Models in Question Answering. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 11740–11755, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Analyzing and Improving Coherence of Large Language Models in Question Answering (Lauriola et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.588.pdf