Assessing the Belief Consistency of Large Language Models on the Logical Conversation Process

Tomoki Tsujimura, Matīss Rikters, Masaki Asada, Shusaku Egami, Tatsuya Ishigaki, Ken Yano, Hiroya Takamura


Abstract
To reliably interpret the evolving context of an LLM as a reasoning trace, the LLM's underlying belief must transition consistently as the context progresses. We focus on evaluating whether the beliefs held by a model remain consistent before and after the context is extended. Previous research on consistency evaluation typically uses datasets with ground-truth answers, which is problematic because task-solving ability acts as a confounding factor that obscures direct evaluation of consistency. Furthermore, cases where inconsistency stems from multiple errors are difficult to evaluate. We propose a new method that assesses the consistency of LLMs in a multiple-choice question answering format designed so that any chosen option is correct, which allows the proposed belief consistency to be evaluated directly. The method also makes it possible to isolate errors such as reasoning failures and biases. We find that belief consistency does not improve through model scaling alone, whereas continual pre-training on code and mathematics text improves it. Furthermore, models trained on code and mathematics text show a seemingly contradictory increase in logical failures, indicating that belief consistency and superficial consistency are not necessarily directly linked.
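The abstract does not spell out how consistency is scored. As a minimal sketch, assuming each multiple-choice item yields one chosen option before and one after the context is extended, a hypothetical agreement score could be computed as below. Because every option is acceptable, there is no gold label to score against; only whether the later choice matches what the earlier choice commits the model to. The names `consistency_rate` and `entails` are illustrative assumptions, not the authors' implementation; `entails` defaults to "the same option" and can be swapped for an item-specific logical mapping.

```python
from typing import Callable, Sequence


def consistency_rate(
    before: Sequence[str],
    after: Sequence[str],
    entails: Callable[[int, str], str] = lambda i, choice: choice,
) -> float:
    """Hypothetical belief-consistency score (illustration only).

    before[i]: option the model chose for item i before the context is extended.
    after[i]:  option the model chose for item i after the context is extended.
    entails(i, c): option that choosing c before extension logically commits
                   the model to afterwards (assumed; identity by default).
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    if not before:
        raise ValueError("need at least one item")
    # No ground truth is used: an item counts as consistent when the
    # post-extension choice matches the one entailed by the earlier choice.
    consistent = sum(
        entails(i, b) == a for i, (b, a) in enumerate(zip(before, after))
    )
    return consistent / len(before)


before = ["A", "B", "C", "A"]
after = ["A", "B", "A", "A"]
print(consistency_rate(before, after))  # 0.75
```

Scoring agreement rather than accuracy mirrors the abstract's point that task-solving ability should not confound the measurement; any finer error breakdown (reasoning failures vs. biases) would require information this sketch does not model.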
Anthology ID:
2026.acl-long.1860
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
40032–40055
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1860/
Cite (ACL):
Tomoki Tsujimura, Matīss Rikters, Masaki Asada, Shusaku Egami, Tatsuya Ishigaki, Ken Yano, and Hiroya Takamura. 2026. Assessing the Belief Consistency of Large Language Models on the Logical Conversation Process. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 40032–40055, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Assessing the Belief Consistency of Large Language Models on the Logical Conversation Process (Tsujimura et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1860.pdf
Checklist:
2026.acl-long.1860.checklist.pdf