Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions
Haochen Shi, Shaobo Li, Guoqing Chao, Xiaoliang Shi, Wentao Chen, Zhenzhou Ji
Abstract
Large Language Models (LLMs) require robust evaluation. However, existing frameworks often rely on curated datasets that, once public, may be accessed by newer LLMs. This creates a risk of data leakage, where test sets inadvertently become part of training data, compromising evaluation fairness and integrity. To mitigate this issue, we propose Behave as Claimed (BaC), a novel evaluation framework inspired by counterfactual reasoning. BaC constructs a “what-if” scenario where LLMs respond to counterfactual questions about how they would behave if the input were manipulated. We refer to these responses as claims, which are verifiable by observing the LLMs’ actual behavior when given the manipulated input. BaC dynamically generates and verifies counterfactual questions using various few-shot in-context learning evaluation datasets, reducing their susceptibility to data leakage. Moreover, BaC provides a more challenging evaluation paradigm for LLMs. LLMs must thoroughly understand the prompt, the task, and the consequences of their responses to achieve better performance. We evaluate several state-of-the-art LLMs and find that, while most perform well on the original datasets, they struggle with BaC. This suggests that LLMs usually fail to align their claims with their actual behavior and that high performance on standard datasets may be less stable than previously assumed.
- Anthology ID:
- 2025.emnlp-main.1479
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 29031–29044
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1479/
- Cite (ACL):
- Haochen Shi, Shaobo Li, Guoqing Chao, Xiaoliang Shi, Wentao Chen, and Zhenzhou Ji. 2025. Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 29031–29044, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions (Shi et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1479.pdf
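The abstract describes the BaC loop as: ask the model a counterfactual ("what-if") question about a manipulated input, record its response as a claim, then feed the manipulated input for real and check whether the observed behavior matches the claim. The following is a minimal illustrative sketch of that loop, not the authors' released code; the helper names (query_model, manipulate, extract_answer), the prompt wording, and the string-match verification are assumptions made to keep the example self-contained.

```python
# Sketch of a BaC-style consistency check (illustrative only; details assumed).
from typing import Callable

def behaves_as_claimed(
    query_model: Callable[[str], str],    # any LLM wrapper: prompt in, text out
    original_input: str,
    manipulate: Callable[[str], str],     # input manipulation, e.g. flip a key word
    extract_answer: Callable[[str], str], # pull the final answer out of a response
) -> bool:
    """Return True if the model's counterfactual claim matches its actual behavior."""
    manipulated_input = manipulate(original_input)

    # Step 1: pose the counterfactual question and record the claim.
    claim_prompt = (
        f"Original input:\n{original_input}\n\n"
        f"If the input were instead:\n{manipulated_input}\n"
        "what would your answer be? Reply with the answer only."
    )
    claimed_answer = extract_answer(query_model(claim_prompt))

    # Step 2: actually give the manipulated input and observe the behavior.
    actual_answer = extract_answer(query_model(manipulated_input))

    # Step 3: the claim is verified only if it agrees with the observed behavior.
    return claimed_answer.strip().lower() == actual_answer.strip().lower()


if __name__ == "__main__":
    # Toy stand-in for an LLM so the sketch runs end to end.
    def toy_model(prompt: str) -> str:
        return "positive" if "love" in prompt.lower() else "negative"

    consistent = behaves_as_claimed(
        query_model=toy_model,
        original_input="Review: I love this film. Sentiment?",
        manipulate=lambda s: s.replace("love", "hate"),
        extract_answer=lambda s: s,
    )
    print("claim matches behavior:", consistent)  # False: the toy model's claim and behavior diverge
```

In practice the toy model would be replaced by a real LLM client, and the exact-match comparison by whatever verification criterion the paper uses for each few-shot dataset; the sketch only conveys the claim-versus-behavior structure of the evaluation.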