A Causal Lens for Evaluating Faithfulness Metrics

Kerem Zaman, Shashank Srivastava


Abstract
Large Language Models (LLMs) offer natural language explanations as an alternative to feature attribution methods for model interpretability. However, despite their plausibility, these explanations may not reflect the model's true reasoning faithfully. While several faithfulness metrics have been proposed, they are often evaluated in isolation, making principled comparisons between them difficult. We present Causal Diagnosticity, a testbed framework for evaluating faithfulness metrics for natural language explanations. We build on the concept of diagnosticity and employ model-editing methods to generate pairs of faithful and unfaithful explanations. Our benchmark includes four tasks: fact-checking, analogy, object counting, and multi-hop reasoning. We evaluate prominent faithfulness metrics, including post-hoc explanation and chain-of-thought methods. Diagnostic performance varies across tasks and models, with Filler Tokens performing best overall. Additionally, continuous metrics are generally more diagnostic than binary ones but can be sensitive to noise and model choice. Our results highlight the need for more robust faithfulness metrics.
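As a minimal illustrative sketch (not the paper's implementation), diagnosticity can be read as the probability that a faithfulness metric scores the faithful explanation in a (faithful, unfaithful) pair above the unfaithful one. The function name, toy scores, and tie-handling below are hypothetical assumptions for illustration only.

```python
def diagnosticity(metric_scores):
    """Fraction of pairs in which the metric ranks the faithful
    explanation above its unfaithful counterpart (ties count 0.5).

    metric_scores: list of (score_faithful, score_unfaithful) pairs,
    one per benchmark example; higher score = judged more faithful.
    """
    wins = sum(1.0 if f > u else 0.5 if f == u else 0.0
               for f, u in metric_scores)
    return wins / len(metric_scores)

# Hypothetical scores for four examples; a value of 1.0 would mean the
# metric always separates faithful from unfaithful explanations.
pairs = [(0.9, 0.2), (0.7, 0.4), (0.6, 0.6), (0.3, 0.8)]
print(f"diagnosticity = {diagnosticity(pairs):.3f}")  # -> 0.625
```

Under this reading, a perfectly diagnostic metric scores 1.0 and a random one 0.5, which is one simple way to compare metrics on a common scale across the four benchmark tasks.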
Anthology ID:
2025.emnlp-main.1496
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
29413–29437
URL:
https://preview.aclanthology.org/ingest-luhme/2025.emnlp-main.1496/
DOI:
10.18653/v1/2025.emnlp-main.1496
Cite (ACL):
Kerem Zaman and Shashank Srivastava. 2025. A Causal Lens for Evaluating Faithfulness Metrics. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 29413–29437, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
A Causal Lens for Evaluating Faithfulness Metrics (Zaman & Srivastava, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.emnlp-main.1496.pdf
Checklist:
2025.emnlp-main.1496.checklist.pdf