Evaluating Readability and Faithfulness of Concept-based Explanations
Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang
Abstract
With the growing popularity of general-purpose Large Language Models (LLMs) comes a need for more global explanations of model behaviors. Concept-based explanations arise as a promising avenue for explaining the high-level patterns learned by LLMs. Yet their evaluation poses unique challenges, especially due to their non-local nature and high-dimensional representation in a model's hidden space. Current methods approach concepts from different perspectives and lack a unified formalization, which makes evaluating the core measures of concepts, namely faithfulness and readability, challenging. To bridge the gap, we introduce a formal definition of concepts that generalizes across diverse concept-based explanation settings. Based on this, we quantify the faithfulness of a concept explanation via perturbation, ensuring adequate perturbation in the high-dimensional space for different concepts by solving an optimization problem. Readability is approximated via an automatic and deterministic measure that quantifies the coherence of the patterns maximally activating a concept while aligning with human understanding. Finally, based on measurement theory, we apply a meta-evaluation method for assessing these measures, which is also generalizable to other types of explanations and tasks. Extensive experimental analysis has been conducted to inform the selection of explanation evaluation measures.
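To make the perturbation-based faithfulness idea concrete, here is a minimal sketch in NumPy: it represents a concept as a single direction in the model's hidden space, ablates that direction from the hidden states, and measures how much the outputs change. Everything here is a hypothetical stand-in, not the paper's implementation: the linear output head, the random data, and the crude mean-absolute-change score. In particular, the paper chooses adequate perturbations by solving an optimization problem, which this toy projection does not attempt.

```python
import numpy as np

def project_out(hidden, concept_dir):
    """Remove each hidden vector's component along the concept direction."""
    u = concept_dir / np.linalg.norm(concept_dir)
    return hidden - np.outer(hidden @ u, u)

def faithfulness_score(model_head, hidden, concept_dir):
    """Compare outputs before and after ablating the concept direction.

    model_head: callable mapping (n, d) hidden states to output logits
    (a stand-in for the tail of an actual LLM). A large output change
    suggests the model causally relies on the concept.
    """
    original = model_head(hidden)
    perturbed = model_head(project_out(hidden, concept_dir))
    # Mean absolute change in outputs as a crude faithfulness proxy.
    return np.mean(np.abs(original - perturbed))

# Toy usage with random data and a linear "head".
rng = np.random.default_rng(0)
d, n = 16, 32
W = rng.normal(size=(d, 4))       # stand-in output head
hidden = rng.normal(size=(n, d))  # stand-in hidden states
concept = rng.normal(size=d)      # stand-in concept direction
print(faithfulness_score(lambda h: h @ W, hidden, concept))
```

A more faithful rendering of the paper's approach would replace the fixed projection with a perturbation whose magnitude is calibrated per concept, and would feed the perturbed states back through the remaining layers of a real model.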
- Anthology ID: 2024.emnlp-main.36
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 607–625
- URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-main.36/
- DOI: 10.18653/v1/2024.emnlp-main.36
- Cite (ACL): Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, and Xiting Wang. 2024. Evaluating Readability and Faithfulness of Concept-based Explanations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 607–625, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Evaluating Readability and Faithfulness of Concept-based Explanations (Li et al., EMNLP 2024)
- PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-main.36.pdf