Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability

Xin Lv, Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Yichi Zhang, Zelin Dai


Abstract
Multi-hop reasoning has been widely studied in recent years as a way to obtain more interpretable link prediction. However, we find in experiments that many of the paths given by these models are actually unreasonable, and little work has been done on evaluating their interpretability. In this paper, we propose a unified framework to quantitatively evaluate the interpretability of multi-hop reasoning models so as to advance their development. Specifically, we define three evaluation metrics (path recall, local interpretability, and global interpretability) and design an approximate strategy to calculate them using the interpretability scores of rules. We manually annotate all possible rules and establish a benchmark. In experiments, we verify the effectiveness of our benchmark. We also run nine representative baselines on it, and the results show that the interpretability of current multi-hop reasoning models is unsatisfactory: it is 51.7% lower than the upper bound given by our benchmark. Moreover, rule-based models outperform multi-hop reasoning models in both performance and interpretability, which points to a direction for future research, i.e., how to better incorporate rule information into multi-hop reasoning models. We will publish our code and datasets upon acceptance.
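The abstract names the three metrics but does not give their formulas. The sketch below is one plausible reading, not the authors' implementation, and it rests on three assumptions: path recall is the fraction of test triples for which the model finds any path to the tail entity, local interpretability is the mean rule score over the paths that were found, and global interpretability is the product of the two. The function name `evaluate`, the relation names, and the scores are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A path is abstracted to its rule: the sequence of relations it follows.
Rule = Tuple[str, ...]


def evaluate(paths: List[Optional[Rule]],
             rule_scores: Dict[Rule, float]) -> Dict[str, float]:
    """Compute the three metrics under the assumptions stated above.

    paths[i] is the rule of the path the model returned for test triple i,
    or None if the model found no path to the tail entity.
    rule_scores maps each annotated rule to its interpretability score;
    rules without an annotation are scored 0.0 here (an assumption).
    """
    total = len(paths)
    found = [p for p in paths if p is not None]

    # Assumed definition: share of test triples with at least one path.
    path_recall = len(found) / total if total else 0.0

    # Assumed definition: mean rule score over the paths that were found.
    if found:
        local = sum(rule_scores.get(p, 0.0) for p in found) / len(found)
    else:
        local = 0.0

    # Assumed definition: product of the two, so missing paths are penalized.
    return {
        "path_recall": path_recall,
        "local_interpretability": local,
        "global_interpretability": path_recall * local,
    }


# Hypothetical toy data: four test triples, one with no path found.
example_scores = {
    ("born_in", "city_of"): 1.0,
    ("friend_of", "lives_in"): 0.5,
}
example_paths = [
    ("born_in", "city_of"),
    None,
    ("friend_of", "lives_in"),
    ("born_in", "city_of"),
]
example_result = evaluate(example_paths, example_scores)
```

On this toy input, three of four triples have a path, so path recall is 0.75, and the product form means a model cannot score well globally by returning only a few highly interpretable paths.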
Anthology ID: 2021.emnlp-main.700
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 8899–8911
URL: https://aclanthology.org/2021.emnlp-main.700
DOI: 10.18653/v1/2021.emnlp-main.700
Cite (ACL): Xin Lv, Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Yichi Zhang, and Zelin Dai. 2021. Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8899–8911, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability (Lv et al., EMNLP 2021)
PDF: https://preview.aclanthology.org/naacl24-info/2021.emnlp-main.700.pdf
Video: https://preview.aclanthology.org/naacl24-info/2021.emnlp-main.700.mp4
Code: THU-KEG/BIMR