CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure

Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, Ming Gao


Abstract
Code pre-trained models (CodePTMs) have recently demonstrated significant success in code intelligence. To interpret these models, some probing methods have been applied. However, these methods fail to consider the inherent characteristics of code. In this paper, to address the problem, we propose a novel probing method CAT-probing to quantitatively interpret how CodePTMs attend code structure. We first denoise the input code sequences based on the token types pre-defined by the compilers to filter out tokens whose attention scores are too small. After that, we define a new metric CAT-score to measure the commonality between the token-level attention scores generated by CodePTMs and the pair-wise distances between corresponding AST nodes. The higher the CAT-score, the stronger the ability of CodePTMs to capture code structure. We conduct extensive experiments to integrate CAT-probing with representative CodePTMs for different programming languages. Experimental results show the effectiveness of CAT-probing in CodePTM interpretation. Our code and data are publicly available at https://github.com/nchen909/CodeAttention.
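The abstract describes CAT-score as a measure of commonality between a model's token-level attention matrix and the pairwise distances of the tokens' AST nodes. The sketch below is an illustrative reading of that idea, not the paper's exact formula: it binarizes both matrices with hypothetical thresholds (`attn_threshold`, `dist_threshold` are assumptions) and reports the fraction of token pairs on which the two signals agree.

```python
import numpy as np

def cat_score(attention, ast_distance, attn_threshold=0.1, dist_threshold=2):
    """Illustrative CAT-score sketch (thresholds and agreement rule are
    hypothetical simplifications of the paper's metric).

    attention:    (n, n) token-level attention matrix from a CodePTM head
    ast_distance: (n, n) pairwise distances between the tokens' AST nodes
    Returns the fraction of unordered token pairs where "high attention"
    coincides with "close in the AST" (or low attention with far).
    """
    n = attention.shape[0]
    high_attn = attention > attn_threshold      # binarize attention scores
    close_ast = ast_distance < dist_threshold   # binarize AST distances
    # Agreement over the n*(n-1)/2 unordered off-diagonal token pairs
    iu = np.triu_indices(n, k=1)
    return float((high_attn[iu] == close_ast[iu]).mean())
```

Under this reading, a higher score means the head's strong attention edges line up with structurally close AST node pairs, matching the abstract's claim that a higher CAT-score indicates a stronger ability to capture code structure.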
Anthology ID:
2022.findings-emnlp.295
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4000–4008
URL:
https://aclanthology.org/2022.findings-emnlp.295
DOI:
10.18653/v1/2022.findings-emnlp.295
Cite (ACL):
Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, and Ming Gao. 2022. CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4000–4008, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure (Chen et al., Findings 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2022.findings-emnlp.295.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-3/2022.findings-emnlp.295.mp4