Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings

Wei Zhou, Heike Adel, Hendrik Schuff, Ngoc Thang Vu

Abstract
Attribution scores indicate the importance of different input parts and can thus explain model behavior. Prompt-based models are currently gaining popularity, among other reasons because of their easy adaptability to low-resource settings. However, the quality of attribution scores extracted from prompt-based models has not yet been investigated. In this work, we address this gap by analyzing attribution scores extracted from prompt-based models with respect to plausibility and faithfulness, and by comparing them with attribution scores extracted from fine-tuned models and large language models. In contrast to previous work, we introduce training size as an additional dimension of the analysis. We find that in low-resource settings, the prompting paradigm (with either encoder-based or decoder-based models) yields more plausible explanations than fine-tuning, and that Shapley Value Sampling consistently outperforms attention and Integrated Gradients in producing more plausible and faithful explanations.
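As background for the attribution methods the abstract compares, the sketch below shows one common way to compute token-level attribution scores with Shapley Value Sampling using the Captum library. This is an illustrative setup only, not the authors' experimental configuration: the model name, the choice of the predicted-class logit as the attribution target, the [PAD] baseline, and the sampling budget are all assumptions made for the example.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from captum.attr import ShapleyValueSampling

# Illustrative model choice; the paper's models and tasks differ.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

enc = tokenizer("The movie was surprisingly good.", return_tensors="pt")

# Fix the attribution target to the class the model predicts on the
# unperturbed input, so all perturbed forward passes score the same class.
with torch.no_grad():
    pred_class = model(**enc).logits.argmax(dim=1).item()

def forward_func(input_ids, attention_mask):
    # Captum perturbs input_ids and observes how this score changes.
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    return logits[:, pred_class]

svs = ShapleyValueSampling(forward_func)
attributions = svs.attribute(
    enc["input_ids"],
    baselines=tokenizer.pad_token_id,  # "removed" tokens become [PAD]
    additional_forward_args=(enc["attention_mask"],),
    n_samples=25,  # trade-off between estimate quality and runtime
)

# Print one attribution score per input token.
for tok, score in zip(
    tokenizer.convert_ids_to_tokens(enc["input_ids"][0]),
    attributions[0].tolist(),
):
    print(f"{tok:>15s} {score:+.4f}")
```

The other two methods in the comparison can be obtained analogously: Integrated Gradients via Captum's LayerIntegratedGradients over the embedding layer, and attention-based scores by reading out the model's attention weights directly.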
Anthology ID:
2024.lrec-main.600
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
6867–6875
URL:
https://aclanthology.org/2024.lrec-main.600
Cite (ACL):
Wei Zhou, Heike Adel, Hendrik Schuff, and Ngoc Thang Vu. 2024. Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6867–6875, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings (Zhou et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2024.lrec-main.600.pdf