Interpretability-based Tailored Knowledge Editing in Transformers

Yihuai Hong, Aldo Lipani


Abstract
Language models, recognized as a new form of knowledge base, face challenges from outdated, erroneous, and privacy-sensitive information, necessitating knowledge editing to rectify errors without costly retraining. Existing methods, spanning model parameter modification, external knowledge integration, and in-context learning, lack in-depth analysis from a model-interpretability perspective. Our work explores the instability of in-context learning outcomes, providing insights into its causes and its distinctions from other methods. Leveraging findings on the critical role of feed-forward MLPs in decoder-only models, we propose a tailored knowledge editing method, TailoredKE, that accounts for the unique information flow of each sample. Model interpretability reveals that attributes are recalled at diverse transformer layers, guiding edits to specific features at different depths and mitigating over-editing issues.
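To illustrate the per-sample layer-selection idea sketched in the abstract, the snippet below is a minimal, hedged example (not the authors' TailoredKE implementation): it probes per-layer "attribute recall" in GPT-2 with a logit-lens readout and picks the peak layer as a candidate edit site. The gpt2 checkpoint, the example prompt, and the logit-lens heuristic are assumptions introduced purely for illustration.

```python
# Illustrative sketch (not the authors' code): trace how strongly each layer's
# hidden state already "recalls" the attribute token, then pick the peak layer
# as a candidate per-sample edit site. Model choice (gpt2) and the logit-lens
# probe are assumptions for illustration only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The Eiffel Tower is located in the city of"
target = " Paris"  # attribute token whose recall we trace across layers
target_id = tok.encode(target)[0]

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (n_layers + 1) tensors, each [1, seq_len, d_model]
probs_per_layer = []
for h in out.hidden_states[1:]:  # skip the embedding layer
    # Logit lens: apply the final layer norm and the unembedding matrix to the
    # hidden state at the last position, then read off the target probability.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    probs_per_layer.append(torch.softmax(logits, dim=-1)[0, target_id].item())

edit_layer = max(range(len(probs_per_layer)), key=probs_per_layer.__getitem__)
print("Per-layer attribute recall:", [round(p, 4) for p in probs_per_layer])
print("Suggested edit layer for this sample:", edit_layer)
```

Because the peak layer differs from sample to sample, such a probe can be used to direct an edit (e.g., an MLP weight update) to a different depth for each fact rather than to one fixed layer.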
Anthology ID: 2024.emnlp-main.225
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 3847–3858
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-main.225/
DOI: 10.18653/v1/2024.emnlp-main.225
Cite (ACL): Yihuai Hong and Aldo Lipani. 2024. Interpretability-based Tailored Knowledge Editing in Transformers. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 3847–3858, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Interpretability-based Tailored Knowledge Editing in Transformers (Hong & Lipani, EMNLP 2024)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-main.225.pdf