MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

Zexuan Zhong, Zhengxuan Wu, Christopher Manning, Christopher Potts, Danqi Chen


Abstract
The information stored in large language models (LLMs) falls out of date quickly, and retraining from scratch is often not an option. This has recently given rise to a range of techniques for injecting new facts through updating model weights. Current evaluation paradigms are extremely limited, mainly validating the recall of edited facts, but changing one fact should cause rippling changes to the model’s related beliefs. If we edit the UK Prime Minister to now be Rishi Sunak, then we should get a different answer to Who is married to the British Prime Minister? In this work, we present a benchmark MQuAKE (Multi-hop Question Answering for Knowledge Editing) comprising multi-hop questions that assess whether edited models correctly answer questions where the answer should change as an entailed consequence of edited facts. While we find that current knowledge-editing approaches can recall edited facts accurately, they fail catastrophically on the constructed multi-hop questions. We thus propose a simple memory-based approach, MeLLo, which stores all edited facts externally while prompting the language model iteratively to generate answers that are consistent with the edited facts. While MQuAKE remains challenging, we show that MeLLo scales well with LLMs (up to 175B) and outperforms previous model editors by a large margin.
Anthology ID:
2023.emnlp-main.971
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
15686–15702
Language:
URL:
https://aclanthology.org/2023.emnlp-main.971
DOI:
10.18653/v1/2023.emnlp-main.971
Bibkey:
Cite (ACL):
Zexuan Zhong, Zhengxuan Wu, Christopher Manning, Christopher Potts, and Danqi Chen. 2023. MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15686–15702, Singapore. Association for Computational Linguistics.
Cite (Informal):
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions (Zhong et al., EMNLP 2023)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.emnlp-main.971.pdf
Video:
 https://preview.aclanthology.org/nschneid-patch-5/2023.emnlp-main.971.mp4