2024
Detoxifying Large Language Models via Knowledge Editing
Mengru Wang | Ningyu Zhang | Ziwen Xu | Zekun Xi | Shumin Deng | Yunzhi Yao | Qishen Zhang | Linyi Yang | Jindong Wang | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper investigates using knowledge editing techniques to detoxify Large Language Models (LLMs). We construct a benchmark, SafeEdit, which covers nine unsafe categories with various powerful attack prompts and provides comprehensive metrics for systematic evaluation. We conduct experiments with several knowledge editing approaches, and the results indicate that knowledge editing has the potential to efficiently detoxify LLMs with limited impact on general performance. We then propose a simple yet effective baseline, dubbed Detoxifying with Intraoperative Neural Monitoring (DINM), to diminish the toxicity of LLMs within a few tuning steps using only one instance. We further provide an in-depth analysis of the internal mechanism of various detoxifying approaches, demonstrating that previous methods like SFT and DPO may merely suppress the activations of toxic parameters, while DINM mitigates the toxicity of the toxic parameters to a certain extent, making permanent adjustments. We hope that these insights can shed light on future work on developing detoxifying approaches and on the underlying knowledge mechanisms of LLMs.
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Peng Wang | Ningyu Zhang | Bozhong Tian | Zekun Xi | Yunzhi Yao | Ziwen Xu | Mengru Wang | Shengyu Mao | Xiaohan Wang | Siyuan Cheng | Kangwei Liu | Yuansheng Ni | Guozhou Zheng | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to outdated/noisy data. To this end, many knowledge editing approaches for LLMs have emerged, aiming to subtly inject/edit updated knowledge or adjust undesired behavior while minimizing the impact on unrelated inputs. Nevertheless, due to significant differences among various knowledge editing methods and the variations in task setups, there is no standard implementation framework available for the community, which hinders practitioners from applying knowledge editing to applications. To address these issues, we propose EasyEdit, an easy-to-use knowledge editing framework for LLMs. It supports various cutting-edge knowledge editing approaches and can be readily applied to many well-known LLMs such as T5, GPT-J, and LLaMA. Empirically, we report the knowledge editing results on LLaMA-2 with EasyEdit, demonstrating that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization. We have released the source code on GitHub, along with Google Colab tutorials and comprehensive documentation for beginners to get started. We also present an online system for real-time knowledge editing and a demo video.
2023
LambdaKG: A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings
Xin Xie | Zhoubo Li | Xiaohan Wang | ZeKun Xi | Ningyu Zhang
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations