2024
Detoxifying Large Language Models via Knowledge Editing
Mengru Wang | Ningyu Zhang | Ziwen Xu | Zekun Xi | Shumin Deng | Yunzhi Yao | Qishen Zhang | Linyi Yang | Jindong Wang | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper investigates using knowledge editing techniques to detoxify Large Language Models (LLMs). We construct a benchmark, SafeEdit, which covers nine unsafe categories with various powerful attack prompts and is equipped with comprehensive metrics for systematic evaluation. We conduct experiments with several knowledge editing approaches, indicating that knowledge editing has the potential to efficiently detoxify LLMs with limited impact on general performance. Then, we propose a simple yet effective baseline, dubbed Detoxifying with Intraoperative Neural Monitoring (DINM), to diminish the toxicity of LLMs within a few tuning steps via only one instance. We further provide an in-depth analysis of the internal mechanism for various detoxifying approaches, demonstrating that previous methods like SFT and DPO may merely suppress the activations of toxic parameters, while DINM mitigates the toxicity of the toxic parameters to a certain extent, making permanent adjustments. We hope that these insights could shed light on future work on developing detoxifying approaches and on the underlying knowledge mechanisms of LLMs.
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Peng Wang | Ningyu Zhang | Bozhong Tian | Zekun Xi | Yunzhi Yao | Ziwen Xu | Mengru Wang | Shengyu Mao | Xiaohan Wang | Siyuan Cheng | Kangwei Liu | Yuansheng Ni | Guozhou Zheng | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to outdated/noisy data. To this end, many knowledge editing approaches for LLMs have emerged, aiming to subtly inject/edit updated knowledge or adjust undesired behavior while minimizing the impact on unrelated inputs. Nevertheless, due to significant differences among various knowledge editing methods and the variations in task setups, there is no standard implementation framework available for the community, which hinders practitioners from applying knowledge editing to applications. To address these issues, we propose EasyEdit, an easy-to-use knowledge editing framework for LLMs. It supports various cutting-edge knowledge editing approaches and can be readily applied to many well-known LLMs such as T5, GPT-J, LLaMA, etc. Empirically, we report the knowledge editing results on LLaMA-2 with EasyEdit, demonstrating that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization. We have released the source code on GitHub, along with Google Colab tutorials and comprehensive documentation for beginners to get started. We also present an online system for real-time knowledge editing, and a demo video.
2022
DRK: Discriminative Rule-based Knowledge for Relieving Prediction Confusions in Few-shot Relation Extraction
Mengru Wang | Jianming Zheng | Fei Cai | Taihua Shao | Honghui Chen
Proceedings of the 29th International Conference on Computational Linguistics
Few-shot relation extraction aims to identify the relation type between entities in a given text in the low-resource scenario. Despite much progress, existing meta-learning methods still fall into prediction confusions owing to their limited inference ability over shallow text features. To relieve these confusions, this paper proposes a discriminative rule-based knowledge (DRK) method. Specifically, DRK adopts a logic-aware inference module to ease the word-overlap confusion, which introduces a logic rule to constrain the inference process, thereby avoiding the adverse effect of shallow text features. Also, DRK employs a discrimination finding module to alleviate the entity-type confusion, which explores distinguishable text features via hierarchical contrastive learning. We conduct extensive experiments on four types of meta tasks and the results show promising improvements from DRK (6.0% accuracy gains on average). Besides, error analyses reveal that word-overlap and entity-type errors are the main causes of mispredictions in few-shot relation extraction.