Zheyuan Liu
2024
Towards Safer Large Language Models through Machine Unlearning
Zheyuan Liu | Guangyao Dou | Zhaoxuan Tan | Yijun Tian | Meng Jiang
Findings of the Association for Computational Linguistics ACL 2024
The rapid advancement of Large Language Models (LLMs) has demonstrated their vast potential across various domains, attributed to their extensive pretraining knowledge and exceptional generalizability. However, LLMs often generate harmful content when faced with problematic prompts. To address this problem, existing work has applied gradient-ascent-based approaches to prevent LLMs from producing harmful output. While these methods can be effective, they frequently degrade model utility on normal prompts. To address this gap, we introduce Selective Knowledge negation Unlearning (SKU), a novel unlearning framework for LLMs designed to eliminate harmful knowledge while preserving utility on normal prompts. Specifically, SKU consists of two stages: a harmful knowledge acquisition stage and a knowledge negation stage. The first stage aims to identify and acquire harmful knowledge within the model, whereas the second is dedicated to removing this knowledge. SKU selectively isolates and removes harmful knowledge in model parameters, ensuring the model’s performance remains robust on normal prompts. Our experiments conducted across various LLM architectures demonstrate that SKU identifies a good balance point between removing harmful information and preserving utility.
2017
KnowYourNyms? A Game of Semantic Relationships
Ross Mechanic | Dean Fulgoni | Hannah Cutler | Sneha Rajana | Zheyuan Liu | Bradley Jackson | Anne Cocos | Chris Callison-Burch | Marianna Apidianaki
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Semantic relation knowledge is crucial for natural language understanding. We introduce “KnowYourNyms?”, a web-based game for learning semantic relations. While providing users with an engaging experience, the application collects large amounts of data that can be used to improve semantic relation classifiers. The data also broadly informs us of how people perceive the relationships between words, providing useful insights for research in psychology and linguistics.
Co-authors
- Guangyao Dou 1
- Zhaoxuan Tan 1
- Yijun Tian 1
- Meng Jiang 1
- Ross Mechanic 1