Chuanyuan Tan
2025
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
Zhenhua Liu
|
Tong Zhu
|
Chuanyuan Tan
|
Wenliang Chen
Proceedings of the 31st International Conference on Computational Linguistics
Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. However, these models can inadvertently memorize private information, posing significant privacy risks. This study addresses the challenge of enabling LLMs to protect specific individuals’ private data without the need for complete retraining. We propose RETURN, a Real-world pErsonal daTa UnleaRNing dataset, comprising 2,492 individuals from Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods for protecting personal data in a realistic scenario. Additionally, we introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection, which enables the model to learn which individuals’ information should be protected without affecting its ability to answer questions related to other unrelated individuals. Our extensive experiments demonstrate that NAUF achieves a state-of-the-art average unlearning score, surpassing the best baseline method by 5.65 points, effectively protecting target individuals’ personal data while maintaining the model’s general capabilities.
UAQFact: Evaluating Factual Knowledge Utilization of LLMs on Unanswerable Questions
Chuanyuan Tan
|
Wenbiao Shao
|
Hao Xiong
|
Tong Zhu
|
Zhenhua Liu
|
Kai Shi
|
Wenliang Chen
Findings of the Association for Computational Linguistics: ACL 2025
Handling unanswerable questions (UAQ) is crucial for LLMs, as it helps prevent misleading responses in complex situations. While previous studies have built several datasets to assess LLMs’ performance on UAQ, these datasets lack factual knowledge support, which limits the evaluation of LLMs’ ability to utilize their factual knowledge when handling UAQ. To address the limitation, we introduce a new unanswerable question dataset UAQFact, a bilingual dataset with auxiliary factual knowledge created from a Knowledge Graph. Based on UAQFact, we further define two new tasks to measure LLMs’ ability to utilize internal and external factual knowledge, respectively. Our experimental results across multiple LLM series show that UAQFact presents significant challenges, as LLMs do not consistently perform well even when they have factual knowledge stored. Additionally, we find that incorporating external knowledge may enhance performance, but LLMs still cannot make full use of the knowledge which may result in incorrect responses. Our code and dataset are available at https://github.com/cytan17726/UAQ_Fact.
2024
Probing Language Models for Pre-training Data Detection
Zhenhua Liu
|
Tong Zhu
|
Chuanyuan Tan
|
Bing Liu
|
Haonan Lu
|
Wenliang Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have shown their impressive capabilities, while also raising concerns about the data contamination problems due to privacy issues and leakage of benchmark datasets in the pre-training phase. Therefore, it is vital to detect the contamination by checking whether an LLM has been pre-trained on the target texts. Recent studies focus on the generated texts and compute perplexities, which are superficial features and not reliable. In this study, we propose to utilize the probing technique for pre-training data detection by examining the model’s internal activations. Our method is simple and effective and leads to more trustworthy pre-training data detection. Additionally, we propose ArxivMIA, a new challenging benchmark comprising arxiv abstracts from Computer Science and Mathematics categories. Our experiments demonstrate that our method outperforms all baselines, and achieves state-of-the-art performance on both WikiMIA and ArxivMIA, with additional experiments confirming its efficacy.
Search
Fix author
Co-authors
- Wenliang Chen (陈文亮) 3
- Zhenhua Liu 3
- Tong Zhu (朱桐) 3
- Bing Liu 1
- Haonan Lu 1
- show all...