Xin Lu
Other people with similar names: Xin Lu
2025
Separate the Wheat from the Chaff: A Post-Hoc Approach to Safety Re-Alignment for Fine-Tuned Language Models
Di Wu
|
Xin Lu
|
Yanyan Zhao
|
Bing Qin
Findings of the Association for Computational Linguistics: ACL 2025
Although large language models (LLMs) achieve effective safety alignment at the time of release, they still face various safety challenges. A key issue is that fine-tuning often compromises the safety alignment of LLMs. To address this issue, we propose a method named IRR (Identify, Remove, and Recalibrate for Safety Realignment) that performs safety realignment for LLMs. The core of IRR is to identify and remove unsafe delta parameters from the fine-tuned models, while recalibrating the retained parameters. We evaluate the effectiveness of IRR across various datasets, including both full fine-tuning and LoRA methods. Our results demonstrate that IRR significantly enhances the safety performance of fine-tuned models on safety benchmarks, such as harmful queries and jailbreak attacks, while maintaining their performance on downstream tasks. The source code is available at: https://github.com/pikepokenew/IRR.
Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation
Mingzhe Li
|
Xin Lu
|
Yanyan Zhao
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) with instruction following capabilities have demonstrated impressive problem-solving abilities. While synthesizing instructional data from unsupervised text has become a common approach for training such models, conventional methods rely heavily on human effort for data annotation. Although existing automated synthesis paradigms have alleviated this constraint, they still exhibit significant limitations in ensuring adequate diversity and difficulty of synthesized instructions. To address these challenges, we propose Self-Foveate, an innovative LLM-driven method for instruction synthesis. This approach introduces a “Micro-Scatter-Macro” multi-level foveation methodology that effectively guides the LLM to deeply excavate fine-grained information embedded in unsupervised text, thereby enhancing both the diversity and difficulty of synthesized instructions. Comprehensive experiments across multiple unsupervised corpora and diverse model architectures validate the effectiveness and superiority of our proposed method. We publicly release our data and codes: https://github.com/Mubuky/Self-Foveate
2023
Don’t Lose Yourself! Empathetic Response Generation via Explicit Self-Other Awareness
Weixiang Zhao
|
Yanyan Zhao
|
Xin Lu
|
Bing Qin
Findings of the Association for Computational Linguistics: ACL 2023
As a critical step to achieve human-like chatbots, empathetic response generation has attained increasing interests. Previous attempts are incomplete and not sufficient enough to elicit empathy because they only stay on the initial stage of empathy to automatically sense and simulate the feelings and thoughts of others via other-awareness. However, they ignore to include self-awareness to consider the own views of the self in their responses, which is a crucial process to achieve the empathy. To this end, we propose to generate Empathetic response with explicit Self-Other Awareness (EmpSOA). Specifically, three stages, self-other differentiation, self-other modulation and self-other generation, are devised to clearly maintain, regulate and inject the self-other aware information into the process of empathetic response generation. Both automatic and human evaluations on the benchmark dataset demonstrate the superiority of EmpSOA to generate more empathetic responses. Our source code will be publicly available.