Hongwei Li

2026

The deployment of large language models (LLMs) raises significant ethical and safety concerns. While LLM alignment techniques are adopted to improve model safety and trustworthiness, adversaries can exploit these techniques to undermine safety for malicious purposes, resulting in misalignment. Misaligned LLMs may be published on open platforms to magnify harm. To address this, additional safety alignment, referred to as realignment, is necessary before deploying untrusted third-party LLMs. This study explores the efficacy of fine-tuning methods in terms of misalignment, realignment, and the effects of their interplay. By evaluating four Supervised Fine-Tuning (SFT) and two Preference Fine-Tuning (PFT) methods across four popular safety-aligned LLMs, we reveal a mechanism asymmetry between attack and defense. While Odds Ratio Preference Optimization (ORPO) is most effective for misalignment, Direct Preference Optimization (DPO) excels in realignment, albeit at the expense of model utility. Additionally, we identify model-specific resistance, residual effects of multi-round adversarial dynamics, and other noteworthy findings. These findings highlight the need for robust safeguards and customized safety alignment strategies to mitigate potential risks in the deployment of LLMs.
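The abstract contrasts DPO and ORPO. For reference, the standard per-pair objectives of the two methods, as defined in their original papers, can be sketched as below. This is not the study's training code. It assumes that sequence log-probabilities are already computed as plain floats, and the default values `beta=0.1` and `lam=0.1` are arbitrary illustrative choices.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Arguments are summed sequence log-probabilities log pi(y|x) under the
    trainable policy and under a frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen response over the rejected one.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float,
              lam: float = 0.1) -> float:
    """ORPO loss for one pair: SFT negative log-likelihood plus a weighted odds-ratio term.

    Arguments are length-normalized (per-token average) log-probabilities
    under the policy; no reference model is involved.
    """
    nll = -chosen_avg_logp  # supervised term on the chosen response
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    return nll - lam * log_sigmoid(log_odds_ratio)
```

One visible structural difference: DPO measures preference relative to a frozen reference model, while ORPO folds the preference signal directly into the supervised objective through an odds ratio.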

2025


K-CoT: Chinese Parallelism Sentence Generation via Keyword-Guided Chain-of-Thought Prompting
Maosheng Zhong | Jiaqi Ganjiaqi | Hejun Zhang | Linkang Xie | Hongwei Li
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

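Research on Chinese parallelism sentences faces two major challenges: a scarcity of high-quality corpora and a lack of fine-grained annotation. To address them, this paper builds a Chinese parallelism sentence corpus with multi-dimensional annotations covering topic, emotional tone, parallelism marker words, and keywords. Building on this corpus, we propose K-CoT, a keyword-guided chain-of-thought framework for parallelism sentence generation. By simulating the cognitive process of human rhetorical composition, K-CoT decomposes generation into a progressive reasoning pipeline of four stages: theme deconstruction, feature mapping, keyword generation, and sentence synthesis. Experiments on mainstream models such as ChatGLM and Llama show that K-CoT achieves significant performance gains on the parallelism sentence generation task. This work contributes a novel dataset for parallelism sentence research and an interpretable technical path for improving the rhetorical capabilities of generative models; its staged reasoning mechanism is broadly relevant to improving the semantic controllability of language models.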
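The four-stage decomposition can be illustrated with a prompt builder that chains the stages in order. This is a hypothetical sketch: the `STAGES` wording, the `build_kcot_prompt` function, and its parameters are illustrative assumptions, not the paper's actual prompts.

```python
# HYPOTHETICAL stage templates; the paper's real prompt wording is not given here.
STAGES = (
    "Theme deconstruction: break the topic '{topic}' into three or more parallel sub-aspects.",
    "Feature mapping: map each sub-aspect to the tone '{tone}' and the parallelism marker '{marker}'.",
    "Keyword generation: produce one keyword per sub-aspect.",
    "Sentence synthesis: write one parallelism sentence that uses the marker '{marker}' and all keywords.",
)


def build_kcot_prompt(topic: str, tone: str, marker: str) -> str:
    """Assemble a staged chain-of-thought prompt from annotated attributes."""
    steps = [
        f"Step {i}. " + stage.format(topic=topic, tone=tone, marker=marker)
        for i, stage in enumerate(STAGES, start=1)
    ]
    return "\n".join(["Think step by step:"] + steps)


print(build_kcot_prompt("spring", "hopeful", "like"))
```

Ordering the stages explicitly makes each intermediate output (sub-aspects, keywords) inspectable, which is the kind of interpretability the abstract attributes to staged reasoning.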


A core barrier preventing recommender systems from reaching their full potential lies in the inherent limitations of user-item interaction data: (1) sparse user-item interactions make it difficult to learn reliable user preferences; (2) traditional contrastive learning methods often treat all negative samples as equally hard or easy, ignoring informative differences in semantic difficulty during training; and (3) modern LLM-based recommender systems discard all negative feedback, leading to unbalanced preference modeling. To address these issues, we propose LAGCL4Rec, a framework leveraging Large Language Models to Activate interactions in Graph Contrastive Learning for Recommendation. Our approach operates in three stages: (i) Data-Level: augmenting sparse interactions with balanced positive and negative samples using LLM-enriched profiles; (ii) Rank-Level: assessing the semantic difficulty of negative samples through LLM-based grouping for fine-grained contrastive learning; and (iii) Rerank-Level: reasoning over augmented historical interactions for personalized recommendations. Theoretical analysis proves that LAGCL4Rec achieves effective information utilization with minimal computational overhead. Experiments across multiple benchmarks confirm that our method consistently outperforms state-of-the-art baselines.
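To make "fine-grained contrastive learning" over difficulty-grouped negatives concrete, one plausible form is an InfoNCE loss in which each negative's contribution is scaled by a weight for its difficulty group. This is a hypothetical sketch, not the paper's formulation: the group labels stand in for the LLM-based grouping, and `group_weights` and the temperature `tau=0.2` are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def grouped_infonce(anchor: Sequence[float], positive: Sequence[float],
                    negatives: Sequence[Sequence[float]], groups: Sequence[str],
                    group_weights: Dict[str, float], tau: float = 0.2) -> float:
    """InfoNCE where each negative term is scaled by its difficulty group's weight.

    HYPOTHETICAL: labels such as "hard"/"easy" stand in for an LLM-assigned
    difficulty grouping; the weighting scheme is illustrative only.
    """
    pos = math.exp(_cosine(anchor, positive) / tau)
    neg = sum(group_weights[g] * math.exp(_cosine(anchor, n) / tau)
              for n, g in zip(negatives, groups))
    return -math.log(pos / (pos + neg))
```

With every group weight set to 1.0 this reduces to standard cosine-similarity InfoNCE; raising the weight of a "hard" group increases the penalty for embedding those negatives close to the anchor.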