Zheng He (何铮) - ACL Anthology

Zheng He

Also published as: 铮何

2025

Generative large language models (LLMs) are crucial in natural language processing, but they are vulnerable to backdoor attacks, in which subtle triggers compromise their behavior. Although backdoor attacks against LLMs are constantly emerging, existing benchmarks remain limited in attack coverage, metric system integrity, and backdoor attack alignment. Moreover, existing pre-trained backdoor attacks are idealized and difficult to carry out in practice because of resource access constraints. We therefore establish ELBA-Bench, a comprehensive and unified framework that allows attackers to inject backdoors through parameter-efficient fine-tuning (PEFT; e.g., LoRA) or through techniques that require no fine-tuning (e.g., in-context learning). ELBA-Bench provides over 1300 experiments encompassing implementations of 12 attack methods, 18 datasets, and 12 LLMs. Extensive experiments yield valuable new findings on the strengths and limitations of various attack strategies. For instance, PEFT attacks consistently outperform approaches without fine-tuning in classification tasks while showing strong cross-dataset generalization, with optimized triggers boosting robustness; task-relevant backdoor optimization techniques, or attack prompts combined with clean and adversarial demonstrations, can enhance backdoor attack success while preserving model performance on clean samples. Additionally, we introduce a universal toolbox designed for standardized backdoor attack research at https://github.com/NWPUliuxx/ELBA_Bench, with the goal of propelling further progress in this vital area.

2022

标签先验知识增强的方面类别情感分析方法研究(Aspect-Category based Sentiment Analysis Enhanced by Label Prior Knowledge)
Renwei Wu (吴任伟) | Lin Li (李琳) | Zheng He (何铮) | Jingling Yuan (袁景凌)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

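Current research on aspect-category sentiment analysis aims to carry out two tasks jointly: aspect category detection and category-oriented sentiment classification. However, existing studies have not paid effective attention to the noisy labels present in sentiment datasets, which degrades the quality of sentiment analysis. To address this, this paper proposes an aspect-category sentiment analysis method enhanced by label prior knowledge (AP-LPK). First, we construct an autoregressive prompt training scheme for category-oriented sentiment classification, which can unlock the potential of pre-trained language models. This scheme generates label words autoregressively, with the aim of achieving better semantic consistency than non-autoregressive generation. Second, the label distribution of each category is introduced as label prior knowledge and further refined through a Bernoulli distribution in order to mitigate the interference of noisy labels. AP-LPK then fuses the sentiment category distributions obtained from these two steps to produce the final sentiment category prediction probabilities. Finally, the proposed AP-LPK method is evaluated on five datasets, including four benchmark datasets from SemEval 2015 and 2016 and the large-scale restaurant-domain dataset from AI Challenger 2018. Experimental results show that the proposed method outperforms existing methods on the F1 metric.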

Co-authors

Jingling Yuan (袁景凌) 1

Venues

ACL 1
CCL 1
