Bowen Dong


2024

pdf
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
Lijun Li | Bowen Dong | Ruohui Wang | Xuhao Hu | Wangmeng Zuo | Dahua Lin | Yu Qiao | Jing Shao
Findings of the Association for Computational Linguistics ACL 2024

In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount. To meet this crucial need, we propose SALAD-Bench, a safety benchmark specifically designed for evaluating LLMs, attack, and defense methods. Distinguished by its breadth, SALAD-Bench transcends conventional benchmarks through its large scale, rich diversity, intricate taxonomy spanning three levels, and versatile functionalities.SALAD-Bench is crafted with a meticulous array of questions, from standard queries to complex ones enriched with attack, defense modifications and multiple-choice. To effectively manage the inherent complexity, we introduce an innovative evaluators: the LLM-based MD-Judge for QA pairs with a particular focus on attack-enhanced queries, ensuring a seamless, and reliable evaluation. Above components extend SALAD-Bench from standard LLM safety evaluation to both LLM attack and defense methods evaluation, ensuring the joint-purpose utility. Our extensive experiments shed light on the resilience of LLMs against emerging threats and the efficacy of contemporary defense tactics. Data and evaluator are released under https://github.com/OpenSafetyLab/SALAD-BENCH

2022

pdf
Prompt Tuning for Discriminative Pre-trained Language Models
Yuan Yao | Bowen Dong | Ao Zhang | Zhengyan Zhang | Ruobing Xie | Zhiyuan Liu | Leyu Lin | Maosong Sun | Jianyong Wang
Findings of the Association for Computational Linguistics: ACL 2022

Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks. However, to the best of our knowledge, existing works focus on prompt-tuning generative PLMs that are pre-trained to generate target tokens, such as BERT. It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned. In this work, we present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem. Comprehensive experiments on text classification and question answering show that, compared with vanilla fine-tuning, DPT achieves significantly higher performance, and also prevents the unstable problem in tuning large PLMs in both full-set and low-resource settings.

2020

pdf
Meta-Information Guided Meta-Learning for Few-Shot Relation Classification
Bowen Dong | Yuan Yao | Ruobing Xie | Tianyu Gao | Xu Han | Zhiyuan Liu | Fen Lin | Leyu Lin | Maosong Sun
Proceedings of the 28th International Conference on Computational Linguistics

Few-shot classification requires classifiers to adapt to new classes with only a few training instances. State-of-the-art meta-learning approaches such as MAML learn how to initialize and fast adapt parameters from limited instances, which have shown promising results in few-shot classification. However, existing meta-learning models solely rely on implicit instance-based statistics, and thus suffer from instance unreliability and weak interpretability. To solve this problem, we propose a novel meta-information guided meta-learning (MIML) framework, where semantic concepts of classes provide strong guidance for meta-learning in both initialization and adaptation. In effect, our model can establish connections between instance-based information and semantic-based information, which enables more effective initialization and faster adaptation. Comprehensive experimental results on few-shot relation classification demonstrate the effectiveness of the proposed framework. Notably, MIML achieves comparable or superior performance to humans with only one shot on FewRel evaluation.