Liu Wei


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2023

pdf bib
Training NLI Models Through Universal Adversarial Attack
Lin Jieyu | Liu Wei | Zou Jiajie | Ding Nai
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“Pre-trained language models are sensitive to adversarial attacks, and recent works have demon-strated universal adversarial attacks that can apply input-agnostic perturbations to mislead mod-els. Here, we demonstrate that universal adversarial attacks can also be used to harden NLPmodels. Based on NLI task, we propose a simple universal adversarial attack that can misleadmodels to produce the same output for all premises by replacing the original hypothesis with anirrelevant string of words. To defend against this attack, we propose Training with UNiversalAdversarial Samples (TUNAS), which iteratively generates universal adversarial samples andutilizes them for fine-tuning. The method is tested on two datasets, i.e., MNLI and SNLI. It isdemonstrated that, TUNAS can reduce the mean success rate of the universal adversarial attackfrom above 79% to below 5%, while maintaining similar performance on the original datasets. Furthermore, TUNAS models are also more robust to the attack targeting at individual samples:When search for hypotheses that are best entailed by a premise, the hypotheses found by TUNASmodels are more compatible with the premise than those found by baseline models. In sum, weuse universal adversarial attack to yield more robust models. Introduction”