Maximilian Mozes


Identifying Human Strategies for Generating Word-Level Adversarial Examples
Maximilian Mozes | Bennett Kleinberg | Lewis Griffin
Findings of the Association for Computational Linguistics: EMNLP 2022

Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples against fine-tuned Transformer models that preserve naturalness and grammaticality. Previous work found that human- and machine-generated adversarial examples are comparable in their naturalness and grammatical correctness. Most notably, humans were able to generate adversarial examples much more effortlessly than automated attacks. In this paper, we provide a detailed analysis of exactly how humans create these adversarial examples. By exploring the behavioural patterns of human workers during the generation process, we identify statistically significant tendencies based on which words humans prefer to select for adversarial replacement (e.g., word frequencies, word saliencies, sentiment) as well as where and when words are replaced in an input sequence. With our findings, we seek to inspire efforts that harness human strategies for more robust NLP models.

pdf bib
Proceedings of the 7th Workshop on Representation Learning for NLP
Spandana Gella | He He | Bodhisattwa Prasad Majumder | Burcu Can | Eleonora Giunchiglia | Samuel Cahyawijaya | Sewon Min | Maximilian Mozes | Xiang Lorraine Li | Isabelle Augenstein | Anna Rogers | Kyunghyun Cho | Edward Grefenstette | Laura Rimell | Chris Dyer
Proceedings of the 7th Workshop on Representation Learning for NLP


Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples
Maximilian Mozes | Pontus Stenetorp | Bennett Kleinberg | Lewis Griffin
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Recent efforts have shown that neural text processing models are vulnerable to adversarial examples, but the nature of these examples is poorly understood. In this work, we show that adversarial attacks against CNN, LSTM and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between replaced words and their corresponding substitutions. Based on these findings, we propose frequency-guided word substitutions (FGWS), a simple algorithm exploiting the frequency properties of adversarial word substitutions for the detection of adversarial examples. FGWS achieves strong performance by accurately detecting adversarial examples on the SST-2 and IMDb sentiment datasets, with F1 detection scores of up to 91.4% against RoBERTa-based classification models. We compare our approach against a recently proposed perturbation discrimination framework and show that we outperform it by up to 13.0% F1.

Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
Maximilian Mozes | Max Bartolo | Pontus Stenetorp | Bennett Kleinberg | Lewis Griffin
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Research shows that natural language processing models are generally considered to be vulnerable to adversarial attacks; but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this through the lens of human language ability. We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example. Our findings suggest that humans are capable of generating a substantial amount of adversarial examples using semantics-preserving word substitutions. We analyze how human-generated adversarial examples compare to the recently proposed TextFooler, Genetic, BAE and SememePSO attack algorithms on the dimensions naturalness, preservation of sentiment, grammaticality and substitution rate. Our findings suggest that human-generated adversarial examples are not more able than the best algorithms to generate natural-reading, sentiment-preserving examples, though they do so by being much more computationally efficient.


Measuring Emotions in the COVID-19 Real World Worry Dataset
Bennett Kleinberg | Isabelle van der Vegt | Maximilian Mozes
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

The COVID-19 pandemic is having a dramatic impact on societies and economies around the world. With various measures of lockdowns and social distancing in place, it becomes important to understand emotional responses on a large scale. In this paper, we present the first ground truth dataset of emotional responses to COVID-19. We asked participants to indicate their emotions and express these in text. This resulted in the Real World Worry Dataset of 5,000 texts (2,500 short + 2,500 long texts). Our analyses suggest that emotional responses correlated with linguistic measures. Topic modeling further revealed that people in the UK worry about their family and the economic situation. Tweet-sized texts functioned as a call for solidarity, while longer texts shed light on worries and concerns. Using predictive modeling approaches, we were able to approximate the emotional responses of participants from text within 14% of their actual value. We encourage others to use the dataset and improve how we can use automated methods to learn about emotional responses and worries about an urgent problem.


Uphill from here: Sentiment patterns in videos from left- and right-wing YouTube news channels
Felix Soldner | Justin Chun-ting Ho | Mykola Makhortykh | Isabelle W.J. van der Vegt | Maximilian Mozes | Bennett Kleinberg
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

News consumption exhibits an increasing shift towards online sources, which bring platforms such as YouTube more into focus. Thus, the distribution of politically loaded news is easier, receives more attention, but also raises the concern of forming isolated ideological communities. Understanding how such news is communicated and received is becoming increasingly important. To expand our understanding in this domain, we apply a linguistic temporal trajectory analysis to analyze sentiment patterns in English-language videos from news channels on YouTube. We examine transcripts from videos distributed through eight channels with pro-left and pro-right political leanings. Using unsupervised clustering, we identify seven different sentiment patterns in the transcripts. We found that the use of two sentiment patterns differed significantly depending on political leaning. Furthermore, we used predictive models to examine how different sentiment patterns relate to video popularity and if they differ depending on the channel’s political leaning. No clear relations between sentiment patterns and popularity were found. However, results indicate, that videos from pro-right news channels are more popular and that a negative sentiment further increases that popularity, when sentiments are averaged for each video.


Identifying the sentiment styles of YouTube’s vloggers
Bennett Kleinberg | Maximilian Mozes | Isabelle van der Vegt
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Vlogs provide a rich public source of data in a novel setting. This paper examined the continuous sentiment styles employed in 27,333 vlogs using a dynamic intra-textual approach to sentiment analysis. Using unsupervised clustering, we identified seven distinct continuous sentiment trajectories characterized by fluctuations of sentiment throughout a vlog’s narrative time. We provide a taxonomy of these seven continuous sentiment styles and found that vlogs whose sentiment builds up towards a positive ending are the most prevalent in our sample. Gender was associated with preferences for different continuous sentiment trajectories. This paper discusses the findings with respect to previous work and concludes with an outlook towards possible uses of the corpus, method and findings of this paper for related areas of research.