Carla Pérez-Almendros

Also published as: Carla Perez Almendros, Carla Perez-Almendros

2022

Pre-Training Language Models for Identifying Patronizing and Condescending Language: An Analysis
Carla Perez Almendros | Luis Espinosa Anke | Steven Schockaert
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Patronizing and Condescending Language (PCL) is a subtle but harmful type of discourse, yet the task of recognizing PCL remains under-studied by the NLP community. Recognizing PCL is challenging because of its subtle nature, because available datasets are limited in size, and because this task often relies on some form of commonsense knowledge. In this paper, we study to what extent PCL detection models can be improved by pre-training them on other, more established NLP tasks. We find that performance gains are indeed possible in this way, in particular when pre-training on tasks focusing on sentiment, harmful language and commonsense morality. In contrast, for tasks focusing on political speech and social justice, no or only very small improvements were witnessed. These findings improve our understanding of the nature of PCL.

Identifying Condescending Language: A Tale of Two Distinct Phenomena?
Carla Perez Almendros | Steven Schockaert
Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI)

Patronizing and condescending language is characterized, among other things, by its subtle nature. It thus seems reasonable to assume that detecting condescending language in text would be harder than detecting more explicitly harmful language, such as hate speech. However, the results of a SemEval-2022 Task devoted to this topic paint a different picture, with the top-performing systems achieving remarkably strong results. In this paper, we analyse the surprising effectiveness of standard text classification methods in more detail. In particular, we highlight the presence of two rather different types of condescending language in the dataset from the SemEval task. Some inputs are condescending because of the way they talk about a particular subject, i.e. condescending language in this case is a linguistic phenomenon, which can, in principle, be learned from training examples. However, other inputs are condescending because of the nature of what is said, rather than the way in which it is expressed, e.g. by emphasizing stereotypes about a given community. In such cases, our ability to detect condescending language, with current methods, largely depends on the presence of similar examples in the training data.

SemEval-2022 Task 4: Patronizing and Condescending Language Detection
Carla Perez-Almendros | Luis Espinosa-Anke | Steven Schockaert
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents an overview of Task 4 at SemEval-2022, which was focused on detecting Patronizing and Condescending Language (PCL) towards vulnerable communities. Two sub-tasks were considered: a binary classification task, where participants needed to classify a given paragraph as containing PCL or not, and a multi-label classification task, where participants needed to identify which types of PCL are present (if any). The task attracted more than 300 participants, 77 teams and 229 valid submissions. We provide an overview of how the task was organized, discuss the techniques that were employed by the different participants, and summarize the main resulting insights about PCL detection and categorization.

2020

Don’t Patronize Me! An Annotated Dataset with Patronizing and Condescending Language towards Vulnerable Communities
Carla Perez Almendros | Luis Espinosa Anke | Steven Schockaert
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we introduce a new annotated dataset which is aimed at supporting the development of NLP models to identify and categorize language that is patronizing or condescending towards vulnerable communities (e.g. refugees, homeless people, poor families). While the prevalence of such language in the general media has long been shown to have harmful effects, it differs from other types of harmful language, in that it is generally used unconsciously and with good intentions. We furthermore believe that the often subtle nature of patronizing and condescending language (PCL) presents an interesting technical challenge for the NLP community. Our analysis of the proposed dataset shows that identifying PCL is hard for standard NLP models, with language models such as BERT achieving the best results.

2019

Cardiff University at SemEval-2019 Task 4: Linguistic Features for Hyperpartisan News Detection
Carla Pérez-Almendros | Luis Espinosa-Anke | Steven Schockaert
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper summarizes our contribution to the Hyperpartisan News Detection task in SemEval 2019. We experiment with two different approaches: 1) an SVM classifier based on word vector averages and hand-crafted linguistic features, and 2) a BiLSTM-based neural text classifier trained on a filtered training set. Surprisingly, despite their different nature, both approaches achieve an accuracy of 0.74. The main focus of this paper is to further analyze the remarkable fact that a simple feature-based approach can perform on par with modern neural classifiers. We also highlight the effectiveness of our filtering strategy for training the neural network on a large but noisy training set.