Flor Miriam Plaza-del-Arco

Also published as: Flor Miriam Plaza del Arco


2022

pdf
SHARE: A Lexicon of Harmful Expressions by Spanish Speakers
Flor Miriam Plaza-del-Arco | Ana Belén Parras Portillo | Pilar López Úbeda | Beatriz Gil | María-Teresa Martín-Valdivia
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper we present SHARE, a new lexical resource with 10,125 offensive terms and expressions collected from Spanish speakers. We retrieve this vocabulary using an existing chatbot developed to engage a conversation with users and collect insults via Telegram, named Fiero. This vocabulary has been manually labeled by five annotators obtaining a kappa coefficient agreement of 78.8%. In addition, we leverage the lexicon to release the first corpus in Spanish for offensive span identification research named OffendES_spans. Finally, we show the utility of our resource as an interpretability tool to explain why a comment may be considered offensive.

pdf
Natural Language Inference Prompts for Zero-shot Emotion Classification in Text across Corpora
Flor Miriam Plaza-del-Arco | María-Teresa Martín-Valdivia | Roman Klinger
Proceedings of the 29th International Conference on Computational Linguistics

Within textual emotion classification, the set of relevant labels depends on the domain and application scenario and might not be known at the time of model development. This conflicts with the classical paradigm of supervised learning in which the labels need to be predefined. A solution to obtain a model with a flexible set of labels is to use the paradigm of zero-shot learning as a natural language inference task, which in addition adds the advantage of not needing any labeled training data. This raises the question how to prompt a natural language inference model for zero-shot learning emotion classification. Options for prompt formulations include the emotion name anger alone or the statement “This text expresses anger”. With this paper, we analyze how sensitive a natural language inference-based zero-shot-learning classifier is to such changes to the prompt under consideration of the corpus: How carefully does the prompt need to be selected? We perform experiments on an established set of emotion datasets presenting different language registers according to different sources (tweets, events, blogs) with three natural language inference models and show that indeed the choice of a particular prompt formulation needs to fit to the corpus. We show that this challenge can be tackled with combinations of multiple prompts. Such ensemble is more robust across corpora than individual prompts and shows nearly the same performance as the individual best prompt for a particular corpus.

2021

pdf
OffendES: A New Corpus in Spanish for Offensive Language Research
Flor Miriam Plaza-del-Arco | Arturo Montejo-Ráez | L. Alfonso Ureña-López | María-Teresa Martín-Valdivia
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Offensive language detection and analysis has become a major area of research in Natural Language Processing. The freedom of participation in social media has exposed online users to posts designed to denigrate, insult or hurt them according to gender, race, religion, ideology, or other personal characteristics. Focusing on young influencers from the well-known social platforms of Twitter, Instagram, and YouTube, we have collected a corpus composed of 47,128 Spanish comments manually labeled on offensive pre-defined categories. A subset of the corpus attaches a degree of confidence to each label, so both multi-class classification and multi-output regression studies are possible. In this paper, we introduce the corpus, discuss its building process, novelties, and some preliminary experiments with it to serve as a baseline for the research community.

pdf
SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection
Flor Miriam Plaza-del-Arco | Pilar López-Úbeda | L. Alfonso Ureña-López | M. Teresa Martín-Valdivia
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes the participation of SINAI team at Task 5: Toxic Spans Detection which consists of identifying spans that make a text toxic. Although several resources and systems have been developed so far in the context of offensive language, both annotation and tasks have mainly focused on classifying whether a text is offensive or not. However, detecting toxic spans is crucial to identify why a text is toxic and can assist human moderators to locate this type of content on social media. In order to accomplish the task, we follow a deep learning-based approach using a Bidirectional variant of a Long Short Term Memory network along with a stacked Conditional Random Field decoding layer (BiLSTM-CRF). Specifically, we test the performance of the combination of different pre-trained word embeddings for recognizing toxic entities in text. The results show that the combination of word embeddings helps in detecting offensive content. Our team ranks 29th out of 91 participants.

2020

pdf
SINAI at SemEval-2020 Task 12: Offensive Language Identification Exploring Transfer Learning Models
Flor Miriam Plaza del Arco | M. Dolores Molina González | Alfonso Ureña-López | Maite Martin
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the participation of SINAI team at Task 12: OffensEval 2: Multilingual Offensive Language Identification in Social Media. In particular, the participation in Sub-task A in English which consists of identifying tweets as offensive or not offensive. We preprocess the dataset according to the language characteristics used on social media. Then, we select a small set from the training set provided by the organizers and fine-tune different Transformerbased models in order to test their effectiveness. Our team ranks 20th out of 85 participants in Subtask-A using the XLNet model.

pdf
EmoEvent: A Multilingual Emotion Corpus based on different Events
Flor Miriam Plaza del Arco | Carlo Strapparava | L. Alfonso Urena Lopez | Maite Martin
Proceedings of the Twelfth Language Resources and Evaluation Conference

In recent years emotion detection in text has become more popular due to its potential applications in fields such as psychology, marketing, political science, and artificial intelligence, among others. While opinion mining is a well-established task with many standard data sets and well-defined methodologies, emotion mining has received less attention due to its complexity. In particular, the annotated gold standard resources available are not enough. In order to address this shortage, we present a multilingual emotion data set based on different events that took place in April 2019. We collected tweets from the Twitter platform. Then one of seven emotions, six Ekman’s basic emotions plus the “neutral or other emotions”, was labeled on each tweet by 3 Amazon MTurkers. A total of 8,409 in Spanish and 7,303 in English were labeled. In addition, each tweet was also labeled as offensive or no offensive. We report some linguistic statistics about the data set in order to observe the difference between English and Spanish speakers when they express emotions related to the same events. Moreover, in order to validate the effectiveness of the data set, we also propose a machine learning approach for automatically detecting emotions in tweets for both languages, English and Spanish.

2019

pdf
Detecting Anorexia in Spanish Tweets
Pilar López Úbeda | Flor Miriam Plaza del Arco | Manuel Carlos Díaz Galiano | L. Alfonso Urena Lopez | Maite Martin
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Mental health is one of the main concerns of today’s society. Early detection of symptoms can greatly help people with mental disorders. People are using social networks more and more to express emotions, sentiments and mental states. Thus, the treatment of this information using NLP technologies can be applied to the automatic detection of mental problems such as eating disorders. However, the first step to solving the problem should be to provide a corpus in order to evaluate our systems. In this paper, we specifically focus on detecting anorexia messages on Twitter. Firstly, we have generated a new corpus of tweets extracted from different accounts including anorexia and non-anorexia messages in Spanish. The corpus is called SAD: Spanish Anorexia Detection corpus. In order to validate the effectiveness of the SAD corpus, we also propose several machine learning approaches for automatically detecting anorexia symptoms in the corpus. The good results obtained show that the application of textual classification methods is a promising option for developing this kind of system demonstrating that these tools could be used by professionals to help in the early detection of mental problems.

pdf
SINAI at SemEval-2019 Task 3: Using affective features for emotion classification in textual conversations
Flor Miriam Plaza-del-Arco | M. Dolores Molina-González | Maite Martin | L. Alfonso Ureña-López
Proceedings of the 13th International Workshop on Semantic Evaluation

Detecting emotions in textual conversation is a challenging problem in absence of nonverbal cues typically associated with emotion, like fa- cial expression or voice modulations. How- ever, more and more users are using message platforms such as WhatsApp or Telegram. For this reason, it is important to develop systems capable of understanding human emotions in textual conversations. In this paper, we carried out different systems to analyze the emotions of textual dialogue from SemEval-2019 Task 3: EmoContext for English language. Our main contribution is the integration of emotional and sentimental features in the classification using the SVM algorithm.

pdf
SINAI at SemEval-2019 Task 5: Ensemble learning to detect hate speech against inmigrants and women in English and Spanish tweets
Flor Miriam Plaza-del-Arco | M. Dolores Molina-González | Maite Martin | L. Alfonso Ureña-López
Proceedings of the 13th International Workshop on Semantic Evaluation

Misogyny and xenophobia are some of the most important social problems. With the in- crease in the use of social media, this feeling ofhatred towards women and immigrants can be more easily expressed, therefore it can cause harmful effects on social media users. For this reason, it is important to develop systems ca- pable of detecting hateful comments automatically. In this paper, we describe our system to analyze the hate speech in English and Spanish tweets against Immigrants and Women as part of our participation in SemEval-2019 Task 5: hatEval. Our main contribution is the integration of three individual algorithms of predic- tion in a model based on Vote ensemble classifier.

pdf
SINAI at SemEval-2019 Task 6: Incorporating lexicon knowledge into SVM learning to identify and categorize offensive language in social media
Flor Miriam Plaza-del-Arco | M. Dolores Molina-González | Maite Martin | L. Alfonso Ureña-López
Proceedings of the 13th International Workshop on Semantic Evaluation

Offensive language has an impact across society. The use of social media has aggravated this issue among online users, causing suicides in the worst cases. For this reason, it is important to develop systems capable of identifying and detecting offensive language in text automatically. In this paper, we developed a system to classify offensive tweets as part of our participation in SemEval-2019 Task 6: OffensEval. Our main contribution is the integration of lexical features in the classification using the SVM algorithm.

2018

pdf
SINAI at SemEval-2018 Task 1: Emotion Recognition in Tweets
Flor Miriam Plaza-del-Arco | Salud María Jiménez-Zafra | Maite Martin | L. Alfonso Ureña-López
Proceedings of the 12th International Workshop on Semantic Evaluation

Emotion classification is a new task that combines several disciplines including Artificial Intelligence and Psychology, although Natural Language Processing is perhaps the most challenging area. In this paper, we describe our participation in SemEval-2018 Task1: Affect in Tweets. In particular, we have participated in EI-oc, EI-reg and E-c subtasks for English and Spanish languages.

pdf
SINAI at IEST 2018: Neural Encoding of Emotional External Knowledge for Emotion Classification
Flor Miriam Plaza-del-Arco | Eugenio Martínez-Cámara | Maite Martin | L. Alfonso Ureña- López
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In this paper, we describe our participation in WASSA 2018 Implicit Emotion Shared Task (IEST 2018). We claim that the use of emotional external knowledge may enhance the performance and the capacity of generalization of an emotion classification system based on neural networks. Accordingly, we submitted four deep learning systems grounded in a sequence encoding layer. They mainly differ in the feature vector space and the recurrent neural network used in the sequence encoding layer. The official results show that the systems that used emotional external knowledge have a higher capacity of generalization, hence our claim holds.