Arturo Montejo-Ráez

Also published as: Arturo Montejo Ráez

2024

pdf abs
Enhancing Lexical Complexity Prediction through Few-shot Learning with Gpt-3
Jenny Alexandra Ortiz-Zambrano | César Humberto Espín-Riofrío | Arturo Montejo-Ráez
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024

This paper describes an experiment to evaluate the ability of the GPT-3 language model to classify terms regarding their lexical complexity. This was achieved through the creation and evaluation of different versions of the model: text-Davinci-002 y text-Davinci-003 and prompts for few-shot learning to determine the complexity of the words. The results obtained on the CompLex dataset achieve a minimum average error of 0.0856. Although this is not better than the state of the art (which is 0.0609), it is a performing and promising approach to lexical complexity prediction without the need for model fine-tuning.

pdf abs
CONAN-MT-SP: A Spanish Corpus for Counternarrative Using GPT Models
María Estrella Vallecillo Rodríguez | Maria Victoria Cantero Romero | Isabel Cabrera De Castro | Arturo Montejo Ráez | María Teresa Martín Valdivia
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper describes the automated generation of CounterNarratives (CNs) for Hate Speech (HS) in Spanish using GPT-based models. Our primary objective is to evaluate the performance of these models in comparison to human capabilities. For this purpose, the English CONAN Multitarget corpus is taken as a starting point and we use the DeepL API to automatically translate into Spanish. Two GPT-based models, GPT-3 and GPT-4, are applied to the HS segment through a few-shot prompting strategy to generate a new CN. As a consequence of our research, we have created a high quality corpus in Spanish that includes the original HS-CN pairs translated into Spanish, in addition to the CNs generated automatically with the GPT models and that have been evaluated manually. The resulting CONAN-MT-SP corpus and its evaluation will be made available to the research community, representing the most extensive linguistic resource of CNs in Spanish to date. The results demonstrate that, although the effectiveness of GPT-4 outperforms GPT-3, both models can be used as systems to automatically generate CNs to combat the HS. Moreover, these models consistently outperform human performance in most instances.

pdf abs
MentalRiskES: A New Corpus for Early Detection of Mental Disorders in Spanish
Alba M. Mármol Romero | Adrián Moreno Muñoz | Flor Miriam Plaza-del-Arco | M. Dolores Molina González | María Teresa Martín Valdivia | L. Alfonso Ureña-López | Arturo Montejo Ráez
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

With mental health issues on the rise on the Web, especially among young people, there is a growing need for effective identification and intervention. In this paper, we introduce a new open-sourced corpus for the early detection of mental disorders in Spanish, focusing on eating disorders, depression, and anxiety. It consists of user messages posted on groups within the Telegram message platform and contains over 1,300 subjects with more than 45,000 messages posted in different public Telegram groups. This corpus has been manually annotated via crowdsourcing and is prepared for its use in several Natural Language Processing tasks including text classification and regression tasks. The samples in the corpus include both text and time data. To provide a benchmark for future research, we conduct experiments on text classification and regression by using state-of-the-art transformer-based models.

pdf abs
Environmental Impact Measurement in the MentalRiskES Evaluation Campaign
Alba M. Mármol Romero | Adrián Moreno-Muñoz | Flor Miriam Plaza-del-Arco | M. Dolores Molina González | Arturo Montejo-Ráez
Proceedings of the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability @ LREC-COLING 2024

With the rise of Large Language Models (LLMs), the NLP community is increasingly aware of the environmental consequences of model development due to the energy consumed for training and running these models. This study investigates the energy consumption and environmental impact of systems participating in the MentalRiskES shared task, at the Iberian Language Evaluation Forum (IberLEF) in the year 2023, which focuses on early risk identification of mental disorders in Spanish comments. Participants were asked to submit, for each prediction, a set of efficiency metrics, being carbon dioxide emissions among them. We conduct an empirical analysis of the data submitted considering model architecture, task complexity, and dataset characteristics, covering a spectrum from traditional Machine Learning (ML) models to advanced LLMs. Our findings contribute to understanding the ecological footprint of NLP systems and advocate for prioritizing environmental impact assessment in shared tasks to foster sustainability across diverse model types and approaches, being evaluation campaigns an adequate framework for this kind of analysis.

2021

pdf abs
Complex words identification using word-level features for SemEval-2020 Task 1
Jenny A. Ortiz-Zambrano | Arturo Montejo-Ráez
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This article describes a system to predict the complexity of words for the Lexical Complexity Prediction (LCP) shared task hosted at SemEval 2021 (Task 1) with a new annotated English dataset with a Likert scale. Located in the Lexical Semantics track, the task consisted of predicting the complexity value of the words in context. A machine learning approach was carried out based on the frequency of the words and several characteristics added at word level. Over these features, a supervised random forest regression algorithm was trained. Several runs were performed with different values to observe the performance of the algorithm. For the evaluation, our best results reported a M.A.E score of 0.07347, M.S.E. of 0.00938, and R.M.S.E. of 0.096871. Our experiments showed that, with a greater number of characteristics, the precision of the classification increases.

pdf abs
CLexIS2: A New Corpus for Complex Word Identification Research in Computing Studies
Jenny A. Ortiz Zambrano | Arturo Montejo-Ráez
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Reading is a complex process not only because of the words or sections that are difficult for the reader to understand. Complex word identification (CWI) is the task of detecting in the content of documents the words that are difficult or complex to understand by the people of a certain group. Annotated corpora for English learners are widely available, while they are less common for the Spanish language. In this article, we present CLexIS², a new corpus in Spanish to contribute to the advancement of research in the area of Lexical Simplification, specifically in the identification and prediction of complex words in computing studies. Several metrics used to evaluate the complexity of texts in Spanish were applied, such as LC, LDI, ILFW, SSR, SCI, ASL, CS. Furthermore, as a baseline of the primer, two experiments have been performed to predict the complexity of words: one using a supervised learning approach and the other using an unsupervised solution based on the frequency of words on a general corpus.

pdf abs
OffendES: A New Corpus in Spanish for Offensive Language Research
Flor Miriam Plaza-del-Arco | Arturo Montejo-Ráez | L. Alfonso Ureña-López | María-Teresa Martín-Valdivia
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Offensive language detection and analysis has become a major area of research in Natural Language Processing. The freedom of participation in social media has exposed online users to posts designed to denigrate, insult or hurt them according to gender, race, religion, ideology, or other personal characteristics. Focusing on young influencers from the well-known social platforms of Twitter, Instagram, and YouTube, we have collected a corpus composed of 47,128 Spanish comments manually labeled on offensive pre-defined categories. A subset of the corpus attaches a degree of confidence to each label, so both multi-class classification and multi-output regression studies are possible. In this paper, we introduce the corpus, discuss its building process, novelties, and some preliminary experiments with it to serve as a baseline for the research community.

2019

pdf abs
SINAI-DL at SemEval-2019 Task 5: Recurrent networks and data augmentation by paraphrasing
Arturo Montejo-Ráez | Salud María Jiménez-Zafra | Miguel A. García-Cumbreras | Manuel Carlos Díaz-Galiano
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the participation of the SINAI-DL team at Task 5 in SemEval 2019, called HatEval. We have applied some classic neural network layers, like word embeddings and LSTM, to build a neural classifier for both proposed tasks. Due to the small amount of training data provided compared to what is expected for an adequate learning stage in deep architectures, we explore the use of paraphrasing tools as source for data augmentation. Our results show that this method is promising, as some improvement has been found over non-augmented training sets.

pdf abs
SINAI-DL at SemEval-2019 Task 7: Data Augmentation and Temporal Expressions
Miguel A. García-Cumbreras | Salud María Jiménez-Zafra | Arturo Montejo-Ráez | Manuel Carlos Díaz-Galiano | Estela Saquete
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the participation of the SINAI-DL team at RumourEval (Task 7 in SemEval 2019, subtask A: SDQC). SDQC addresses the challenge of rumour stance classification as an indirect way of identifying potential rumours. Given a tweet with several replies, our system classifies each reply into either supporting, denying, questioning or commenting on the underlying rumours. We have applied data augmentation, temporal expressions labelling and transfer learning with a four-layer neural classifier. We achieve an accuracy of 0.715 with the official run over reply tweets.

2017

pdf abs
SINAI at SemEval-2017 Task 4: User based classification
Salud María Jiménez-Zafra | Arturo Montejo-Ráez | Maite Martin | L. Alfonso Ureña-López
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This document describes our participation in SemEval-2017 Task 4: Sentiment Analysis in Twitter. We have only reported results for subtask B - English, determining the polarity towards a topic on a two point scale (positive or negative sentiment). Our main contribution is the integration of user information in the classification process. A SVM model is trained with Word2Vec vectors from user’s tweets extracted from his timeline. The obtained results show that user-specific classifiers trained on tweets from user timeline can introduce noise as they are error prone because they are classified by an imperfect system. This encourages us to explore further integration of user information for author-based Sentiment Analysis.