2023
UMUTeam and SINAI at SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis using Multilingual Large Language Models and Data Augmentation
José Antonio García-Díaz | Ronghao Pan | Salud María Jiménez Zafra | María-Teresa Martín-Valdivia | L. Alfonso Ureña-López | Rafael Valencia-García
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This work presents the participation of the UMUTeam and SINAI research groups in SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. The goal of this task is to predict the intimacy of a set of tweets in 10 languages: English, Spanish, Italian, Portuguese, French, Chinese, Hindi, Arabic, Dutch and Korean, of which the last four are not in the training data. Our approach is based on data augmentation and on combining three multilingual Large Language Models (multilingual BERT, XLM and mDeBERTa) through ensemble learning. Our team ranked 30th out of 45 participants. Our best results were achieved on two unseen languages: Korean (16th) and Hindi (19th).
UMUTeam at SemEval-2023 Task 10: Fine-grained detection of sexism in English
Ronghao Pan | José Antonio García-Díaz | Salud María Jiménez Zafra | Rafael Valencia-García
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
In this manuscript, we describe the participation of UMUTeam in the Explainable Detection of Online Sexism (EDOS) shared task proposed at SemEval 2023. This task concerns the precise and explainable detection of sexist content on Gab and Reddit, i.e., developing detailed classifiers that not only identify what is sexist, but also explain why it is sexist. Our participation in the three EDOS subtasks is based on continuing the Masked Language Model pre-training of an existing model, such as RoBERTa-large, on new unlabeled sexism data in order to improve its generalization capacity and its performance on classification tasks. Once the model has been pre-trained with the new data, it is fine-tuned for each specific sexism classification task. Our system achieved excellent results in this competitive task, reaching top 24 (84) in Task A, top 23 (69) in Task B, and top 13 (63) in Task C.
UMUTeam at SemEval-2023 Task 3: Multilingual transformer-based model for detecting the Genre, the Framing, and the Persuasion Techniques in Online News
Ronghao Pan | José Antonio García-Díaz | Miguel Ángel Rodríguez-García | Rafael Valencia-García
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
In this manuscript, we describe the participation of the UMUTeam in SemEval-2023 Task 3, a shared task on detecting different aspects of news articles and other web documents, such as document category, framing dimensions, and persuasion techniques, in a multilingual setup. The task was organized into three related subtasks, and we took part in the first two. Our approach for subtasks 1 and 2 is based on a multilingual transformer-based model fine-tuned on the datasets of all languages at once, together with a sentence-transformer model that extracts the most relevant chunk of each text. The input was split into chunks of 200 tokens with an overlap of 50 tokens, and the sentence-transformer model was used to select the chunk most related to the article's title. Our system achieved good results in subtask 1 in most languages, and in some cases, such as French and German, we achieved first place on the official leaderboard. In subtask 2, our system also performed very well, ranking in the top 10 in all languages.
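The chunk selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`chunk_tokens`, `most_relevant_chunk`) are hypothetical, tokens are approximated by whitespace splitting, and a bag-of-words cosine similarity stands in for the sentence-transformer embeddings used in the paper. Only the window size (200) and overlap (50) come from the abstract.

```python
import math
from collections import Counter


def chunk_tokens(tokens, size=200, overlap=50):
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (assumed stand-in
    for the similarity between sentence-transformer embeddings)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_relevant_chunk(title, text, size=200, overlap=50):
    """Return the chunk of `text` most similar to `title` (empty list if no text)."""
    title_vec = Counter(title.lower().split())
    chunks = chunk_tokens(text.lower().split(), size, overlap)
    if not chunks:
        return []
    return max(chunks, key=lambda c: cosine(title_vec, Counter(c)))
```

In a real system the whitespace tokens would be replaced by the model tokenizer's output and `cosine` by the similarity between embedded title and chunk; the windowing logic stays the same.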
Chick Adams at SemEval-2023 Task 5: Using RoBERTa and DeBERTa to Extract Post and Document-based Features for Clickbait Spoiling
Ronghao Pan | José Antonio García-Díaz | Francisco García-Sánchez | Rafael Valencia-García
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
In this manuscript, we describe the participation of the UMUTeam in SemEval-2023 Task 5, namely Clickbait Spoiling, a shared task on identifying the spoiler type (i.e., a phrase or a passage) and generating short texts that satisfy the curiosity induced by a clickbait post, i.e., generating spoilers for the clickbait post. Our participation in Task 1 is based on fine-tuning, that is, taking a pre-trained model and tuning it to fit the spoiler classification task. Our system obtained excellent results in Task 1: we outperformed all proposed baselines and placed within the Top 10 for most measures. Most notably, we reached the Top 3 in F1 score in the passage spoiler ranking.
UMUTeam at SemEval-2023 Task 11: Ensemble Learning applied to Binary Supervised Classifiers with disagreements
José Antonio García-Díaz | Ronghao Pan | Gema Alcaráz-Mármol | María José Marín-Pérez | Rafael Valencia-García
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes the participation of the UMUTeam in the Learning With Disagreements (Le-Wi-Di) shared task proposed at SemEval 2023, whose objective is the development of supervised automatic classifiers that consider, during training, the agreements and disagreements among the annotators of the datasets. Specifically, this edition includes a multilingual dataset. Our proposal is grounded in the development of ensemble learning classifiers that combine the outputs of several Large Language Models. Our proposal ranked 18th out of 30 participants. However, it did not incorporate the information about the disagreements. Instead, we compare the performance of building a separate classifier for each dataset with that of training on a merged dataset.