Alejandro Mosquera


2022

Tackling Data Drift with Adversarial Validation: An Application for German Text Complexity Estimation
Alejandro Mosquera
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text

This paper describes the winning approach in the first automated German text complexity assessment shared task as part of KONVENS 2022. To solve this difficult problem, the evaluated system relies on an ensemble of regression models that successfully combines both traditional feature engineering and pre-trained resources. Moreover, the use of adversarial validation is proposed as a method for countering the data drift identified during the development phase, thus helping to select relevant models and features and avoid leaderboard overfitting. The best submission reached 0.43 mapped RMSE on the test set during the final phase of the competition.
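As a rough illustration of the adversarial validation idea mentioned in the abstract above, the sketch below labels training rows 0 and test rows 1, fits a small classifier to tell them apart, and reads a held-out AUC well above 0.5 as a sign of drift. It is not the paper's implementation: the synthetic two-feature data, the hand-rolled logistic regression, and the names `adversarial_auc`, `fit_logistic` and `sample` are all assumptions made for this example.

```python
import math
import random


def auc(scores, labels):
    """ROC AUC via pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, epochs=200, lr=0.1):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    n_feat = len(rows[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            gb += err
            for j, xj in enumerate(x):
                gw[j] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, gw)]
        b -= lr * gb / len(rows)
    return w, b


def adversarial_auc(train_rows, test_rows, seed=0):
    """Label train=0 / test=1, fit on one half, report AUC on the other half."""
    data = [(x, 0) for x in train_rows] + [(x, 1) for x in test_rows]
    random.Random(seed).shuffle(data)
    half = len(data) // 2
    fit, held = data[:half], data[half:]
    w, b = fit_logistic([x for x, _ in fit], [y for _, y in fit])
    scores = [b + sum(wi * xi for wi, xi in zip(w, x)) for x, _ in held]
    return auc(scores, [y for _, y in held]), w


# Synthetic data (an assumption for the demo): feature 0 is stable,
# feature 1 is shifted by `shift` in the drifting test set.
rng = random.Random(42)


def sample(n, shift):
    return [[rng.gauss(0, 1), rng.gauss(shift, 1)] for _ in range(n)]


train = sample(300, 0.0)
auc_same, _ = adversarial_auc(train, sample(300, 0.0))
auc_drift, w_drift = adversarial_auc(train, sample(300, 1.5))
print(f"no drift AUC={auc_same:.2f}  drift AUC={auc_drift:.2f}  weights={w_drift}")
```

In this toy setup the larger weight falls on the shifted feature, which is the kind of signal that could be used to drop or down-weight drifting features before model selection.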

Amsqr at SemEval-2022 Task 4: Towards AutoNLP via Meta-Learning and Adversarial Data Augmentation for PCL Detection
Alejandro Mosquera
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the use of AutoNLP techniques applied to the detection of patronizing and condescending language (PCL) in a binary classification scenario. The proposed approach combines meta-learning, used to identify the best-performing combination of deep learning architectures, with the synthesis of adversarial training examples, thus boosting robustness and model generalization. A submission from this system was evaluated as part of the first sub-task of SemEval-2022 Task 4 and achieved an F1 score of 0.57, which is 16 percentage points higher than the RoBERTa baseline provided by the organizers.

2021

Alejandro Mosquera at SemEval-2021 Task 1: Exploring Sentence and Word Features for Lexical Complexity Prediction
Alejandro Mosquera
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper revisits feature engineering approaches for predicting the complexity level of English words in a particular context using regression techniques. Our best submission to the Lexical Complexity Prediction (LCP) shared task was ranked 3rd out of 48 systems for sub-task 1 and achieved Pearson correlation coefficients of 0.779 and 0.809 for single words and multi-word expressions, respectively. We conclude that a combination of lexical, contextual and semantic features can still produce strong baselines when compared against human judgement.

2020

Amsqr at SemEval-2020 Task 12: Offensive Language Detection Using Neural Networks and Anti-adversarial Features
Alejandro Mosquera
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes a method and system to solve the problem of detecting offensive language in social media using anti-adversarial features. Our submission to the SemEval-2020 Task 12 challenge was generated by a stacked ensemble of neural networks fine-tuned on the OLID dataset and additional external sources. For Task A (English), text normalisation filters were applied at both the graphical and lexical levels. The normalisation step effectively mitigates not only the natural presence of lexical variants but also intentional attempts to bypass moderation by introducing out-of-vocabulary words. Our approach provides strong F1 scores for both the 2020 (0.9134) and 2019 (0.8258) challenges.
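To make the two normalisation levels in the abstract above more concrete, here is a minimal sketch. It assumes that "graphical" means character-level cleanup (accent folding, digit and symbol substitutions, collapsing stretched letters) and that "lexical" means a lookup from known variants to canonical words. The substitution table `LEET`, the tiny lexicon `VARIANTS` and all function names are hypothetical and are not taken from the paper.

```python
import re
import unicodedata

# Hypothetical character substitutions sometimes used to disguise words.
LEET = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)

# Hypothetical variant-to-canonical lexicon; a real one would be far larger.
VARIANTS = {"u": "you", "r": "are", "stoopid": "stupid"}


def normalise_graphical(token):
    """Character-level cleanup of a single token."""
    # Fold compatibility forms and strip combining accents.
    folded = unicodedata.normalize("NFKD", token)
    folded = "".join(c for c in folded if not unicodedata.combining(c)).lower()
    # Undo digit/symbol substitutions only when the token also contains
    # letters, so plain numbers such as "2020" are left untouched.
    if any(c.isalpha() for c in folded):
        folded = folded.translate(LEET)
    # Collapse runs of three or more identical characters down to two.
    return re.sub(r"(.)\1{2,}", r"\1\1", folded)


def normalise_lexical(token):
    """Map a known lexical variant to its canonical form, if listed."""
    return VARIANTS.get(token, token)


def normalise(text):
    """Apply the graphical filter, then the lexical filter, to each token."""
    tokens = re.findall(r"\S+", text)
    return " ".join(normalise_lexical(normalise_graphical(t)) for t in tokens)


print(normalise("U r an 1d10t"))
print(normalise("stoooopid"))
```

Running the graphical step first matters in this sketch: the lexical lookup only succeeds once disguised spellings have been reduced to forms the lexicon can recognise.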

2014

Mining Lexical Variants from Microblogs: An Unsupervised Multilingual Approach
Alejandro Mosquera | Paloma Moreda Pozo
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)

2013

Improving Web 2.0 Opinion Mining Systems Using Text Normalisation Techniques
Alejandro Mosquera | Paloma Moreda Pozo
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

UMCC_DLSI-(SA): Using a ranking algorithm and informal features to solve Sentiment Analysis in Twitter
Yoan Gutiérrez | Andy González | Roger Pérez | José I. Abreu | Antonio Fernández Orquín | Alejandro Mosquera | Andrés Montoyo | Rafael Muñoz | Franc Camara
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2011

The Use of Metrics for Measuring Informality Levels in Web 2.0 Texts
Alejandro Mosquera | Paloma Moreda
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology