Miriam Calderon-Reyes
2026
NLP-CIMAT at SemEval-2026 Task 9: LLM-Based One-Shot and Cross-Lingual Data Augmentation for Polarization Detection
Miriam Calderon-Reyes | Fernando Sanchez-Vega | Adrian Pastor Lopez Monroy
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our participation in SemEval 2026 Task 9: Multilingual Text Polarization. The task requires estimating polarization levels across languages, where linguistic variability and limited annotated data pose significant challenges. To address data scarcity, we propose a pipeline that combines cross-lingual translation, synthetic data augmentation via LLMs, and domain-specific pre-trained models. Our approach rests on the hypothesis that polarization signals transfer across languages without substantial loss of semantic alignment, which makes translation an effective form of data augmentation. Notably, one-shot synthetic example generation emerges as a viable strategy for enriching training data in topic-specific scenarios. Experimental results demonstrate high stability and competitive performance: on the test set, our system achieves a macro F1-score of 0.7869 for Spanish and 0.7939 for English. These scores place it 7th on the official Spanish leaderboard, competitive with top-performing systems, and 21st on the official English leaderboard.
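For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1-score is computed: the F1 of each class is calculated separately and the results are averaged with equal weight, so a minority class counts as much as a majority class. The function name `macro_f1`, the binary label encoding (1 = polarized, 0 = not polarized), and the toy labels are assumptions made for illustration; they are not taken from the paper or from the official task scorer.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    # Consider every label that appears in either the gold or predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with assumed labels: 1 = polarized, 0 = not polarized.
gold = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(macro_f1(gold, pred), 4))
```

In this toy example the per-class F1 values are 0.8 for class 0 and about 0.667 for class 1, and the macro score is their plain average.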