NLP-CIMAT at SemEval-2026 Task 9: LLM-Based One-Shot and Cross-Lingual Data Augmentation for Polarization Detection

Miriam Calderon-Reyes, Fernando Sanchez-Vega, Adrian Pastor Lopez Monroy


Abstract
This paper describes our participation in SemEval-2026 Task 9: Multilingual Text Polarization. The task requires estimating polarization levels across languages, where linguistic variability and limited annotated data pose significant challenges. To address data scarcity, we propose a pipeline that combines cross-lingual translation, LLM-based synthetic data augmentation, and domain-specific pre-trained models. Our approach rests on the hypothesis that polarization signals transfer across languages without substantial loss of semantic alignment, which makes translation a practical means of data augmentation. Notably, one-shot synthetic example generation emerges as a viable strategy for enriching training data in topic-specific scenarios. Experimental results show stable and competitive performance: on the test set, our system achieves a macro F1-score of 0.7869 for Spanish and 0.7939 for English, ranking 21st on the official English leaderboard and 7th on the Spanish leaderboard, where our results are competitive with the top-performing systems.
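The abstract describes the augmentation pipeline only at a high level. As a rough illustration, the following Python sketch shows one way its two augmentation steps (translating labeled examples into the target language, then generating one synthetic example per labeled seed with a one-shot prompt) could fit together. It is not the authors' implementation. Everything in it is a hypothetical assumption: the `Example` record, the binary label encoding, the prompt wording in `ONE_SHOT_TEMPLATE`, and the `translate`/`generate` callables, which stand in for whatever translation system and LLM are actually used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    text: str
    label: int  # hypothetical binary encoding: 1 = polarized, 0 = not polarized
    lang: str   # language code, e.g. "en" or "es"


# Hypothetical prompt wording; the paper's actual prompt is not given in the abstract.
ONE_SHOT_TEMPLATE = (
    "Here is an example text labeled {label_name}:\n"
    "{seed}\n"
    "Write one new text on the same topic with the same label."
)


def one_shot_prompt(seed: Example) -> str:
    """Build a one-shot generation prompt from a single labeled seed example."""
    label_name = "polarized" if seed.label == 1 else "not polarized"
    return ONE_SHOT_TEMPLATE.format(label_name=label_name, seed=seed.text)


def augment(
    train: List[Example],
    target_lang: str,
    translate: Callable[[str, str, str], str],  # hypothetical: (text, src_lang, tgt_lang) -> text
    generate: Callable[[str], str],             # hypothetical: prompt -> generated text
    n_synthetic_per_seed: int = 1,
) -> List[Example]:
    """Return the original data plus translated and one-shot synthetic examples.

    Labels are copied unchanged onto every new text, which encodes the
    assumption that polarization survives translation and generation.
    """
    augmented = list(train)

    # Cross-lingual step: translate every non-target-language example.
    for ex in train:
        if ex.lang != target_lang:
            translated = translate(ex.text, ex.lang, target_lang)
            augmented.append(Example(translated, ex.label, target_lang))

    # One-shot step: each target-language example (original or translated)
    # seeds n_synthetic_per_seed generated examples with the same label.
    seeds = [ex for ex in augmented if ex.lang == target_lang]
    for seed in seeds:
        for _ in range(n_synthetic_per_seed):
            augmented.append(Example(generate(one_shot_prompt(seed)), seed.label, target_lang))

    return augmented
```

Passing translation and generation in as plain functions keeps the sketch independent of any particular translation service or LLM API; for a quick check they can be replaced with stubs such as `lambda text, src, tgt: text` and `lambda prompt: "..."`. Copying each seed's label onto the translated and generated texts is exactly where the cross-lingual transfer hypothesis enters: if polarization does not carry over, the augmented data introduces label noise.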
Anthology ID: 2026.semeval-1.362
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 2886–2893
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.362/
Cite (ACL): Miriam Calderon-Reyes, Fernando Sanchez-Vega, and Adrian Pastor Lopez Monroy. 2026. NLP-CIMAT at SemEval-2026 Task 9: LLM-Based One-Shot and Cross-Lingual Data Augmentation for Polarization Detection. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2886–2893, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): NLP-CIMAT at SemEval-2026 Task 9: LLM-Based One-Shot and Cross-Lingual Data Augmentation for Polarization Detection (Calderon-Reyes et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.362.pdf
Supplementary material: 2026.semeval-1.362.SupplementaryMaterial.zip