Phatthachdau at SemEval-2026 Task 9: A Multi-Stage Augment-Judge-Train Pipeline for Multilingual Online Polarization Detection

Phan Phat


Abstract
We address the extreme label imbalance in the Hausa dataset, where only 11% of instances are polarized, through the Augment-Judge-Train (AJT) pipeline. By leveraging Gemini 2.0 for taxonomy-driven data generation and an LLM-as-a-Judge layer for quality control, we expanded the minority class sixfold. Our ensemble architecture, which combines specialized encoders with LoRA-tuned LLMs, achieved 1st place in Hausa (0.8336 Macro-F1) and ranked in the top 10 for English. These results demonstrate the efficacy of culture-aware synthetic data in enhancing social NLP for low-resource languages.
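The augment-and-judge stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `augment_minority`, its parameters, and the toy `generate` and `judge` callables are all hypothetical stand-ins (in the paper, generation uses Gemini 2.0 and filtering uses an LLM judge). The sketch only shows the control flow of generating candidates, keeping those the judge accepts, and stopping once the minority class reaches six times its original size.

```python
from typing import Callable, List


def augment_minority(
    seeds: List[str],
    generate: Callable[[str, int], List[str]],
    judge: Callable[[str], bool],
    target_factor: int = 6,
    max_rounds: int = 10,
) -> List[str]:
    """Grow the minority class to target_factor times its original size.

    `generate` and `judge` are hypothetical stand-ins for an LLM generator
    and an LLM-as-a-Judge filter; neither is taken from the paper.
    """
    target_size = target_factor * len(seeds)
    accepted = list(seeds)   # original minority examples are always kept
    seen = set(seeds)        # avoid adding exact duplicates
    for round_index in range(max_rounds):
        for seed in seeds:
            for candidate in generate(seed, round_index):
                if len(accepted) >= target_size:
                    return accepted
                if candidate in seen:
                    continue
                seen.add(candidate)
                if judge(candidate):  # quality gate: keep only accepted items
                    accepted.append(candidate)
    return accepted


# Toy stand-ins (assumptions for illustration only).
def toy_generate(seed: str, round_index: int) -> List[str]:
    return [f"{seed} | variant {round_index}-{i}" for i in range(3)]


def toy_judge(text: str) -> bool:
    # Pretend every variant ending in "-0" fails the quality check.
    return not text.endswith("-0")


seeds = ["example a", "example b"]
expanded = augment_minority(seeds, toy_generate, toy_judge)
```

The cap on rounds is a design choice in this sketch: if the judge rejects most candidates, the loop returns a smaller set instead of running indefinitely.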
Anthology ID:
2026.semeval-1.208
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
1616–1620
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.208/
Cite (ACL):
Phan Phat. 2026. Phatthachdau at SemEval-2026 Task 9: A Multi-Stage Augment-Judge-Train Pipeline for Multilingual Online Polarization Detection. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1616–1620, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Phatthachdau at SemEval-2026 Task 9: A Multi-Stage Augment-Judge-Train Pipeline for Multilingual Online Polarization Detection (Phat, SemEval 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.208.pdf