PingAn-NLP at SemEval-2026 Task 9: Multi-Stage Alignment via GRPO and Tiered Ensemble Voting for Multilingual Polarization Detection

Diyang Chen, Youzhen Pang


Abstract
This submission describes the PingAn-NLP system for SemEval-2026 Task 9 Subtask 3, identifying polarization manifestations in 18 languages. We employ a tiered optimization framework integrating Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). Key technical innovations include synthetic reasoning distillation from a 235B teacher model, a Smart-Tradeoff reward function designed to mitigate extreme label imbalance, and a tiered ensemble voting strategy that adaptively adjusts decision thresholds based on language resources. Our 8B-GRPO-Vote system demonstrated robust internal performance in tracks like English and Hindi and officially secured second place in the Bengali, English, Odia, and Turkish competitions.
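To make the tiered ensemble voting idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a multi-label setting in which each ensemble member emits a 0/1 vote per label, and that a label is accepted when the share of positive votes reaches a tier-specific threshold. The tier names, the threshold values, and the function name `tiered_vote` are hypothetical; the abstract does not specify them.

```python
from typing import Dict, List

# Hypothetical tiers and thresholds (assumptions for illustration only;
# the abstract does not report the actual values).
VOTE_THRESHOLD_BY_TIER: Dict[str, float] = {
    "high_resource": 0.5,  # require at least half of the models to agree
    "low_resource": 0.3,   # accept a label with weaker agreement
}


def tiered_vote(votes: List[List[int]], tier: str) -> List[int]:
    """Combine per-model binary votes into one decision per label.

    votes[m][k] is model m's 0/1 prediction for label k.
    A label is predicted when the share of positive votes is at least
    the threshold assigned to the language's resource tier.
    """
    if not votes:
        raise ValueError("need at least one model's votes")
    threshold = VOTE_THRESHOLD_BY_TIER[tier]
    n_models = len(votes)
    n_labels = len(votes[0])
    decisions = []
    for k in range(n_labels):
        share = sum(model_votes[k] for model_votes in votes) / n_models
        decisions.append(1 if share >= threshold else 0)
    return decisions


# Three models voting on three labels for one text.
example_votes = [[1, 0, 1], [0, 0, 1], [0, 1, 1]]
high = tiered_vote(example_votes, "high_resource")
low = tiered_vote(example_votes, "low_resource")
```

Under these assumed thresholds, a label backed by only one of three models is rejected in the high-resource tier but accepted in the low-resource tier, which illustrates how a lower threshold could trade precision for recall where training data is scarce.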
Anthology ID:
2026.semeval-1.354
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
2810–2816
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.354/
Cite (ACL):
Diyang Chen and Youzhen Pang. 2026. PingAn-NLP at SemEval-2026 Task 9: Multi-Stage Alignment via GRPO and Tiered Ensemble Voting for Multilingual Polarization Detection. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2810–2816, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
PingAn-NLP at SemEval-2026 Task 9: Multi-Stage Alignment via GRPO and Tiered Ensemble Voting for Multilingual Polarization Detection (Chen & Pang, SemEval 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.354.pdf