Youzhen Pang

2026

PingAn-NLP at SemEval-2026 Task 9: Multi-Stage Alignment via GRPO and Tiered Ensemble Voting for Multilingual Polarization Detection
Diyang Chen | Youzhen Pang
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This submission describes the PingAn-NLP system for SemEval-2026 Task 9 Subtask 3, identifying polarization manifestations in 18 languages. We employ a tiered optimization framework integrating Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). Key technical innovations include synthetic reasoning distillation from a 235B teacher model, a Smart-Tradeoff reward function designed to mitigate extreme label imbalance, and a tiered ensemble voting strategy that adaptively adjusts decision thresholds based on language resources. Our 8B-GRPO-Vote system demonstrated robust internal performance in tracks like English and Hindi and officially secured second place in the Bengali, English, Odia, and Turkish competitions.
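
For readers unfamiliar with GRPO, the group-relative step it is named for can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation: the function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made here, and the example rewards are made up. The Smart-Tradeoff reward itself is not specified in the abstract, so it is not modeled.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of sampled responses.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    Hypothetical helper for illustration only.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones, so no separate value model is needed to form a baseline.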

Co-authors

Diyang Chen 1

Venues

SemEval 1
WS 1
