Youzhen Pang


2026

This submission describes the PingAn-NLP system for SemEval-2026 Task 9 Subtask 3, which targets the identification of polarization manifestations across 18 languages. We employ a tiered optimization framework that combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). Its key technical components are synthetic reasoning distillation from a 235B-parameter teacher model, a Smart-Tradeoff reward function designed to mitigate extreme label imbalance, and a tiered ensemble voting strategy that adapts decision thresholds to the resource level of each language. Our 8B-GRPO-Vote system performed robustly in internal evaluations on tracks such as English and Hindi, and it officially placed second in the Bengali, English, Odia, and Turkish tracks.
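GRPO, as commonly described, replaces a learned value baseline with a group-relative one: several completions are sampled for each prompt, and each completion's reward is standardized against the mean and standard deviation of its own group. The sketch below illustrates only that normalization step, under stated assumptions: one scalar reward per completion, the population standard deviation, and a small `eps` for numerical stability. The function name `group_relative_advantages` is hypothetical. The sketch does not implement the Smart-Tradeoff reward, whose definition is not given here, and the reward values in the example are placeholders.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each completion's reward against its sampling group (GRPO-style).

    Assumptions (illustrative only): `rewards` holds one scalar reward per
    completion sampled for the same prompt, and the population standard
    deviation is used for normalization.
    """
    mean = statistics.fmean(rewards)   # group baseline replaces a value network
    std = statistics.pstdev(rewards)   # spread of rewards within the group
    # Completions above the group mean get positive advantages, those below get negative ones.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Placeholder rewards for four completions of one prompt.
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

For the placeholder group `[1.0, 0.0, 0.5, 0.5]`, the advantages come out at roughly `[1.41, -1.41, 0, 0]`. A group in which every completion receives the same reward yields all-zero advantages and therefore contributes no policy-gradient signal. This is one reason reward design matters under heavy label imbalance.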