Di Bao


2026

This paper presents a system for SemEval 2026 Task 9: Multilingual Polarization Type Classification, built by fine-tuning a pretrained language model. The task is multi-label classification of texts in 22 languages into five polarization types: political, racial/ethnic, religious, gender/sexual, and other. Its main challenges are uneven data distribution across languages, extreme class imbalance, and the difficulty of cross-lingual semantic understanding. To address these challenges, a training framework that integrates hybrid augmentation with multi-strategy regularization is proposed. Built on XLM-RoBERTa-large, the framework combines feature-space Mixup augmentation, an asymmetric loss function, adversarial training, and an exponential moving average (EMA). Multi-label decisions are made with dynamically optimized decision thresholds. Experimental results show that the proposed method achieves a macro-F1 score of 0.48 on the validation set, improving classification performance and generalization in multilingual, imbalanced settings.
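To make the thresholding step concrete, the following is a minimal sketch of per-label threshold tuning followed by macro-F1 scoring. It is an illustration under stated assumptions, not the system's actual procedure: the abstract does not say how thresholds are optimized, so a simple per-label grid search on validation probabilities is assumed here, and the names `f1_binary`, `tune_thresholds`, `macro_f1`, and the grid of candidate thresholds are all hypothetical.

```python
import numpy as np

# The five polarization types named in the task description.
LABELS = ["political", "racial/ethnic", "religious", "gender/sexual", "other"]


def f1_binary(y_true, y_pred):
    """F1 for one binary label; returns 0.0 when there are no positives at all."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else float(2 * tp / denom)


def tune_thresholds(probs, y_true, grid=None):
    """Pick, for each label, the grid threshold with the highest validation F1.

    Assumption: an independent grid search per label. Ties go to the
    lowest threshold because np.argmax returns the first maximum.
    """
    if grid is None:
        grid = np.arange(1, 20) / 20.0  # 0.05, 0.10, ..., 0.95
    n_labels = probs.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        scores = [
            f1_binary(y_true[:, j], (probs[:, j] >= t).astype(int)) for t in grid
        ]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds


def macro_f1(probs, y_true, thresholds):
    """Unweighted mean of per-label F1 after applying per-label thresholds."""
    preds = (probs >= thresholds).astype(int)
    per_label = [
        f1_binary(y_true[:, j], preds[:, j]) for j in range(y_true.shape[1])
    ]
    return float(np.mean(per_label))
```

Given an `(n_samples, 5)` array of sigmoid outputs `val_probs` and a matching 0/1 array `val_labels` (both hypothetical names), `tune_thresholds(val_probs, val_labels)` returns one threshold per label, which can then be passed to `macro_f1`. Because the thresholds are tuned on the same data used for scoring in this sketch, the resulting score is optimistic; tuning on a held-out split is the more careful design.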