Bado Völckers


2026

We present a multilingual polarization detection system for SemEval-2026 Task 9 (Subtask 1), covering 22 languages with transformer-based models. We evaluate four strategies: data rebalancing, hyperparameter optimization, model scaling, and ensembling, and show that undersampling harms performance, while larger pretrained models improve results substantially. Our best single model, XLM-RoBERTa Large, achieves a Macro-F1 of 0.7929, with analysis showing complementary strengths across model families (e.g., RemBERT for several Indic languages and mDeBERTa for Semitic/morphologically rich languages). Ensemble gains are marginal, suggesting language-aware routing is more promising than uniform aggregation. We also provide a privacy-preserving Firefox extension that runs local ONNX inference for practical deployment without sending user text to external servers.