Ådne Jøssing


2026

Tackling task 2, we fine tune a BERT-style encoder with classification heads added on top. We first try out different pre-trained encoder models, before settling on the Twhin-bert multilingual model, since its pretraining corpus (mainly tweets) provides a suitable starting point for our task. To resolve the issue of diverging label annotation styles, we apply the S-Cut algorithm, in order to calibrate thresholds for label selection, and examine its impact. We take a look at the resulting hidden representations in a reduced dimensional space, and examine the linguistic information encoded by our model after fine-tuning using linguistic probing.