Ilinca Vandici


2026

For Task 2, we fine-tune a BERT-style encoder with classification heads added on top. We first experiment with several pre-trained encoders before settling on the multilingual TwHIN-BERT model, whose pretraining corpus (mainly tweets) provides a suitable starting point for our task. To address diverging label annotation styles, we apply the S-Cut algorithm to calibrate the thresholds used for label selection, and we examine its impact. We also inspect the resulting hidden representations in a reduced-dimensional space and use linguistic probing to examine the linguistic information encoded by our model after fine-tuning.
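As a rough illustration of the threshold calibration step, the sketch below tunes one decision threshold per label on held-out validation scores. It is a minimal sketch rather than the system described here: the choice of F1 as the tuning metric, the fixed candidate grid, and the names `scut_thresholds` and `apply_thresholds` are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def scut_thresholds(
    scores: List[List[float]],
    labels: List[List[int]],
    candidates: Optional[List[float]] = None,
) -> List[float]:
    """Pick, for each label independently, the threshold with the best
    validation F1. scores[i][j] is the model score of label j on example i.
    The metric and the grid are assumptions of this sketch."""
    if candidates is None:
        candidates = [k / 20 for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    thresholds = []
    for j in range(len(scores[0])):
        col_scores = [row[j] for row in scores]
        col_labels = [row[j] for row in labels]
        best_t, best_f1 = 0.5, -1.0
        for t in candidates:
            preds = [1 if s >= t else 0 for s in col_scores]
            f1 = f1_score(col_labels, preds)
            if f1 > best_f1:  # strict '>' keeps the lowest best threshold
                best_t, best_f1 = t, f1
        thresholds.append(best_t)
    return thresholds


def apply_thresholds(scores: List[List[float]], thresholds: List[float]) -> List[List[int]]:
    """Select label j for an example when its score reaches threshold j."""
    return [[1 if s >= t else 0 for s, t in zip(row, thresholds)] for row in scores]


# Toy validation set: label 1 is systematically scored low, so a global
# 0.5 cut-off would never select it.
val_scores = [[0.9, 0.30], [0.8, 0.35], [0.4, 0.10], [0.2, 0.05]]
val_labels = [[1, 1], [1, 1], [0, 0], [0, 0]]

thresholds = scut_thresholds(val_scores, val_labels)
calibrated = apply_thresholds(val_scores, thresholds)
uncalibrated = apply_thresholds(val_scores, [0.5, 0.5])
```

In this toy setting the per-label thresholds recover the second label, which a shared 0.5 threshold misses entirely. In practice the thresholds would be tuned on a validation split and then applied unchanged to test data.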

2024

Our team tackled SemEval-2024 Task 6, which focuses on identifying fluent over-generation hallucinations in NLP outputs. We proposed a pragmatic solution using a logistic regression classifier and a feed-forward ANN, with SBERT embeddings used for feature extraction.
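To make the classifier side of this setup concrete, here is a minimal sketch of a logistic regression trained by batch gradient descent on fixed-size feature vectors. The short toy vectors merely stand in for SBERT embeddings; how the embeddings are actually computed and combined is not stated here, and the function names, learning rate, and epoch count are assumptions of the example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability of the positive class (here: 'hallucination')."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # derivative of log loss w.r.t. logit
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical 3-dimensional "embeddings"; real SBERT vectors are much larger.
X_train = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1], [0.1, 0.8, 0.9], [0.0, 0.9, 1.0]]
y_train = [0, 0, 1, 1]  # 1 = hallucination, 0 = not a hallucination

weights, bias = train_logreg(X_train, y_train)
probs = [predict_proba(x, weights, bias) for x in X_train]
```

With real data, the embedding step would replace the toy vectors, and a library implementation of logistic regression would normally be preferred over this hand-written loop.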