Ali Gharaee


2025

Tuebingen at SemEval-2025 Task 10: Class Weighting, External Knowledge and Data Augmentation in BERT Models
Özlem Karabulut | Soudabeh Eslami | Ali Gharaee | Matthew Andrews
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

The spread of disinformation and propaganda in online news presents a significant challenge to information integrity. As part of SemEval 2025 Task 10 on Multilingual Characterization and Extraction of Narratives from Online News, this study focuses on Subtask 1: Entity Framing, which involves assigning roles to named entities within news articles across multiple languages. We investigate techniques such as data augmentation, external knowledge, and class weighting to improve classification performance. Our findings indicate that class weighting was more effective than the other approaches.
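The abstract names class weighting as the most effective technique but does not show how such weights are computed. A minimal illustrative sketch (not the authors' implementation) of standard inverse-frequency class weights, which would typically be passed to a weighted cross-entropy loss during BERT fine-tuning:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: rare classes get larger weights,
    so the loss penalizes their misclassification more heavily."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # weight_c = n / (k * count_c): "balanced" weighting, mean weight near 1
    return {c: n / (k * counts[c]) for c in counts}

# Hypothetical toy role labels: one frequent class, one rare class.
labels = ["protagonist"] * 8 + ["villain"] * 2
w = class_weights(labels)
# The rare class receives a proportionally larger weight than the frequent one.
```

In a typical setup these per-class weights are supplied to the classifier's loss function so that minority entity roles contribute more to the gradient.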

2024

BabyLM Challenge: Experimenting with Self-Distillation and Reverse-Distillation for Language Model Pre-Training on Constrained Datasets
Aakarsh Nair | Alina Hancharova | Mayank Kumar | Ali Gharaee
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning

Language models (LMs) exhibit significant data inefficiency compared to human learners. A child is able to master language while consuming less than 100 million words of input, while language models require orders of magnitude more tokens during training. Our submission to the BabyLM Challenge utilizes a combination of self-distillation and reverse-distillation to train a sequence of ensemble models with improved training characteristics on a fixed-size 10 million-word dataset. Self-distillation is used to generate an ensemble of models of a certain fixed size, while reverse distillation is used to train a larger, more expressive model from a previously trained generation of relatively smaller models, largely preserving learned accuracy. We find that ensembles consisting of two smaller models and one identical born-again model serve as ideal ensembles for each trained generation of model size. We demonstrate that, although our method is not novel, it provides consistent and modest performance improvements on the BLiMP and GLUE benchmarks.
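The abstract describes distillation at a high level only. A minimal sketch of the standard distillation objective (temperature-softened KL divergence between teacher and student output distributions), which both self- and reverse-distillation variants typically optimize; this is an illustration, not the authors' code:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer targets."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the usual objective for (self-)distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; diverging logits increase it.
same = distillation_loss([2.0, 0.5], [2.0, 0.5])
diff = distillation_loss([2.0, 0.5], [0.5, 2.0])
```

In self-distillation the "teacher" is a same-sized earlier model (a born-again setup); in reverse distillation the student is larger than the teacher ensemble it learns from.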