Eleni Fysikoudi


2025

Exploring smaller batch sizes for a high-performing BabyLM model architecture
Sharid Loáiciga | Eleni Fysikoudi | Asad B. Sayeed
Proceedings of the First BabyLM Workshop

We explore the conditions under which the highest-performing entry to the 2023 BabyLM task, Every Layer Counts BERT (ELC-BERT), performs best given more constrained resources than the original run, with a particular focus on batch size. ELC-BERT’s relative success, as an instance of model engineering compared to more cognitively motivated architectures, could be taken as evidence that the lowest-hanging fruit is to be found in non-linguistic machine learning approaches. We find that if we take away ELC-BERT’s advantage in training time, the advantage of the architecture mostly disappears, although some hyperparameter combinations nevertheless distinguish themselves in performance.

Active Curriculum Language Modeling over a Hybrid Pre-training Method
Eleni Fysikoudi | Sharid Loáiciga | Asad B. Sayeed
Proceedings of the First BabyLM Workshop

We apply the Active Curriculum Language Modeling (ACLM) method to the constrained pretraining setting of the 2025 BabyLM Challenge, where models are limited by both data and compute budgets. Using GPT-BERT (Charpentier and Samuel, 2024) as the base architecture, we investigate the impact of surprisal-based example selection for constructing a training curriculum. In addition, we conduct a targeted hyperparameter search over tokenizer size and batch size. Our approach yields stable pretrained models that surpass the official baseline on multiple evaluation tasks, demonstrating ACLM’s potential for improving performance and generalization in low-resource pretraining scenarios.
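The core idea of surprisal-based example selection can be illustrated with a minimal sketch: score each candidate training example by its mean per-token surprisal under a small language model, then order the corpus by that score to form a curriculum. This is an illustrative approximation only, not the paper's implementation; the scorer model (gpt2), the ordering direction, and the helper names compute_surprisal and build_curriculum are assumptions.

# Minimal sketch of surprisal-based curriculum construction (illustrative only).
# The scorer model and helper names are assumptions, not taken from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def compute_surprisal(texts, model_name="gpt2"):
    """Return mean per-token surprisal (in nats) for each text."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    scores = []
    with torch.no_grad():
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids
            # With labels=input_ids the model returns mean cross-entropy over
            # predicted tokens, which equals mean surprisal in nats.
            loss = model(ids, labels=ids).loss
            scores.append(loss.item())
    return scores

def build_curriculum(texts, ascending=True):
    """Order examples by surprisal (easy-to-hard when ascending=True)."""
    scores = compute_surprisal(texts)
    order = sorted(range(len(texts)), key=lambda i: scores[i], reverse=not ascending)
    return [texts[i] for i in order]

if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "Quantum chromodynamics constrains hadron masses.",
    ]
    for example in build_curriculum(corpus):
        print(example)

In an active-curriculum setting the scoring model would typically be the model being trained (here GPT-BERT), with surprisal recomputed periodically as training progresses rather than once up front.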