Exploring smaller batch sizes for a high-performing BabyLM model architecture

Sharid Loáiciga, Eleni Fysikoudi, Asad B. Sayeed


Abstract
We explore the conditions under which the highest-performing entry to the 2023 BabyLM task, Every Layer Counts BERT (ELC-BERT), performs best given more constrained resources than the original run, with a particular focus on batch size. ELC-BERT’s relative success, as an instance of model engineering compared to more cognitively motivated architectures, could be taken as evidence that the “lowest-hanging fruit” is to be found in non-linguistic machine learning approaches. We find that if we take away ELC-BERT’s advantage in training time, the advantage of the architecture mostly disappears, although some hyperparameter combinations nevertheless differentiate themselves in performance.
Anthology ID:
2025.babylm-main.12
Volume:
Proceedings of the First BabyLM Workshop
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Lucas Charpentier, Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Michael Y. Hu, Jing Liu, Jaap Jumelet, Tal Linzen, Aaron Mueller, Candace Ross, Raj Sanjay Shah, Alex Warstadt, Ethan Gotlieb Wilcox, Adina Williams
Venue:
BabyLM
Publisher:
Association for Computational Linguistics
Pages:
155–159
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.babylm-main.12/
Cite (ACL):
Sharid Loáiciga, Eleni Fysikoudi, and Asad B. Sayeed. 2025. Exploring smaller batch sizes for a high-performing BabyLM model architecture. In Proceedings of the First BabyLM Workshop, pages 155–159, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Exploring smaller batch sizes for a high-performing BabyLM model architecture (Loáiciga et al., BabyLM 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.babylm-main.12.pdf