Ethan Gotlieb Wilcox

2025

Proceedings of the First BabyLM Workshop
Lucas Charpentier | Leshem Choshen | Ryan Cotterell | Mustafa Omer Gul | Michael Y. Hu | Jing Liu | Jaap Jumelet | Tal Linzen | Aaron Mueller | Candace Ross | Raj Sanjay Shah | Alex Warstadt | Ethan Gotlieb Wilcox | Adina Williams
Proceedings of the First BabyLM Workshop

Findings of the Third BabyLM Challenge: Accelerating Language Modeling Research with Cognitively Plausible Data
Lucas Charpentier | Leshem Choshen | Ryan Cotterell | Mustafa Omer Gul | Michael Y. Hu | Jing Liu | Jaap Jumelet | Tal Linzen | Aaron Mueller | Candace Ross | Raj Sanjay Shah | Alex Warstadt | Ethan Gotlieb Wilcox | Adina Williams
Proceedings of the First BabyLM Workshop

This report summarizes the findings from the 3rd BabyLM Challenge and the 1st BabyLM Workshop. The BabyLM Challenge is a shared task aimed at closing the data efficiency gap between human and machine language learners. The goal is to improve the performance of language models given a fixed training budget of no more than 100 million words. This year, the challenge was held as part of an expanded BabyLM Workshop that invited paper submissions on topics relevant to the BabyLM effort, including sample-efficient pretraining and cognitive modeling for LMs. For the challenge, we kept the text-only and text–image tracks from previous years, but also introduced a new interaction track, where student models are allowed to learn from feedback from larger teacher models. Furthermore, we introduce a new set of evaluation tasks to assess the “human likeness” of models on a cognitive and linguistic level, limit the total amount of training compute allowed, and measure performance on intermediate checkpoints. We observe that new training objectives and architectures tend to produce the best-performing approaches, and that interaction with teacher models can yield high-quality language models. The strict and interaction tracks saw submissions that outperformed the best-performing methods from previous years. We do not observe a complete correlation between training FLOPs and performance, suggesting that some methods can produce real gains beyond those obtained by simply spending more compute. This year’s BabyLM Challenge shows that there is still room to innovate in a data-constrained setting, and that community-driven research can yield actionable insights for language modeling.

2024

The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning
Michael Y. Hu | Aaron Mueller | Candace Ross | Adina Williams | Tal Linzen | Chengxu Zhuang | Leshem Choshen | Ryan Cotterell | Alex Warstadt | Ethan Gotlieb Wilcox
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu | Aaron Mueller | Candace Ross | Adina Williams | Tal Linzen | Chengxu Zhuang | Ryan Cotterell | Leshem Choshen | Alex Warstadt | Ethan Gotlieb Wilcox
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning

The BabyLM Challenge is a community effort to close the data-efficiency gap between human and computational language learners. Participants compete to optimize language model training on a fixed language data budget of 100 million words or less. This year, we released improved text corpora, as well as a vision-and-language corpus to facilitate research into cognitively plausible vision language models. Submissions were compared on evaluation tasks targeting grammatical ability, (visual) question answering, pragmatic abilities, and grounding, among other abilities. Participants could submit to a 10M-word text-only track, a 100M-word text-only track, and/or a 100M-word and image multimodal track. From 31 submissions employing diverse methods, a hybrid causal-masked language model architecture outperformed other approaches. No submissions outperformed the baselines in the multimodal track. In follow-up analyses, we found a strong relationship between training FLOPs and average performance across tasks, and that the best-performing submissions proposed changes to the training data, training objective, and model architecture. This year’s BabyLM Challenge shows that there is still significant room for innovation in this setting, in particular for image-text modeling, but community-driven research can yield actionable insights about effective strategies for small-scale language modeling.

2023

WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words
Lukas Wolf | Klemen Kotar | Greta Tuckute | Eghbal Hosseini | Tamar I. Regev | Ethan Gotlieb Wilcox | Alexander Scott Warstadt
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning