@inproceedings{shaozhen-etal-2024-choosy,
title = "Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in {B}aby{L}lama with Reverse {KL} Divergence",
author = "Shi, Shaozhen and
Matusevych, Yevgen and
Nissim, Malvina",
editor = "Hu, Michael Y. and
Mueller, Aaron and
Ross, Candace and
Williams, Adina and
Linzen, Tal and
Zhuang, Chengxu and
Choshen, Leshem and
Cotterell, Ryan and
Warstadt, Alex and
Wilcox, Ethan Gotlieb",
booktitle = "The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning",
month = nov,
year = "2024",
address = "Miami, FL, USA",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.conll-babylm.8/",
pages = "95--105",
abstract = "This study presents our submission to the Strict-Small Track of the 2nd BabyLM Challenge. We use a teacher-student distillation setup with the BabyLLaMa model (Timiryasov and Tastet, 2023) as a backbone. To make the student`s learning process more focused, we replace the objective function with a reverse Kullback-Leibler divergence, known to cause mode-seeking (rather than mode-averaging) behaviour in computational learners. We further experiment with having a single teacher (instead of an ensemble of two teachers) and implement additional optimization strategies to improve the distillation process. Our experiments show that under reverse KL divergence, a single-teacher model often outperforms or matches multiple-teacher models across most tasks. Additionally, incorporating advanced optimization techniques further enhances model performance, demonstrating the effectiveness and robustness of our proposed approach. These findings support our idea that {\textquotedblleft}choosy babies need one coach{\textquotedblright}."
}
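The abstract contrasts mode-seeking reverse KL with mode-averaging forward KL. As a quick reference, here is a minimal LaTeX sketch of the two standard divergence directions (generic notation assumed here, not taken from the paper: $p$ for the teacher distribution, $q_\theta$ for the student):

% Forward KL, the usual distillation objective: mode-averaging, since
% q_theta is penalized wherever the teacher p has mass the student misses.
\[ D_{\mathrm{KL}}(p \parallel q_\theta) = \sum_{x} p(x)\,\log\frac{p(x)}{q_\theta(x)} \]
% Reverse KL, the objective adopted in this paper: mode-seeking, since
% q_theta is heavily penalized for placing mass where p assigns little.
\[ D_{\mathrm{KL}}(q_\theta \parallel p) = \sum_{x} q_\theta(x)\,\log\frac{q_\theta(x)}{p(x)} \]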