Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty - ACL Anthology

Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

Inar Timiryasov, Jean-Loup Tastet

Anthology ID: 2023.conll-babylm.24
Volume: Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning
Month: December
Year: 2023
Address: Singapore
Editors: Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
Venues: CoNLL | BabyLM | WS
Publisher: Association for Computational Linguistics
Pages: 279–289
URL: https://preview.aclanthology.org/fix-sig-urls/2023.conll-babylm.24/
DOI: 10.18653/v1/2023.conll-babylm.24
Cite (ACL): Inar Timiryasov and Jean-Loup Tastet. 2023. Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty. In Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, pages 279–289, Singapore. Association for Computational Linguistics.
Cite (Informal): Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty (Timiryasov & Tastet, CoNLL-BabyLM 2023)
PDF: https://preview.aclanthology.org/fix-sig-urls/2023.conll-babylm.24.pdf