@inproceedings{capone-etal-2024-concretegpt,
title = "{C}oncrete{GPT}: A Baby {GPT}-2 Based on Lexical Concreteness and Curriculum Learning",
author = "Capone, Luca and
Bondielli, Alessandro and
Lenci, Alessandro",
editor = "Hu, Michael Y. and
Mueller, Aaron and
Ross, Candace and
Williams, Adina and
Linzen, Tal and
Zhuang, Chengxu and
Choshen, Leshem and
Cotterell, Ryan and
Warstadt, Alex and
Wilcox, Ethan Gotlieb",
booktitle = "The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning",
month = nov,
year = "2024",
address = "Miami, FL, USA",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.conll-babylm.16/",
pages = "189--196",
abstract = "We present a model for the Strict-Small track of the BabyLM Challenge 2024 (Choshen et al. 2024). We introduce a Curriculum Learning approach for training a specialized version of GPT-2 (Radford et al. 2019), that we name ConcreteGPT. We utilize the norms from (Brysbaert et al. 2014) which provide concreteness ratings for 40,000 English lexical items based on human subjects. Using these norms, we assign a concreteness score to each sentence in the training dataset and develop two curriculum strategies that progressively introduce more complex and abstract language patterns in the training data. Compared to the baselines, our best model shows lower performance on zero-shot tasks but demonstrates superior performance in fine-tuning tasks. Notably, our curriculum-trained models exhibit significant improvements over a non-curriculum based training of the same model."
}