Learning Dynamics of Meta-Learning in Small Model Pretraining
David Demitri Africa, Yuval Weiss, Paula Buttery, Richard Diehl Martinez
Abstract
Large language models are powerful but costly. We ask whether meta-learning can make the pretraining of small language models not only faster but also more interpretable. We integrate first-order MAML with subset-masked LM pretraining, producing four Llama-style decoder-only models (11M–570M parameters), and evaluate on multilingual Universal NER. Compared with vanilla training, our hybrid setup (i) reaches the same loss up to 1.6× sooner, (ii) yields modest but consistent average gains on Universal NER at medium/large scales under equal compute (+2–3 percentage points), and (iii) reveals phase-like learning dynamics: models first diversify their representations, then compress them in a pattern that aligns with improved episodic accuracy. These observations are correlational, not causal, and we do not claim generality beyond NER or across seeds. We also document a trade-off: perplexity on Paloma (a diverse language modeling benchmark spanning 18 domains) is worse at most scales. Code, checkpoints, and analysis logs are released.
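The abstract names the recipe (first-order MAML layered on LM pretraining) without its mechanics. As a rough illustration of what one first-order MAML (FOMAML) meta-update can look like in that setting, here is a minimal PyTorch sketch; `lm_loss`, the batch names, and the hyperparameters are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of one first-order MAML (FOMAML) meta-update for LM pretraining.
# Assumption: lm_loss(model, batch) returns a scalar language-modeling loss.
import copy
import torch

def fomaml_step(model, support_batch, query_batch, lm_loss,
                inner_lr=1e-3, inner_steps=1):
    """Adapt a clone on the support batch, then copy the query-batch gradients
    (taken at the adapted weights) onto the original model's parameters."""
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    # Inner loop: a few gradient steps on the episode's support examples.
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        lm_loss(adapted, support_batch).backward()
        inner_opt.step()

    # Outer gradient: evaluate the adapted weights on the query examples.
    adapted.zero_grad()
    query_loss = lm_loss(adapted, query_batch)
    query_loss.backward()

    # First-order approximation: reuse the query gradients as if they were taken
    # at the original weights; the caller's outer optimizer then applies them.
    for p, p_adapted in zip(model.parameters(), adapted.parameters()):
        if p_adapted.grad is not None:
            p.grad = p_adapted.grad.detach().clone()
    return query_loss.item()
```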
- Anthology ID:
- 2025.ijcnlp-srw.2
- Volume:
- The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Santosh T.y.s.s, Shuichiro Shimizu, Yifan Gong
- Venue:
- IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10–23
- URL:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.2/
- Cite (ACL):
- David Demitri Africa, Yuval Weiss, Paula Buttery, and Richard Diehl Martinez. 2025. Learning Dynamics of Meta-Learning in Small Model Pretraining. In The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 10–23, Mumbai, India. Association for Computational Linguistics.
- Cite (Informal):
- Learning Dynamics of Meta-Learning in Small Model Pretraining (Africa et al., IJCNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.2.pdf