Learning Dynamics of Meta-Learning in Small Model Pretraining

David Demitri Africa, Yuval Weiss, Paula Buttery, Richard Diehl Martinez


Abstract
Large language models are powerful but costly. We ask whether meta-learning can make the pretraining of small language models not only faster but also more interpretable. We integrate first-order MAML with subset-masked LM pretraining, producing four Llama-style decoder-only models (11M–570M params), and evaluate on multilingual Universal NER. Compared with vanilla training, our hybrid setup (i) reaches the same loss up to 1.6× sooner, (ii) yields modest but consistent average gains on Universal NER at medium/large scales under equal compute (+2–3 percentage points), and (iii) reveals phase-like learning dynamics: models first diversify their representations, then compress them in a pattern that aligns with improved episodic accuracy. These observations are correlational, not causal, and we do not claim generality beyond NER or across seeds. We also document a trade-off: perplexity on Paloma (a diverse language modeling benchmark spanning 18 domains) is worse at most scales. Code, checkpoints, and analysis logs are released.
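To make the method concrete, below is a minimal sketch of a first-order MAML (FOMAML) outer step of the kind the abstract describes. It is illustrative only: the paper's subset-masked LM objective, episode construction, and hyperparameters are not reproduced here, and `model`, `episodes`, and the learning rates are hypothetical placeholders assuming a HuggingFace-style model whose forward pass returns a `.loss`.

```python
# Sketch of one first-order MAML (FOMAML) meta-update over (support, query)
# episodes. Assumptions: `model(**batch).loss` is a scalar LM loss; `episodes`
# is an iterable of (support_batch, query_batch) dicts.
import copy
import torch


def fomaml_step(model, episodes, inner_lr=1e-3, outer_lr=1e-4, inner_steps=1):
    """One meta-update: adapt on support, take query gradients, apply to base."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]

    for support, query in episodes:
        # Inner loop: adapt a copy of the model on the support set.
        fast = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            inner_opt.zero_grad()
            fast(**support).loss.backward()
            inner_opt.step()

        # First-order approximation: query-set gradients are taken w.r.t. the
        # adapted weights and accumulated as if they were gradients of the
        # original weights (no second-order terms).
        fast.zero_grad()
        fast(**query).loss.backward()
        for g, p in zip(meta_grads, fast.parameters()):
            g += p.grad.detach() / len(episodes)

    # Apply the averaged first-order meta-gradient to the base model.
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g
```

In practice this outer step would be interleaved with ordinary LM pretraining updates; the sketch only shows the meta-learning component.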
Anthology ID:
2025.ijcnlp-srw.2
Volume:
The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Santosh T.y.s.s, Shuichiro Shimizu, Yifan Gong
Venue:
IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
10–23
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.2/
Cite (ACL):
David Demitri Africa, Yuval Weiss, Paula Buttery, and Richard Diehl Martinez. 2025. Learning Dynamics of Meta-Learning in Small Model Pretraining. In The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 10–23, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Learning Dynamics of Meta-Learning in Small Model Pretraining (Africa et al., IJCNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-srw.2.pdf