From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts

Daniel Christoph, Max Ploner, Patrick Haller, Alan Akbik


Abstract
Sample efficiency is a crucial property of language models with practical implications for training efficiency. In real-world text, information follows a long-tailed distribution, yet we expect models to learn and recall both frequent and infrequent facts. Sample-efficient models are better equipped to handle this challenge of learning and retaining rare information without requiring excessive exposure. This study analyzes multiple models of varying architectures and sizes, all trained on the same pre-training data. By annotating relational facts with their frequencies in the training corpus, we examine how model performance varies with fact frequency. Our findings show that most models perform similarly on high-frequency facts but differ notably on low-frequency facts. This analysis provides new insights into the relationship between model architecture, size, and factual learning efficiency.
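The abstract's analysis, grouping facts by their training-corpus frequency and comparing recall accuracy across frequency bins, can be illustrated with a minimal sketch. The data layout, field names, and bin edges below are assumptions for illustration, not the paper's actual code or evaluation pipeline.

```python
from collections import defaultdict

def accuracy_by_frequency_bin(facts, bin_edges):
    """Group frequency-annotated facts into bins and compute per-bin recall accuracy.

    Assumed (hypothetical) input format:
      facts: list of dicts with keys
        "frequency" -- occurrences of the fact in the pre-training corpus
        "correct"   -- whether the model recalled the fact (bool)
      bin_edges: ascending upper bounds defining the frequency bins
    """
    bins = defaultdict(lambda: [0, 0])  # bin index -> [num_correct, num_total]
    for fact in facts:
        # Assign the fact to the first bin whose upper bound covers its frequency;
        # frequencies above the last edge fall into an overflow bin.
        idx = next((i for i, edge in enumerate(bin_edges) if fact["frequency"] <= edge),
                   len(bin_edges))
        bins[idx][0] += int(fact["correct"])
        bins[idx][1] += 1
    return {idx: correct / total for idx, (correct, total) in sorted(bins.items())}

# Hypothetical usage: compare accuracy on rare vs. frequent facts.
facts = [
    {"frequency": 3, "correct": False},
    {"frequency": 120, "correct": True},
    {"frequency": 5000, "correct": True},
]
print(accuracy_by_frequency_bin(facts, bin_edges=[10, 1000]))
```

Comparing such per-bin accuracies across models trained on the same corpus is one way to surface the kind of low-frequency differences the paper reports.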
Anthology ID:
2025.l2m2-1.3
Volume:
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Robin Jia, Eric Wallace, Yangsibo Huang, Tiago Pimentel, Pratyush Maini, Verna Dankers, Johnny Wei, Pietro Lesci
Venues:
L2M2 | WS
Publisher:
Association for Computational Linguistics
Pages:
29–46
URL:
https://preview.aclanthology.org/landing_page/2025.l2m2-1.3/
Cite (ACL):
Daniel Christoph, Max Ploner, Patrick Haller, and Alan Akbik. 2025. From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts. In Proceedings of the First Workshop on Large Language Model Memorization (L2M2), pages 29–46, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts (Christoph et al., L2M2 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.l2m2-1.3.pdf