Scaling Laws for Fact Memorization of Large Language Models

Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu


Abstract
Fact knowledge memorization is crucial for Large Language Models (LLMs) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws of LLMs' fact knowledge capacity and their behaviors when memorizing different types of facts. We find that LLMs' fact knowledge capacity scales linearly with model size and follows a negative exponential law with respect to training epochs. According to the fitted scaling law, memorizing all of Wikidata's facts would require training an LLM with 1000B non-embedding parameters for 100 epochs, suggesting that memorizing all public facts with an LLM is nearly implausible under a general pre-training setting. Meanwhile, we find that LLMs can generalize to unseen fact knowledge, and the corresponding scaling law is similar to that of general pre-training. Additionally, we analyze the compatibility and preference of LLMs' fact memorization. For compatibility, we find that LLMs struggle to memorize redundant facts in a unified way; only when correlated facts share the same direction and structure can the LLM memorize them compatibly, which shows the inefficiency of LLM memorization for redundant facts. For preference, the LLM pays more attention to memorizing more frequent and more difficult facts, and subsequent facts can overwrite the memorization of prior facts, which significantly hinders the memorization of low-frequency facts. Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning and provide directions for augmenting LLMs' fact knowledge.
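
As a rough illustration of the relationship described in the abstract (not the paper's exact formulation; the constants a and b and the saturating form in epochs are assumptions), the fact capacity C of a model with N non-embedding parameters trained for E epochs could be sketched as

C(N, E) \approx a \cdot N \cdot \left(1 - e^{-E/b}\right),

i.e., capacity grows linearly in N and approaches its limit via a negative exponential in E.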
Anthology ID:
2024.findings-emnlp.658
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11263–11282
URL:
https://aclanthology.org/2024.findings-emnlp.658
DOI:
10.18653/v1/2024.findings-emnlp.658
Cite (ACL):
Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, and Xipeng Qiu. 2024. Scaling Laws for Fact Memorization of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 11263–11282, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Scaling Laws for Fact Memorization of Large Language Models (Lu et al., Findings 2024)
PDF:
https://preview.aclanthology.org/landing_page/2024.findings-emnlp.658.pdf