Gradient Localization Improves Lifelong Pretraining of Language Models

Jared Fernandez, Yonatan Bisk, Emma Strubell


Abstract
Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters. However, the mechanism by which language models store different types of knowledge is poorly understood. In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to different sets of parameters within the LLMs. We hypothesize that the failure of existing continual learning methods to account for the locality of knowledge contributes both to the failed uptake of new information and to catastrophic forgetting of previously learned information. We observe that sequences containing references to updated and newly mentioned entities exhibit larger gradient norms in a subset of layers. We demonstrate that targeting parameter updates to these relevant layers can improve the performance of continual pretraining on language containing temporal drift.
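
The sketch below illustrates the general idea described in the abstract, not the authors' exact procedure: measure per-layer gradient norms on text mentioning new or updated entities, then restrict continual-pretraining updates to the layers with the largest norms. The model name, the GPT-2 parameter naming convention, the example texts, and the top-k cutoff are all illustrative assumptions.

```python
# Hedged sketch: localize large gradient norms to layers, then train only those layers.
import torch
from collections import defaultdict
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with block-indexed parameter names
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def per_layer_grad_norms(texts):
    """Accumulate gradient L2 norms, grouped by transformer block index."""
    norms = defaultdict(float)
    for text in texts:
        batch = tok(text, return_tensors="pt")
        model.zero_grad()
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        for name, p in model.named_parameters():
            if p.grad is not None and ".h." in name:  # GPT-2 block naming convention
                layer = int(name.split(".h.")[1].split(".")[0])
                norms[layer] += p.grad.norm().item()
    return norms

# Hypothetical sequences referencing updated / newly mentioned entities.
new_entity_texts = ["In 2024, the newly elected chancellor announced a policy change."]
norms = per_layer_grad_norms(new_entity_texts)

# Keep only the k layers with the largest gradient norms trainable.
k = 4  # assumption: cutoff chosen for illustration
top_layers = {l for l, _ in sorted(norms.items(), key=lambda x: -x[1])[:k]}
for name, p in model.named_parameters():
    if ".h." in name:
        layer = int(name.split(".h.")[1].split(".")[0])
        p.requires_grad = layer in top_layers
    # Embeddings and the final layer norm are left trainable here; a design choice.

# Continual pretraining then proceeds as usual; the optimizer only updates
# the parameters that remain trainable.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)
```

In practice, the gradient-norm pass and the layer selection would be run on held-out text exhibiting temporal drift before the continual-pretraining phase begins.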
Anthology ID:
2024.findings-emnlp.949
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16188–16195
URL:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.949/
DOI:
10.18653/v1/2024.findings-emnlp.949
Cite (ACL):
Jared Fernandez, Yonatan Bisk, and Emma Strubell. 2024. Gradient Localization Improves Lifelong Pretraining of Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 16188–16195, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Gradient Localization Improves Lifelong Pretraining of Language Models (Fernandez et al., Findings 2024)
PDF:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.949.pdf