Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
Boxi Cao, Qiaoyu Tang, Hongyu Lin, Shanshan Jiang, Bin Dong, Xianpei Han, Jiawei Chen, Tianshu Wang, Le Sun
Abstract
Memory is one of the most essential cognitive functions, serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. In contrast, vanilla neural networks without pre-training have long been observed to suffer from the catastrophic forgetting problem. To investigate this retentive-forgetful contradiction and understand the dynamic memorizing mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies, and the learning schedules. We find that: 1) Vanilla language models without pre-training are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms for language models.
- Anthology ID: 2024.lrec-main.1222
- Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month: May
- Year: 2024
- Address: Torino, Italia
- Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues: LREC | COLING
- Publisher: ELRA and ICCL
- Pages: 14016–14036
- URL: https://aclanthology.org/2024.lrec-main.1222
- Cite (ACL): Boxi Cao, Qiaoyu Tang, Hongyu Lin, Shanshan Jiang, Bin Dong, Xianpei Han, Jiawei Chen, Tianshu Wang, and Le Sun. 2024. Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14016–14036, Torino, Italia. ELRA and ICCL.
- Cite (Informal): Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models (Cao et al., LREC-COLING 2024)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/2024.lrec-main.1222.pdf