@inproceedings{tanzer-etal-2022-memorisation,
title = "Memorisation versus Generalisation in Pre-trained Language Models",
author = {T{\"a}nzer, Michael and
Ruder, Sebastian and
Rei, Marek},
editor = "Muresan, Smaranda and
Nakov, Preslav and
Villavicencio, Aline",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.acl-long.521/",
doi = "10.18653/v1/2022.acl-long.521",
pages = "7564--7578",
abstract = "State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks."
}
Markdown (Informal)
[Memorisation versus Generalisation in Pre-trained Language Models](https://aclanthology.org/2022.acl-long.521/) (Tänzer et al., ACL 2022)