Abstract
State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
- Anthology ID: 2022.acl-long.521
- Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: May
- Year: 2022
- Address: Dublin, Ireland
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 7564–7578
- URL: https://aclanthology.org/2022.acl-long.521
- DOI: 10.18653/v1/2022.acl-long.521
- Cite (ACL): Michael Tänzer, Sebastian Ruder, and Marek Rei. 2022. Memorisation versus Generalisation in Pre-trained Language Models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7564–7578, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal): Memorisation versus Generalisation in Pre-trained Language Models (Tänzer et al., ACL 2022)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2022.acl-long.521.pdf
- Code: Michael-Tanzer/BERT-mem-lowres
- Data: CIFAR-10, CoNLL++, CoNLL-2003, WNUT 2017