Regularized Training of Nearest Neighbor Language Models
Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Joshua M. Susskind
Abstract
Including memory banks in a natural language processing architecture increases model capacity by equipping it with additional data at inference time. In this paper, we build upon kNN-LM (CITATION), which uses a pre-trained language model together with an exhaustive kNN search through the training data (memory bank) to achieve state-of-the-art results. We investigate whether we can improve kNN-LM performance by instead training an LM with the knowledge that we will be applying a kNN search post hoc. Our method achieves significant improvements on language modeling tasks on WIKI-2 and WIKI-103. The main phenomenon we observe is that adding a simple L2 regularization on the activations (not the weights) of the model, a transformer, improves the post-hoc kNN classification performance. We explore possible reasons for this improvement. In particular, we find that the added L2 regularization seems to improve performance for high-frequency words without deteriorating performance for low-frequency ones.
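The modification described in the abstract is straightforward to express in code: during training, the standard cross-entropy loss is augmented with an L2 penalty on the transformer's output activations, the same context vectors that later serve as keys for the kNN search. Below is a minimal sketch of this idea in PyTorch, assuming a Hugging-Face-style causal LM interface; the coefficient `lam`, the choice of the final hidden layer, and the function name are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def regularized_lm_loss(model, input_ids, labels, lam=0.01):
    """Cross-entropy LM loss plus an L2 penalty on the final hidden
    activations (not the weights). `lam` and the model interface are
    illustrative assumptions, not values from the paper."""
    outputs = model(input_ids, output_hidden_states=True)
    logits = outputs.logits                # (batch, seq, vocab)
    hidden = outputs.hidden_states[-1]     # (batch, seq, dim): activations later stored as kNN keys

    # Standard next-token cross-entropy.
    ce = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
    )

    # L2 regularization on activations: penalize the squared norm of the
    # context vectors rather than the model parameters.
    act_l2 = hidden.pow(2).sum(dim=-1).mean()

    return ce + lam * act_l2
```

At inference time, kNN-LM interpolates the model's next-token distribution with a distribution formed from the retrieved nearest neighbors in the memory bank; the regularization above changes only training, not that inference-time procedure.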
- Anthology ID: 2022.naacl-srw.4
- Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
- Month: July
- Year: 2022
- Address: Hybrid: Seattle, Washington + Online
- Editors: Daphne Ippolito, Liunian Harold Li, Maria Leonor Pacheco, Danqi Chen, Nianwen Xue
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 25–30
- URL: https://aclanthology.org/2022.naacl-srw.4
- DOI: 10.18653/v1/2022.naacl-srw.4
- Cite (ACL): Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, and Joshua M. Susskind. 2022. Regularized Training of Nearest Neighbor Language Models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pages 25–30, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
- Cite (Informal): Regularized Training of Nearest Neighbor Language Models (Ton et al., NAACL 2022)
- PDF: https://preview.aclanthology.org/naacl24-info/2022.naacl-srw.4.pdf