Revisiting Representation Degeneration Problem in Language Modeling
Zhong Zhang, Chongming Gao, Cong Xu, Rui Miao, Qinli Yang, Junming Shao
Abstract
Weight tying is now a common setting in many language generation tasks such as language modeling and machine translation. However, a recent study reveals that there is a potential flaw in weight tying. They find that the learned word embeddings are likely to degenerate and lie in a narrow cone when training a language model. They call it the representation degeneration problem and propose a cosine regularization to solve it. Nevertheless, we prove that the cosine regularization is insufficient to solve the problem, as the degeneration is still likely to happen under certain conditions. In this paper, we revisit the representation degeneration problem and theoretically analyze the limitations of the previously proposed solution. Afterward, we propose an alternative regularization method called Laplacian regularization to tackle the problem. Experiments on language modeling demonstrate the effectiveness of the proposed Laplacian regularization.- Anthology ID:
- 2020.findings-emnlp.46
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Trevor Cohn, Yulan He, Yang Liu
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 518–527
- Language:
- URL:
- https://aclanthology.org/2020.findings-emnlp.46
- DOI:
- 10.18653/v1/2020.findings-emnlp.46
- Cite (ACL):
- Zhong Zhang, Chongming Gao, Cong Xu, Rui Miao, Qinli Yang, and Junming Shao. 2020. Revisiting Representation Degeneration Problem in Language Modeling. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 518–527, Online. Association for Computational Linguistics.
- Cite (Informal):
- Revisiting Representation Degeneration Problem in Language Modeling (Zhang et al., Findings 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2020.findings-emnlp.46.pdf
- Data
- WikiText-2