Abstract
We introduce DropHead, a structured dropout method designed specifically for regularizing the multi-head attention mechanism, a key component of the transformer. In contrast to conventional dropout, which randomly drops individual units or connections, DropHead drops entire attention heads during training, preventing the multi-head attention model from being dominated by a small subset of heads. This reduces the risk of overfitting and allows models to benefit more fully from multi-head attention. Given the interaction between multi-headedness and training dynamics, we further propose a novel dropout rate scheduler that adjusts DropHead's dropout rate throughout training, yielding a better regularization effect. Experimental results demonstrate that our proposed approach improves transformer models by 0.9 BLEU on the WMT14 En-De translation task and by around 1.0 accuracy point on various text classification tasks.
- Anthology ID:
- 2020.findings-emnlp.178
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Trevor Cohn, Yulan He, Yang Liu
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1971–1980
- URL:
- https://aclanthology.org/2020.findings-emnlp.178
- DOI:
- 10.18653/v1/2020.findings-emnlp.178
- Cite (ACL):
- Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou, and Ke Xu. 2020. Scheduled DropHead: A Regularization Method for Transformer Models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1971–1980, Online. Association for Computational Linguistics.
- Cite (Informal):
- Scheduled DropHead: A Regularization Method for Transformer Models (Zhou et al., Findings 2020)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2020.findings-emnlp.178.pdf
- Data
- IMDb Movie Reviews, SNLI, Yahoo! Answers
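The head-level dropout described in the abstract can be sketched in plain Python. This is a minimal illustration over per-head scalar outputs, not the authors' implementation: the function name, the rescaling of surviving heads, and the guard against dropping every head are assumptions made for the sketch.

```python
import random

def drop_head(head_outputs, p, rng=random, training=True):
    """Structured dropout over attention heads (illustrative sketch).

    head_outputs: one output value per attention head, taken before
        the output projection (scalars here for simplicity).
    p: probability of dropping each entire head during training.
    """
    if not training or p <= 0.0:
        return head_outputs
    n = len(head_outputs)
    # Sample one keep/drop decision per head, not per unit or connection.
    keep = [rng.random() > p for _ in range(n)]
    # Assumed guard: never drop all heads at once.
    if not any(keep):
        keep[rng.randrange(n)] = True
    # Assumed rescaling: scale kept heads so expected magnitude is preserved.
    scale = n / sum(keep)
    return [h * scale if k else h * 0.0 for h, k in zip(head_outputs, keep)]
```

The proposed scheduler would then vary `p` over the course of training; the abstract does not specify the schedule's shape, so none is shown here.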