@inproceedings{zhou-etal-2020-scheduled,
title = "Scheduled {D}rop{H}ead: A Regularization Method for Transformer Models",
author = "Zhou, Wangchunshu and
Ge, Tao and
Wei, Furu and
Zhou, Ming and
Xu, Ke",
editor = "Cohn, Trevor and
He, Yulan and
Liu, Yang",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2020.findings-emnlp.178/",
doi = "10.18653/v1/2020.findings-emnlp.178",
pages = "1971--1980",
abstract = "We introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism which is a key component of transformer. In contrast to the conventional dropout mechanism which randomly drops units or connections, DropHead drops entire attention heads during training to prevent the multi-head attention model from being dominated by a small portion of attention heads. It can help reduce the risk of overfitting and allow the models to better benefit from the multi-head attention. Given the interaction between multi-headedness and training dynamics, we further propose a novel dropout rate scheduler to adjust the dropout rate of DropHead throughout training, which results in a better regularization effect. Experimental results demonstrate that our proposed approach can improve transformer models by 0.9 BLEU score on WMT14 En-De translation task and around 1.0 accuracy for various text classification tasks."
}
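The abstract describes dropping entire attention heads (rather than individual units) during training. A minimal PyTorch sketch of that idea is given below; the module and function names, the inverted-dropout rescaling, and the linear rate ramp are illustrative assumptions for this sketch, not the authors' released implementation or the exact scheduler proposed in the paper.

```python
import torch


def drop_head_mask(batch_size, num_heads, p, device=None):
    """Sample a per-head keep mask: each whole head is zeroed with probability p."""
    keep = (torch.rand(batch_size, num_heads, 1, 1, device=device) >= p).float()
    # Rescale surviving heads so the expected output magnitude is unchanged.
    return keep / (1.0 - p + 1e-8)


def scheduled_drop_head_rate(step, total_steps, max_rate=0.2):
    """Simple linear ramp for illustration only; the paper's schedule differs."""
    return max_rate * min(step / max(total_steps, 1), 1.0)


class MultiHeadAttentionWithDropHead(torch.nn.Module):
    """Standard multi-head self-attention with head-level (structured) dropout."""

    def __init__(self, d_model, num_heads, drop_head_rate=0.2):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = torch.nn.Linear(d_model, 3 * d_model)
        self.out = torch.nn.Linear(d_model, d_model)
        self.drop_head_rate = drop_head_rate

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, time, d_head).
        q, k, v = (z.view(b, t, self.num_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v  # (batch, heads, time, d_head)
        if self.training and self.drop_head_rate > 0:
            # DropHead: mask out whole heads, not individual activations.
            heads = heads * drop_head_mask(b, self.num_heads,
                                           self.drop_head_rate, x.device)
        return self.out(heads.transpose(1, 2).reshape(b, t, d))


# Usage sketch: update the drop rate from the schedule each training step.
mha = MultiHeadAttentionWithDropHead(d_model=512, num_heads=8)
mha.train()
mha.drop_head_rate = scheduled_drop_head_rate(step=1000, total_steps=100000)
y = mha(torch.randn(2, 16, 512))
```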