Highway Transformer: Self-Gating Enhanced Self-Attentive Networks

Yekun Chai; Shuo Jin; Xinwen Hou

doi:10.18653/v1/2020.acl-main.616

Abstract

Self-attention mechanisms have driven striking state-of-the-art (SOTA) progress in various sequence learning tasks, relying on multi-headed dot-product attention that attends to global context at all positions. Through a pseudo information highway, we introduce a gated component, self-dependency units (SDU), which incorporates LSTM-style gating units to replenish internal semantic importance within the multi-dimensional latent space of individual representations. The subsidiary content-based SDU gates let modulated latent embeddings flow through skip connections, yielding a clear margin in convergence speed under gradient descent algorithms. We explore how the gating mechanism aids context-based Transformer modules, hypothesizing that SDU gates, especially on shallow layers, could push the model faster toward suboptimal points during optimization.
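To make the gating idea concrete, here is a minimal sketch of an SDU-style gate on a single token vector, written in plain Python. It assumes a sigmoid gate computed from the token's own representation, `sigmoid(x W_g + b_g)`, that scales a linear transform `x W_v + b_v` element-wise. It also assumes that the gated output is added on the skip connection next to the self-attention output. The function names (`sdu`, `highway_block`, `matvec`) and this exact formulation are hypothetical illustrations, not the authors' implementation. Layer normalization and multi-head attention are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (d_in, d_out)


def matvec(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Affine map x @ w + b for a single d_in-dimensional token vector."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sdu(x: Vector, w_gate: Matrix, b_gate: Vector,
        w_val: Matrix, b_val: Vector) -> Vector:
    """Hypothetical self-dependency unit: an LSTM-style, content-based gate
    sigmoid(x W_g + b_g) scales each dimension of x W_v + b_v.
    The gate depends only on the token itself, not on other positions."""
    gate = [sigmoid(g) for g in matvec(x, w_gate, b_gate)]
    value = matvec(x, w_val, b_val)
    return [g * v for g, v in zip(gate, value)]


def highway_block(x: Vector, attn_out: Vector, sdu_out: Vector) -> Vector:
    """Assumed fusion: the SDU output travels on the skip connection
    alongside the self-attention output (a pseudo information highway)."""
    return [a + b + c for a, b, c in zip(x, attn_out, sdu_out)]


x = [1.0, -2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
# Zero gate weights give sigmoid(0) = 0.5 in every dimension.
gated = sdu(x, zero_w, zeros, identity, zeros)
print(gated)                                  # [0.5, -1.0]
print(highway_block(x, [0.0, 0.0], gated))    # [1.5, -3.0]
```

In this sketch the gate values lie in (0, 1), so each latent dimension of the token is passed through only partially, according to that token's own content. This is one plausible reading of "replenishing internal semantic importance" alongside attention's cross-position context.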

Anthology ID:: 2020.acl-main.616
Volume:: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:: July
Year:: 2020
Address:: Online
Editors:: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 6887–6900
URL:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.acl-main.616/
DOI:: 10.18653/v1/2020.acl-main.616
Cite (ACL):: Yekun Chai, Shuo Jin, and Xinwen Hou. 2020. Highway Transformer: Self-Gating Enhanced Self-Attentive Networks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6887–6900, Online. Association for Computational Linguistics.
Cite (Informal):: Highway Transformer: Self-Gating Enhanced Self-Attentive Networks (Chai et al., ACL 2020)
PDF:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.acl-main.616.pdf
Video:: http://slideslive.com/38928904
Code:: cyk1337/Highway-Transformer