Efficient Sequence Learning with Group Recurrent Networks

Fei Gao, Lijun Wu, Li Zhao, Tao Qin, Xueqi Cheng, Tie-Yan Liu


Abstract
Recurrent neural networks have achieved state-of-the-art results in many artificial intelligence tasks, such as language modeling, neural machine translation, and speech recognition. One of the key factors behind these successes is big models. However, training such big models usually takes days or even weeks, even when using tens of GPU cards. In this paper, we propose an efficient architecture to improve the efficiency of RNN model training, which adopts a group strategy for recurrent layers while exploiting a representation rearrangement strategy between layers as well as between time steps. To demonstrate the advantages of our models, we conduct experiments on several datasets and tasks. The results show that our architecture achieves comparable or better accuracy than the baselines, with far fewer parameters and at a much lower computational cost.
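
For readers who want a concrete picture of the approach the abstract describes, here is a minimal PyTorch sketch of a grouped recurrent layer with representation rearrangement between layers. The names GroupRNNLayer and rearrange, and all sizes, are illustrative assumptions rather than the authors' released code; the rearrangement is implemented as a channel-shuffle-style permutation, one plausible reading of the abstract's description.

```python
import torch
import torch.nn as nn


class GroupRNNLayer(nn.Module):
    """Split the features into `num_groups` groups and run an independent,
    smaller LSTM over each group, cutting parameters and computation
    roughly by a factor of `num_groups` (illustrative assumption)."""

    def __init__(self, input_size, hidden_size, num_groups):
        super().__init__()
        assert input_size % num_groups == 0 and hidden_size % num_groups == 0
        self.num_groups = num_groups
        self.rnns = nn.ModuleList([
            nn.LSTM(input_size // num_groups, hidden_size // num_groups,
                    batch_first=True)
            for _ in range(num_groups)
        ])

    def forward(self, x):
        # x: (batch, time, input_size); split along the feature dimension.
        chunks = x.chunk(self.num_groups, dim=-1)
        outs = [rnn(c)[0] for rnn, c in zip(self.rnns, chunks)]
        return torch.cat(outs, dim=-1)


def rearrange(x, num_groups):
    """Representation rearrangement between layers: interleave features
    across groups so each group in the next layer sees information from
    every group in this one (a channel-shuffle-style permutation)."""
    b, t, d = x.shape
    return (x.view(b, t, num_groups, d // num_groups)
             .transpose(2, 3)
             .reshape(b, t, d))


# Hypothetical usage: stack two group-recurrent layers with a
# rearrangement step in between.
layer1 = GroupRNNLayer(input_size=256, hidden_size=256, num_groups=4)
layer2 = GroupRNNLayer(input_size=256, hidden_size=256, num_groups=4)
x = torch.randn(8, 20, 256)            # (batch, time, features)
h = layer2(rearrange(layer1(x), 4))    # shuffled features feed layer 2
```

In this sketch, each of the four LSTMs operates on a quarter of the features, so the layer uses roughly a quarter of the parameters of a single full-width LSTM; the rearrangement step then restores information flow across groups, which is the trade-off the abstract points to.
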
Anthology ID:
N18-1073
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
799–808
URL:
https://aclanthology.org/N18-1073
DOI:
10.18653/v1/N18-1073
Cite (ACL):
Fei Gao, Lijun Wu, Li Zhao, Tao Qin, Xueqi Cheng, and Tie-Yan Liu. 2018. Efficient Sequence Learning with Group Recurrent Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 799–808, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Efficient Sequence Learning with Group Recurrent Networks (Gao et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1073.pdf