R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard de Melo


Abstract
Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. In this paper, we propose a recursive Transformer model based on differentiable CKY-style binary trees to emulate this composition process, and we extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruning and growing algorithm to reduce the time complexity and enable encoding in linear time. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.
Anthology ID:
2021.acl-long.379
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
4897–4908
URL:
https://aclanthology.org/2021.acl-long.379
DOI:
10.18653/v1/2021.acl-long.379
Cite (ACL):
Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, and Gerard de Melo. 2021. R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4897–4908, Online. Association for Computational Linguistics.
Cite (Informal):
R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling (Hu et al., ACL-IJCNLP 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2021.acl-long.379.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-2/2021.acl-long.379.mp4
Code:
alipay/StructuredLM_RTDT
Data:
Penn Treebank, WikiText-2