RNNs can generate bounded hierarchical languages with optimal memory

John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning


Abstract
Recurrent neural networks empirically generate natural language with high syntactic fidelity. However, their success is not well-understood theoretically. We provide theoretical insight into this success, proving in a finite-precision setting that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax. We introduce Dyck-(k,m), the language of well-nested brackets (of k types) and m-bounded nesting depth, reflecting the bounded memory needs and long-distance dependencies of natural language syntax. The best known results use O(km2) memory (hidden units) to generate these languages. We prove that an RNN with O(m log k) hidden units suffices, an exponential reduction in memory, by an explicit construction. Finally, we show that no algorithm, even with unbounded computation, can suffice with o(m log k) hidden units.
Anthology ID:
2020.emnlp-main.156
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1978–2010
Language:
URL:
https://aclanthology.org/2020.emnlp-main.156
DOI:
10.18653/v1/2020.emnlp-main.156
Bibkey:
Cite (ACL):
John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, and Christopher D. Manning. 2020. RNNs can generate bounded hierarchical languages with optimal memory. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1978–2010, Online. Association for Computational Linguistics.
Cite (Informal):
RNNs can generate bounded hierarchical languages with optimal memory (Hewitt et al., EMNLP 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/add_acl24_videos/2020.emnlp-main.156.pdf
Optional supplementary material:
 2020.emnlp-main.156.OptionalSupplementaryMaterial.tgz
Video:
 https://slideslive.com/38939362
Code
 john-hewitt/dyckkm-learning +  additional community code