Unsupervised Recurrent Neural Network Grammars
Yoon Kim, Alexander Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis
Abstract
Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.- Anthology ID:
- N19-1114
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1105–1117
- Language:
- URL:
- https://aclanthology.org/N19-1114
- DOI:
- 10.18653/v1/N19-1114
- Cite (ACL):
- Yoon Kim, Alexander Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, and Gábor Melis. 2019. Unsupervised Recurrent Neural Network Grammars. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1105–1117, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- Unsupervised Recurrent Neural Network Grammars (Kim et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/N19-1114.pdf
- Code
- harvardnlp/urnng
- Data
- Billion Word Benchmark, Penn Treebank