Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders

Andrew Drozdov, Patrick Verga, Mohit Yadav, Mohit Iyyer, Andrew McCallum


Abstract
We introduce the deep inside-outside recursive autoencoder (DIORA), a fully unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. Our approach predicts each word in an input sentence conditioned on the rest of the sentence. During training we use dynamic programming to consider all possible binary trees over the sentence, and for inference we use the CKY algorithm to extract the highest-scoring parse. DIORA outperforms previously reported results for unsupervised binary constituency parsing on the benchmark WSJ dataset.
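The inference step mentioned in the abstract, extracting the highest-scoring binary tree with CKY, can be illustrated with a small sketch. This is not the authors' implementation: the function name `cky_best_tree`, the `span_scores` dictionary, and the rule that a tree's score is the sum of its span scores are all assumptions made for illustration. In DIORA the scores would come from the trained model rather than being supplied by hand.

```python
from typing import Dict, Tuple, Union

# A tree is either a token index (leaf) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def cky_best_tree(
    span_scores: Dict[Tuple[int, int], float], n: int
) -> Tuple[float, Tree]:
    """Return (score, tree) for the best binary tree over tokens 0..n-1.

    span_scores[(i, j)] is a hypothetical score for the span covering
    tokens i..j-1 (j is exclusive); missing spans score 0.0.
    Assumption: a tree's score is the sum of the scores of its spans.
    """
    best: Dict[Tuple[int, int], float] = {}
    back: Dict[Tuple[int, int], int] = {}

    # Single-token spans are leaves and contribute no score.
    for i in range(n):
        best[(i, i + 1)] = 0.0

    # Fill the chart bottom-up, from short spans to the full sentence.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Choose the split point whose two children score highest.
            k_best = max(
                range(i + 1, j),
                key=lambda k: best[(i, k)] + best[(k, j)],
            )
            best[(i, j)] = (
                span_scores.get((i, j), 0.0)
                + best[(i, k_best)]
                + best[(k_best, j)]
            )
            back[(i, j)] = k_best

    def build(i: int, j: int) -> Tree:
        # Follow the stored split points to recover the tree.
        if j - i == 1:
            return i
        k = back[(i, j)]
        return (build(i, k), build(k, j))

    return best[(0, n)], build(0, n)


# Three tokens; the span over tokens 1..2 scores higher than the one over 0..1.
score, tree = cky_best_tree({(0, 2): 1.0, (1, 3): 2.0, (0, 3): 0.5}, 3)
print(score, tree)
```

The chart has O(n^2) cells and each cell scans O(n) split points, so the sketch runs in O(n^3) time, which is the usual cost of CKY over unlabeled binary trees.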
Anthology ID:
N19-1116
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1129–1141
URL:
https://aclanthology.org/N19-1116
DOI:
10.18653/v1/N19-1116
Cite (ACL):
Andrew Drozdov, Patrick Verga, Mohit Yadav, Mohit Iyyer, and Andrew McCallum. 2019. Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1129–1141, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders (Drozdov et al., NAACL 2019)
PDF:
https://preview.aclanthology.org/landing_page/N19-1116.pdf
Data
MultiNLI, PTB Diagnostic ECG Database, Penn Treebank