Compound Probabilistic Context-Free Grammars for Grammar Induction

Yoon Kim, Chris Dyer, Alexander Rush


Abstract
We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations, which learn a single stochastic grammar, our context-free rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this context-dependent grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable and the latent trees are marginalized with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods for grammar induction from words with neural language models.
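
The abstract mentions that latent trees are marginalized with dynamic programming. As a rough illustration of that one step only, the sketch below runs the standard inside algorithm on a toy grammar in Chomsky normal form. It is not the authors' implementation: the function name inside_log_prob, the dictionary-based rule tables, and the one-symbol toy grammar are all assumptions made for this example, and the rule probabilities are fixed here rather than computed from a per-sentence latent variable. In a compound setting one would presumably rebuild the rule tables for each sentence before calling such a routine.

```python
import math
from collections import defaultdict


def inside_log_prob(words, binary_rules, lexical_rules, root="S"):
    """Log probability of `words` under a CNF PCFG, summed over all parse trees.

    binary_rules:  {(A, B, C): prob} for rules A -> B C   (hypothetical format)
    lexical_rules: {(A, word): prob} for rules A -> word  (hypothetical format)
    """
    n = len(words)
    # chart[(i, j)][A] = total probability that A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(float))

    # Width-1 spans: apply lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical_rules.items():
            if word == w:
                chart[(i, i + 1)][a] += p

    # Wider spans: sum over every split point and every binary rule.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary_rules.items():
                    left = chart[(i, k)].get(b, 0.0)
                    right = chart[(k, j)].get(c, 0.0)
                    if left and right:
                        chart[(i, j)][a] += p * left * right

    total = chart[(0, n)].get(root, 0.0)
    return math.log(total) if total > 0 else float("-inf")


# Toy grammar (an assumption for illustration): S -> S S (0.4), S -> "a" (0.6).
toy_binary = {("S", "S", "S"): 0.4}
toy_lexical = {("S", "a"): 0.6}
log_p = inside_log_prob(["a", "a", "a"], toy_binary, toy_lexical)
```

The sentence "a a a" has two binary trees under this toy grammar, each with probability 0.4^2 * 0.6^3, so the chart sums them to 0.06912 without enumerating the trees explicitly.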
Anthology ID:
P19-1228
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2369–2385
URL:
https://aclanthology.org/P19-1228
DOI:
10.18653/v1/P19-1228
Cite (ACL):
Yoon Kim, Chris Dyer, and Alexander Rush. 2019. Compound Probabilistic Context-Free Grammars for Grammar Induction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2369–2385, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Compound Probabilistic Context-Free Grammars for Grammar Induction (Kim et al., ACL 2019)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/P19-1228.pdf
Code
harvardnlp/compound-pcfg + additional community code
Data
PTB Diagnostic ECG Database, Penn Treebank