Unsupervised Grammar Induction with Depth-bounded PCFG

Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz


Abstract
There has been recent interest in applying cognitively- or empirically-motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence models, and therefore more fully exploits the space reductions of depth-bounding. Results for this model on grammar acquisition from transcribed child-directed speech and newswire text exceed or are competitive with those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models.
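The claim that a bound on recursion depth limits the search space can be made concrete with a small sketch. This is only an illustration. It is not the DB-PCFG model, which the abstract describes only at a high level; here the bound is applied by filtering whole trees. The sketch assumes a hypothetical formalization of depth: depth grows by one at every non-terminal that is the left child of a right child (a center-embedded constituent). The paper's exact definition may differ. The names `max_depth`, `binary_trees`, and `count_within_bound` are hypothetical. The code enumerates every binary bracketing of an n-word sentence and counts how many stay within a depth bound.

```python
# Hypothetical illustration: a tree is either a word (str) or a pair (left, right).

def max_depth(tree, depth=1, is_right_child=False):
    """Return the largest embedding depth reached anywhere in `tree`.

    Assumed definition: depth grows by one at each non-terminal that is the
    left child of a right child (a center-embedded constituent).
    """
    if isinstance(tree, str):  # a word never adds depth on its own
        return depth
    left, right = tree
    embeds = is_right_child and isinstance(left, tuple)
    left_depth = depth + 1 if embeds else depth
    return max(
        max_depth(left, left_depth, is_right_child=False),
        max_depth(right, depth, is_right_child=True),
    )


def binary_trees(words):
    """Yield every binary bracketing of the tuple `words`."""
    if len(words) == 1:
        yield words[0]
        return
    for split in range(1, len(words)):
        for left in binary_trees(words[:split]):
            for right in binary_trees(words[split:]):
                yield (left, right)


def count_within_bound(n, bound):
    """Return (bracketings with depth <= bound, all bracketings) for n words."""
    words = tuple(f"w{i}" for i in range(n))
    trees = list(binary_trees(words))
    kept = sum(1 for tree in trees if max_depth(tree) <= bound)
    return kept, len(trees)


if __name__ == "__main__":
    for n in range(3, 8):
        kept, total = count_within_bound(n, bound=1)
        print(f"n={n}: {kept} of {total} bracketings have depth <= 1")
    # prints, in order: 2 of 2, 4 of 5, 8 of 14, 16 of 42, 32 of 132
```

Under this assumed definition, the number of depth-1 bracketings of n words is 2^(n-2) for n ≥ 2, while the total number of bracketings is the Catalan number C(n-1). Even a bound of 1 therefore excludes a rapidly growing share of candidate trees as sentences get longer.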
Anthology ID: Q18-1016
Volume: Transactions of the Association for Computational Linguistics, Volume 6
Year: 2018
Address: Cambridge, MA
Editors: Lillian Lee, Mark Johnson, Kristina Toutanova, Brian Roark
Venue: TACL
Publisher: MIT Press
Pages: 211–224
URL: https://aclanthology.org/Q18-1016
DOI: 10.1162/tacl_a_00016
Cite (ACL): Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, and Lane Schwartz. 2018. Unsupervised Grammar Induction with Depth-bounded PCFG. Transactions of the Association for Computational Linguistics, 6:211–224.
Cite (Informal): Unsupervised Grammar Induction with Depth-bounded PCFG (Jin et al., TACL 2018)
PDF: https://preview.aclanthology.org/landing_page/Q18-1016.pdf
Video: https://preview.aclanthology.org/landing_page/Q18-1016.mp4
Code: lifengjin/db-pcfg
Data: Penn Treebank