Unsupervised Grammar Induction with Depth-bounded PCFG
Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
Abstract
There has been recent interest in applying cognitively- or empirically-motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence models, and therefore more fully exploits the space reductions of depth-bounding. Results for this model on grammar acquisition from transcribed child-directed speech and newswire text exceed or are competitive with those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models.
- Anthology ID:
- Q18-1016
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 6
- Year:
- 2018
- Address:
- Cambridge, MA
- Editors:
- Lillian Lee, Mark Johnson, Kristina Toutanova, Brian Roark
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 211–224
- URL:
- https://aclanthology.org/Q18-1016
- DOI:
- 10.1162/tacl_a_00016
- Cite (ACL):
- Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, and Lane Schwartz. 2018. Unsupervised Grammar Induction with Depth-bounded PCFG. Transactions of the Association for Computational Linguistics, 6:211–224.
- Cite (Informal):
- Unsupervised Grammar Induction with Depth-bounded PCFG (Jin et al., TACL 2018)
- PDF:
- https://preview.aclanthology.org/landing_page/Q18-1016.pdf
- Code:
- lifengjin/db-pcfg
- Data:
- Penn Treebank
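The abstract's central point, that bounding recursion depth shrinks the search space a grammar inducer must consider, can be illustrated with a small combinatorial sketch. This is a hypothetical stand-in, not the paper's method: here plain tree *height* substitutes for the left-corner memory depth that DB-PCFG actually bounds, but the effect on the number of candidate binary parse trees is analogous.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n, depth=None):
    """Number of binary trees over n leaves.

    With depth=None this is the unbounded count, Catalan(n-1).
    With an integer depth, only trees of height <= depth are
    counted (height in edges; a single leaf has height 0).
    Height is a stand-in here for the paper's depth bound.
    """
    if n == 1:
        return 1
    if depth is not None and depth < 1:
        return 0  # a branching node needs at least height 1
    child = None if depth is None else depth - 1
    # Sum over all split points of the n leaves into left/right children.
    return sum(count_trees(k, child) * count_trees(n - k, child)
               for k in range(1, n))

print(count_trees(8))      # unbounded: Catalan(7) = 429 trees
print(count_trees(8, 3))   # bounded to height 3: only 1 tree survives
```

Even this crude bound collapses the candidate space from 429 trees to a single one for an 8-word string; bounding the (cognitively motivated) left-corner depth rather than raw height prunes less aggressively but in the same spirit.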