Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction
Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
Abstract
There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model. Modern depth-bounded grammar inducers have been shown to be more accurate than early unbounded PCFG inducers, but this technique has never been compared against unbounded induction within the same system, in part because most previous depth-bounding models are built around sequence models, the complexity of which grows exponentially with the maximum allowed depth. The present work instead applies depth bounds within a chart-based Bayesian PCFG inducer, where bounding can be switched on and off, and then samples trees with or without bounding. Results show that depth-bounding is indeed significantly effective in limiting the search space of the inducer and thereby increasing the accuracy of the resulting parsing model, independent of the contribution of modern Bayesian induction techniques. Moreover, parsing results on English, Chinese and German show that this bounded model is able to produce parse trees more accurately than or competitively with state-of-the-art constituency grammar induction models.
- Anthology ID:
- D18-1292
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2721–2731
- URL:
- https://aclanthology.org/D18-1292
- DOI:
- 10.18653/v1/D18-1292
- Cite (ACL):
- Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, and Lane Schwartz. 2018. Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2721–2731, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction (Jin et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/D18-1292.pdf
- Code:
- lifengjin/dimi_emnlp18
- Data:
- Penn Treebank