Abstract
Unsupervised PCFG induction models, which build syntactic structures from raw text, can be used to evaluate the extent to which syntactic knowledge can be acquired from distributional information alone. However, many state-of-the-art PCFG induction models are word-based, meaning that they cannot directly inspect functional affixes, which may provide crucial information for syntactic acquisition in child learners. This work first introduces a neural PCFG induction model that allows a clean ablation of the influence of subword information in grammar induction. Experiments on child-directed speech demonstrate first that the incorporation of subword information results in more accurate grammars with categories that word-based induction models have difficulty finding, and second that this effect is amplified in morphologically richer languages that rely on functional affixes to express grammatical relations. A subsequent evaluation on multilingual treebanks shows that the model with subword information achieves state-of-the-art results on many languages, further supporting a distributional model of syntactic acquisition.

- Anthology ID:
- 2021.findings-emnlp.371
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4367–4378
- URL:
- https://aclanthology.org/2021.findings-emnlp.371
- DOI:
- 10.18653/v1/2021.findings-emnlp.371
- Cite (ACL):
- Lifeng Jin, Byung-Doh Oh, and William Schuler. 2021. Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4367–4378, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages (Jin et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.findings-emnlp.371.pdf