Abstract
This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.
- Anthology ID:
- Q13-1021
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 1
- Year:
- 2013
- Address:
- Cambridge, MA
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 255–266
- URL:
- https://aclanthology.org/Q13-1021
- DOI:
- 10.1162/tacl_a_00225
- Cite (ACL):
- Kairit Sirts and Sharon Goldwater. 2013. Minimally-Supervised Morphological Segmentation using Adaptor Grammars. Transactions of the Association for Computational Linguistics, 1:255–266.
- Cite (Informal):
- Minimally-Supervised Morphological Segmentation using Adaptor Grammars (Sirts & Goldwater, TACL 2013)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/Q13-1021.pdf
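
The model selection idea summarised in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the abstract does not specify the selection procedure. The sketch assumes that each unsupervised analysis exposes candidate boundaries grouped by grammar level, and that selection means choosing the subset of levels whose boundaries score best on the small labelled set. The level names (`Morph`, `SubMorph`), the toy words, the use of micro-averaged boundary F1, and the function names are all hypothetical.

```python
from itertools import combinations


def boundary_f1(predicted, gold):
    """Micro-averaged boundary F1 over parallel lists of boundary-position sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def select_levels(analyses, gold, levels, metric=boundary_f1):
    """Return the non-empty subset of levels that maximises `metric` on labelled data.

    analyses: one dict per word, mapping a level name to the set of boundary
              positions proposed at that level (hypothetical representation).
    gold:     one set of gold boundary positions per word.
    """
    best_subset, best_score = None, -1.0
    for k in range(1, len(levels) + 1):
        for subset in combinations(levels, k):
            # Union the boundaries proposed by every level in the candidate subset.
            predicted = [
                set().union(*(a.get(level, set()) for level in subset))
                for a in analyses
            ]
            score = metric(predicted, gold)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Toy labelled set: walk|ed, un|lock|ing, cat|s (positions count characters from the left).
analyses = [
    {"Morph": {4}, "SubMorph": {2}},
    {"Morph": {2, 6}, "SubMorph": {4}},
    {"Morph": set(), "SubMorph": {3}},
]
gold = [{4}, {2, 6}, {3}]

best_subset, best_score = select_levels(analyses, gold, ["Morph", "SubMorph"])
```

Passing a different `metric` function (for example, one that weights recall more heavily) is one way to picture the tuning toward different evaluation metrics that the abstract mentions.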