Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages
Pierre Godard, Laurent Besacier, François Yvon, Martine Adda-Decker, Gilles Adda, Hélène Maynard, Annie Rialland
Abstract
Computational Language Documentation attempts to make the most recent research in speech and language technologies available to linguists working on language preservation and documentation. In this paper, we pursue two main goals along these lines. The first is to improve upon a strong baseline for the unsupervised word discovery task on two very low-resource Bantu languages, taking advantage of the expertise of linguists on these particular languages. The second consists in exploring the Adaptor Grammar framework as a decision and prediction tool for linguists studying a new language. We experiment with 162 grammar configurations for each language and show that using Adaptor Grammars for word segmentation enables us to test hypotheses about a language. Specializing a generic grammar with language-specific knowledge leads to great improvements for the word discovery task, ultimately achieving a leap of about 30% token F-score over the results of a strong baseline.
- Anthology ID:
- W18-5804
- Volume:
- Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology
- Month:
- October
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Sandra Kuebler, Garrett Nicolai
- Venue:
- EMNLP
- SIG:
- SIGMORPHON
- Publisher:
- Association for Computational Linguistics
- Pages:
- 32–42
- URL:
- https://aclanthology.org/W18-5804
- DOI:
- 10.18653/v1/W18-5804
- Cite (ACL):
- Pierre Godard, Laurent Besacier, François Yvon, Martine Adda-Decker, Gilles Adda, Hélène Maynard, and Annie Rialland. 2018. Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages. In Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 32–42, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages (Godard et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/W18-5804.pdf