Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection

Xiang Yu, Ngoc Thang Vu, Jonas Kuhn


Abstract
We present an iterative data augmentation framework that trains and searches for an optimal ensemble while simultaneously annotating new training data in a self-training style. We apply this framework to two SIGMORPHON 2020 shared tasks: grapheme-to-phoneme conversion and morphological inflection. With very simple base models in the ensemble, we rank first and fourth in the two tasks, respectively. Our analysis shows that the system works especially well on low-resource languages.
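The abstract only sketches the framework, so a minimal illustration may help: the loop below trains several base models, searches for the ensemble subset that performs best on the dev set, and then self-trains by annotating unlabeled inputs on which the ensemble members unanimously agree. Everything here (the toy train_model, the agreement filter, all parameter names) is a hypothetical stand-in for this sketch, not the authors' implementation.

```python
"""Minimal sketch of iterative ensemble self-training, as read from the
abstract. All helpers below are illustrative stand-ins."""
import itertools
import random
from collections import Counter


def train_model(train, seed):
    """Toy stand-in for a neural transducer: memorize training pairs and
    back off to a seed-dependent default (mimics variance between runs)."""
    rng = random.Random(seed)
    table = dict(train)
    default = rng.choice([y for _, y in train])
    return lambda x: table.get(x, default)


def majority_vote(models, x):
    """Ensemble prediction: most frequent output among member models."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def search_ensemble(models, dev):
    """Exhaustively search for the member subset with best dev accuracy
    (feasible here because the pool of base models is small)."""
    best, best_acc = list(models), -1.0
    for r in range(1, len(models) + 1):
        for subset in itertools.combinations(models, r):
            acc = sum(majority_vote(subset, x) == y for x, y in dev) / len(dev)
            if acc > best_acc:
                best, best_acc = list(subset), acc
    return best


def self_train(train, dev, unlabeled, n_models=5, n_iters=3):
    ensemble = []
    for _ in range(n_iters):
        # 1. Train several simple base models on the current training set.
        models = [train_model(train, seed=s) for s in range(n_models)]
        # 2. Search for the ensemble that is optimal on the dev set.
        ensemble = search_ensemble(models, dev)
        # 3. Self-training: annotate unlabeled inputs, keeping only those
        #    on which all members agree (one plausible confidence filter;
        #    an assumption of this sketch).
        confident = [(x, ensemble[0](x)) for x in unlabeled
                     if len({m(x) for m in ensemble}) == 1]
        train = train + confident
        annotated = {x for x, _ in confident}
        unlabeled = [x for x in unlabeled if x not in annotated]
    return ensemble


if __name__ == "__main__":
    train = [("a", "A"), ("b", "B")]
    dev = [("a", "A"), ("c", "C")]
    ensemble = self_train(train, dev, unlabeled=["a", "b", "d"])
    print([m("a") for m in ensemble])
```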
Anthology ID: 2020.sigmorphon-1.5
Volume: Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month: July
Year: 2020
Address: Online
Editors: Garrett Nicolai, Kyle Gorman, Ryan Cotterell
Venue: SIGMORPHON
SIG: SIGMORPHON
Publisher: Association for Computational Linguistics
Pages: 70–78
URL: https://aclanthology.org/2020.sigmorphon-1.5
DOI: 10.18653/v1/2020.sigmorphon-1.5
Cite (ACL): Xiang Yu, Ngoc Thang Vu, and Jonas Kuhn. 2020. Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 70–78, Online. Association for Computational Linguistics.
Cite (Informal): Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection (Yu et al., SIGMORPHON 2020)
PDF: https://preview.aclanthology.org/nschneid-patch-2/2020.sigmorphon-1.5.pdf