Low-Resource G2P and P2G Conversion with Synthetic Training Data
Bradley Hauer, Amir Ahmad Habibi, Yixing Luan, Arnob Mallik, Grzegorz Kondrak
Abstract
This paper presents the University of Alberta systems and results in the SIGMORPHON 2020 Task 1: Multilingual Grapheme-to-Phoneme Conversion. Following previous SIGMORPHON shared tasks, we define a low-resource setting with 100 training instances. We experiment with three transduction approaches in both standard and low-resource settings, as well as on the related task of phoneme-to-grapheme conversion. We propose a method for synthesizing training data using a combination of diverse models.
- Anthology ID: 2020.sigmorphon-1.12
- Volume: Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
- Month: July
- Year: 2020
- Address: Online
- Editors: Garrett Nicolai, Kyle Gorman, Ryan Cotterell
- Venue: SIGMORPHON
- SIG: SIGMORPHON
- Publisher: Association for Computational Linguistics
- Pages: 117–122
- URL: https://aclanthology.org/2020.sigmorphon-1.12
- DOI: 10.18653/v1/2020.sigmorphon-1.12
- Cite (ACL): Bradley Hauer, Amir Ahmad Habibi, Yixing Luan, Arnob Mallik, and Grzegorz Kondrak. 2020. Low-Resource G2P and P2G Conversion with Synthetic Training Data. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 117–122, Online. Association for Computational Linguistics.
- Cite (Informal): Low-Resource G2P and P2G Conversion with Synthetic Training Data (Hauer et al., SIGMORPHON 2020)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2020.sigmorphon-1.12.pdf