Low-Resource G2P and P2G Conversion with Synthetic Training Data

Bradley Hauer, Amir Ahmad Habibi, Yixing Luan, Arnob Mallik, Grzegorz Kondrak


Abstract
This paper presents the University of Alberta systems and results in the SIGMORPHON 2020 Task 1: Multilingual Grapheme-to-Phoneme Conversion. Following previous SIGMORPHON shared tasks, we define a low-resource setting with 100 training instances. We experiment with three transduction approaches in both standard and low-resource settings, as well as on the related task of phoneme-to-grapheme conversion. We propose a method for synthesizing training data using a combination of diverse models.
Anthology ID:
2020.sigmorphon-1.12
Volume:
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
July
Year:
2020
Address:
Online
Editors:
Garrett Nicolai, Kyle Gorman, Ryan Cotterell
Venue:
SIGMORPHON
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
117–122
URL:
https://aclanthology.org/2020.sigmorphon-1.12
DOI:
10.18653/v1/2020.sigmorphon-1.12
Cite (ACL):
Bradley Hauer, Amir Ahmad Habibi, Yixing Luan, Arnob Mallik, and Grzegorz Kondrak. 2020. Low-Resource G2P and P2G Conversion with Synthetic Training Data. In Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 117–122, Online. Association for Computational Linguistics.
Cite (Informal):
Low-Resource G2P and P2G Conversion with Synthetic Training Data (Hauer et al., SIGMORPHON 2020)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2020.sigmorphon-1.12.pdf