@inproceedings{ryan-hulden-2020-data,
title = "Data Augmentation for Transformer-based {G}2{P}",
author = "Ryan, Zach and
Hulden, Mans",
editor = "Nicolai, Garrett and
Gorman, Kyle and
Cotterell, Ryan",
booktitle = "Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.sigmorphon-1.21/",
doi = "10.18653/v1/2020.sigmorphon-1.21",
pages = "184--188",
abstract = "The Transformer model has been shown to outperform other neural seq2seq models in several character-level tasks. It is unclear, however, if the Transformer would benefit as much as other seq2seq models from data augmentation strategies in the low-resource setting. In this paper we explore strategies for data augmentation in the g2p task together with the Transformer model. Our results show that a relatively simple alignment-based strategy of identifying consistent input-output subsequences in grapheme-phoneme data coupled together with a subsequent splicing together of such pieces to generate hallucinated data works well in the low-resource setting, often delivering substantial performance improvement over a standard Transformer model."
}
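
The abstract sketches the augmentation recipe: find grapheme-phoneme chunk pairs that recur consistently across aligned training examples, then splice sampled chunks together into new "hallucinated" words and pronunciations for training. Below is a minimal Python sketch of that idea under stated assumptions: the hand-made alignments, the `MIN_COUNT` consistency threshold, and the unconstrained splicing order are illustrative choices, not details taken from the paper (which derives alignments from real data with an unsupervised aligner).

```python
import random
from collections import Counter

# Toy aligned g2p data: each example is a monotonic alignment, i.e. a list of
# (grapheme_chunk, phoneme_chunk) pairs. These chunks are hand-made
# illustrations; in practice they would come from an unsupervised aligner.
ALIGNED_DATA = [
    [("sh", "ʃ"), ("i", "ɪ"), ("p", "p")],    # "ship"
    [("sh", "ʃ"), ("o", "ɒ"), ("p", "p")],    # "shop"
    [("ch", "tʃ"), ("i", "ɪ"), ("p", "p")],   # "chip"
    [("ch", "tʃ"), ("o", "ɒ"), ("p", "p")],   # "chop"
]

MIN_COUNT = 2  # assumed threshold: a pair must recur this often to count as "consistent"


def consistent_pieces(aligned_data, min_count=MIN_COUNT):
    """Collect grapheme->phoneme chunk pairs that recur across examples."""
    counts = Counter(pair for example in aligned_data for pair in example)
    return [pair for pair, c in counts.items() if c >= min_count]


def hallucinate(pieces, n_examples=5, min_len=2, max_len=4, seed=0):
    """Splice randomly sampled consistent pieces into new (grapheme, phoneme) pairs."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_examples):
        chunks = rng.choices(pieces, k=rng.randint(min_len, max_len))
        graphemes = "".join(g for g, _ in chunks)
        phonemes = " ".join(p for _, p in chunks)
        out.append((graphemes, phonemes))
    return out


if __name__ == "__main__":
    pieces = consistent_pieces(ALIGNED_DATA)
    for g, p in hallucinate(pieces):
        print(f"{g}\t{p}")  # hallucinated pairs to append to the training set
```

In this low-resource setup, the hallucinated pairs would simply be appended to the original training data before training the Transformer.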