Abstract
Autoregressive transformer (ART)-based grapheme-to-phoneme (G2P) models have been proposed for bi/multilingual text-to-speech systems. Although they have achieved great success, they suffer from high inference latency in real-time industrial applications, especially when processing long sentences. In this paper, we propose a fast and high-performance bilingual G2P model. For fast and exact decoding, we used a non-autoregressive structured transformer-based architecture and data augmentation for predicting output length. Our model achieved better performance than the previous autoregressive model, with about 2700% faster inference.
- Anthology ID:
- 2022.naacl-industry.32
- Original:
- 2022.naacl-industry.32v1
- Version 2:
- 2022.naacl-industry.32v2
- Volume:
- Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track
- Month:
- July
- Year:
- 2022
- Address:
- Hybrid: Seattle, Washington + Online
- Editors:
- Anastassia Loukina, Rashmi Gangadharaiah, Bonan Min
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 289–296
- URL:
- https://aclanthology.org/2022.naacl-industry.32
- DOI:
- 10.18653/v1/2022.naacl-industry.32
- Cite (ACL):
- Hwa-Yeon Kim, Jong-Hwan Kim, and Jae-Min Kim. 2022. Fast Bilingual Grapheme-To-Phoneme Conversion. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, pages 289–296, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
- Cite (Informal):
- Fast Bilingual Grapheme-To-Phoneme Conversion (Kim et al., NAACL 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2022.naacl-industry.32.pdf