Abstract
Grapheme-to-phoneme conversion (G2P) is a critical component of text-to-speech (TTS) systems, and polyphone disambiguation is its most crucial task. However, polyphone disambiguation datasets often suffer from the long-tail problem, and context learning for polyphonic characters commonly draws on only a single dimension of context. In this paper, we propose DLM, a Decoupled Learning Model for long-tailed polyphone disambiguation in Mandarin. First, DLM decouples representation learning from classifier learning, so that a different data sampler can be applied at each stage to obtain an optimal training data distribution; this mitigates the long-tail problem. Second, two improved attention mechanisms and a gradual conversion strategy are integrated into DLM, enabling context learning to transition from local to global. Finally, to evaluate the effectiveness of DLM, we construct a balanced polyphone disambiguation corpus via in-context learning. Experiments on the benchmark CPP dataset demonstrate that DLM achieves an accuracy of 99.07%. Moreover, DLM improves disambiguation performance on long-tailed polyphonic characters; for many of them, it even reaches 100% accuracy.
- Anthology ID: 2024.naacl-long.294
- Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Kevin Duh, Helena Gomez, Steven Bethard
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 5252–5262
- URL: https://aclanthology.org/2024.naacl-long.294
- DOI: 10.18653/v1/2024.naacl-long.294
- Cite (ACL): Beibei Gao, Yangsen Zhang, Ga Xiang, and Yushan Jiang. 2024. DLM: A Decoupled Learning Model for Long-tailed Polyphone Disambiguation in Mandarin. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5252–5262, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): DLM: A Decoupled Learning Model for Long-tailed Polyphone Disambiguation in Mandarin (Gao et al., NAACL 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.naacl-long.294.pdf
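
The decoupled training recipe the abstract describes — learning representations on the natural long-tailed distribution, then re-training only the classifier under a class-balanced sampler — follows a general two-stage pattern that can be sketched in PyTorch. The sketch below is a minimal illustration under assumed toy data and model names; it is not the authors' implementation, and DLM's actual samplers, encoder, and attention mechanisms may differ.

```python
# Minimal sketch of decoupled two-stage training for long-tailed
# classification. All names and the toy data are illustrative
# assumptions, not the authors' code.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy long-tailed data: 3 classes with 100 / 10 / 2 examples.
counts = [100, 10, 2]
x = torch.randn(sum(counts), 16)
y = torch.cat([torch.full((n,), c) for c, n in enumerate(counts)])
dataset = TensorDataset(x, y)

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
classifier = nn.Linear(32, len(counts))

def run_epochs(loader, params, epochs=5):
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for xb, yb in loader:
            loss = loss_fn(classifier(encoder(xb)), yb)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Stage 1: representation learning on the natural (instance-balanced)
# distribution, training encoder and classifier jointly.
stage1 = DataLoader(dataset, batch_size=16, shuffle=True)
run_epochs(stage1, list(encoder.parameters()) + list(classifier.parameters()))

# Stage 2: freeze the encoder and re-train only the classifier with a
# class-balanced sampler, so tail classes are drawn as often as head ones.
for p in encoder.parameters():
    p.requires_grad_(False)
weights = torch.tensor([1.0 / counts[int(c)] for c in y])
sampler = WeightedRandomSampler(weights, num_samples=len(dataset))
stage2 = DataLoader(dataset, batch_size=16, sampler=sampler)
run_epochs(stage2, classifier.parameters())
```

Freezing the encoder in the second stage preserves the representation learned from the full data, while the class-balanced sampler corrects the classifier's bias toward head classes — the mechanism by which decoupling mitigates the long-tail problem.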