Abstract
We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary. From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations. Using unsupervised methods, the program effectively deciphers writing into speech. Its token-level character-to-syllable accuracy is 89%, which significantly exceeds the 22% accuracy of prior work.

- Anthology ID: 2020.emnlp-main.458
- Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month: November
- Year: 2020
- Address: Online
- Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 5687–5693
- URL: https://aclanthology.org/2020.emnlp-main.458
- DOI: 10.18653/v1/2020.emnlp-main.458
- Cite (ACL): Christopher Chu, Scot Fang, and Kevin Knight. 2020. Learning to Pronounce Chinese Without a Pronunciation Dictionary. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5687–5693, Online. Association for Computational Linguistics.
- Cite (Informal): Learning to Pronounce Chinese Without a Pronunciation Dictionary (Chu et al., EMNLP 2020)
- PDF: https://preview.aclanthology.org/emnlp-22-attachments/2020.emnlp-main.458.pdf