Chenfang Li



2020

Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech
Yin May Oo | Theeraphol Wattanavekin | Chenfang Li | Pasindu De Silva | Supheakmungkol Sarin | Knot Pipatsrisawat | Martin Jansche | Oddur Kjartansson | Alexander Gutkin
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces an open-source crowd-sourced multi-speaker speech corpus along with a comprehensive set of finite-state transducer (FST) grammars for performing text normalization for the Burmese (Myanmar) language. We also introduce open-source finite-state grammars for performing grapheme-to-phoneme (G2P) conversion for Burmese. These three components are necessary (but not sufficient) for building a high-quality text-to-speech (TTS) system for Burmese, a tonal Southeast Asian language from the Sino-Tibetan family which presents several linguistic challenges. We describe the corpus acquisition process and provide the details of our finite-state-based approach to Burmese text normalization and G2P. Our experiments involve building a multi-speaker TTS system based on long short-term memory (LSTM) recurrent neural network (RNN) models, which were previously shown to perform well for other languages in a low-resource setting. Our results indicate that the data and grammars that we are announcing are sufficient to build reasonably high-quality models comparable to other systems. We hope these resources will facilitate speech and language research on the Burmese language, which is considered by many to be low-resource due to the limited availability of free linguistic data.

2018

Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech
Jaka Aris Eko Wibawa | Supheakmungkol Sarin | Chenfang Li | Knot Pipatsrisawat | Keshan Sodimana | Oddur Kjartansson | Alexander Gutkin | Martin Jansche | Linne Ha
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)